Social Learning Overview Rather than requiring people to learn a new form of communication to interact with robots or to teach them, our research concerns developing robots that can learn from natural human interaction in human environments.
We are exploring multiple forms of social learning, as well as empirically investigating how people teach robots. Sometimes we leverage on-line game characters to study how large numbers of people interact with our learning systems -- far more than we could bring into our lab (see Sophie and MDS).
In contrast to many statistical learning approaches that require hundreds or thousands of trials or labeled examples to train the system, our goal is for robots to quickly learn new skills and tasks from natural human instruction and few demonstrations (see Learning by Tutelage). We have found that this process is best modeled as a collaboration between teacher and learner where the teacher guides the robot's exploration, and the robot provides feedback to shape this guidance. This has proven to accelerate the robot's learning process and improve its generalization ability.
Our research seeks to identify simple, non-verbal cues that human teachers naturally provide that are useful for directing the attention of robot learners. The structure of social behavior and interaction engenders what we term "social filters": dynamic, embodied cues through which the teacher can guide the behavior of the robot by emphasizing and de-emphasizing objects in the environment. In particular, we argue that visual perspective, action timing, and spatial scaffolding, in which teachers use their bodies to spatially structure the learning environment to direct the attention of the learner, are highly valuable cues for robotic learning systems.
The robot's learning should also be self-motivated. Rather than only learning when a person is around to teach it, the robot should seek out new learning experiences to learn new skills and concepts, as well as strive to master familiar ones through practice (see Learning by Guided Exploration).
Once a task is learned, the robot should then be competent in its ability to provide assistance; understanding how to perform the task as well as how to perform it in partnership with a human (see Teamwork).
The robot should be able to learn social skills that enable it to learn in new ways. One example is the ability to map its body onto that of the human teacher (see Learning to Mimic Faces and Bodies). This skill enables the robot to learn via demonstration and imitation -- two powerful forms of social learning found in nature. Imitation is important not only for learning new skills, but to bootstrap its social understanding of others (see Learning by Imitation and Social Cognition).
Further, these social cognitive skills (e.g., shared attention, mimicry, etc.) interact and build on each other over time to allow new forms of social learning. Learning by Social Referencing is one such example, whereby Leonardo learns how to appraise novel objects by observing another person's emotional reaction to them.
Learning by Socially Guided Exploration Personal robots must be able to learn new skills and tasks while on the job from ordinary people. How can we design robots that learn effectively and opportunistically on their own, but are also receptive to human guidance --- both to customize what the robot learns, and to improve how the robot learns?
Human-teachable robots must be able to move flexibly and opportunistically along the GUIDANCE/EXPLORATION spectrum. Along the GUIDANCE dimension, many prior works require a constant (often high) involvement of a human teacher for the robot to learn anything --- e.g., learning by demonstration or tutelage. In these examples, the robot's exploration is strongly determined by the human. Often, the human must learn how to interact with the robot, and in some cases "teaching" the robot is effectively programming it using speech or gesture through specific protocols. In contrast, along the EXPLORATION dimension, the robot largely explores on its own, and the human's involvement (if any) is highly constrained --- usually requiring the human to learn how to correctly interact with the robot. Examples include granting the human control over the reward given to a reinforcement learner (as in clicker training), allowing the human to provide domain-specific advice, or allowing the human to teleoperate the robot during training.
Our approach, Guided Exploration, captures two important characteristics of robots that learn in human environments. First, the ability to explore on its own to discover new goals and learn generalized tasks to achieve them. Second, the ability to leverage a human partner to improve what and how the robot learns through a collaborative process.
We address the first characteristic by endowing the robot with learning-specific motivations that drive its exploration-exploitation process to create learning opportunities for a Reinforcement Learning with Options mechanism. Our approach to self-motivated learning incorporates complementary motivations of Novelty and Mastery that interact to drive the learning process. This is inspired by natural learning systems that are continually driven to learn new things and master them through practice. Second, we have developed a novel generalization mechanism to enable the robot to refine the context and goal of learned activities over time in order to converge on a generalized representation for each goal-oriented task. We address the second characteristic by implementing social scaffolding mechanisms and communication transparency devices to naturally support human guidance.
Our early experiments show that intrinsic measures along with extrinsic support define new goals for the robot, and hierarchical action policies can then be learned in a standard way (reinforcement learning with options) for reaching these goals. Thus, the robot can identify its own learning opportunities and act to pursue them by evoking relevant learning behaviors. Not surprisingly, as with human children, the robot can explore more efficiently, discover important events sooner, master them quicker, and generalize better with a human teacher present.
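The interplay of novelty and mastery drives described above can be illustrated with a minimal sketch. The class and function names here (`OptionStats`, `choose_goal`) are our own illustrative inventions, not the actual Leonardo architecture; the drive formula is a toy stand-in for the real motivational system.

```python
# Toy sketch of novelty/mastery-driven goal selection: the robot
# prefers goals that are still novel AND not yet mastered.
class OptionStats:
    """Practice statistics for one learned goal (an RL 'option')."""
    def __init__(self):
        self.visits = 0
        self.successes = 0

    @property
    def novelty(self):
        # Rarely-practiced goals are more novel.
        return 1.0 / (1.0 + self.visits)

    @property
    def mastery(self):
        # Fraction of attempts that achieved the goal.
        return self.successes / self.visits if self.visits else 0.0

def choose_goal(options):
    """Pick the goal with the highest combined learning drive."""
    def drive(stats):
        return stats.novelty + (1.0 - stats.mastery)
    return max(options, key=lambda name: drive(options[name]))
```

Under this scheme, a freshly discovered goal outcompetes a well-practiced one, so the robot's exploration naturally rotates toward whatever it has not yet mastered.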
We identified several significant contributions a human teacher can readily offer the robot's learning process. First, by suggesting actions to try, the human helps the robot to quickly discover how to bring about novel events. In response, the robot creates new goals that can be mastered through subsequent experimentation. Second, by explicitly labeling these events, the human makes sure the robot does not miss the event as a learning opportunity. Furthermore, by labeling important states, the human gives them a “verbal handle” by which he or she can refer to them at a later time to scaffold more complex skills that build upon this knowledge. Third, the person can play an important role in helping the robot to learn generalized task representations by incrementally structuring the environment and helping the robot to link what it already knows to new contexts.
Learning by Tutelage Learning by human tutelage leverages the structure provided by interpersonal interaction. For instance, a teacher directs the learner's attention, structures his or her experiences, supports his or her learning attempts, and regulates the complexity and difficulty of information. The teacher maintains a mental model of the learner's state (e.g., what is understood so far, what remains confusing or unknown) in order to appropriately structure the learning task with timely feedback and guidance. Meanwhile, the learner aids the instructor by expressing his or her current understanding through demonstration and using a rich variety of communicative acts such as facial expressions, gestures, shared attention, and dialog.
We want to understand how social guidance enables a learner to acquire new concepts, task models, and skills from few examples. Our approach models tutelage as a fundamentally collaborative process that takes place within a tightly coupled interaction between teacher and learner. We have developed an integrated learning/behavior model and implemented it on an anthropomorphic robot, Leonardo, to test our hypotheses for how interpersonal interaction allows for rapid and robust learning of new goals and ways to achieve them. This allows us to conduct situated and embodied experiments where people can teach the robot in various tutorial scenarios.
In this movie we are studying how social guidance---e.g. sharing attention, providing feedback, structuring experience, and regulating the complexity of information---interplays with traditional inference algorithms (such as Bayesian hypothesis testing) in an interactive learning scenario. The human instructor teaches the robot a button-activation task using dialog, gesture, and gaze. The robot communicates its current understanding through demonstration and expressive social cues. Over only a few trials, the robot learns how to generalize the concept of "all buttons on" (i.e., all buttons have their LEDs on) and how to apply this concept on a new configuration of buttons.
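The Bayesian hypothesis testing mentioned above can be sketched in a few lines. This is a simplified stand-in, not the robot's actual representation: the candidate goal concepts and the uniform-likelihood update are our own illustrative assumptions.

```python
# Illustrative Bayesian filtering of goal hypotheses for the button task.
# A state is a tuple of button LED values (1 = on); each hypothesis maps
# a state to True if that state satisfies the candidate goal concept.
HYPOTHESES = {
    "all on":  lambda s: all(s),
    "all off": lambda s: not any(s),
    "any on":  lambda s: any(s),
}

def update(posterior, state, is_goal):
    """Zero out hypotheses inconsistent with one labeled example,
    then renormalize (a uniform-likelihood Bayes update)."""
    post = {h: p * (HYPOTHESES[h](state) == is_goal)
            for h, p in posterior.items()}
    total = sum(post.values())
    return {h: p / total for h, p in post.items()} if total else post

posterior = {h: 1 / len(HYPOTHESES) for h in HYPOTHESES}
# Teacher demonstrates: (on, on, on) is the goal; (on, off, on) is not.
posterior = update(posterior, (1, 1, 1), True)
posterior = update(posterior, (1, 0, 1), False)
```

After just two labeled examples the posterior concentrates on "all on", which is why so few trials suffice when the teacher structures the demonstrations well.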
The movie shows that the robot successfully generalizes the goal and can achieve it with a new configuration of buttons. The robot also demonstrates commitment to, and understanding of, the overall task goal --- in the final clip, when the human undoes part of the task, Leonardo acknowledges this and finishes the task successfully.
Learning by Imitation This work explores how imitation as a social learning and teaching process may be applied to building socially intelligent robots.
Perceiving similarities between self and other is an important part of the ability to take the role or perspective of another, allowing people to relate to and to empathize with their social partners. This sort of perspective shift may help us to predict and explain others' emotions, behaviors, and other mental states, and to formulate appropriate responses based on this social understanding.
Simulation Theory (ST) is one of the dominant hypotheses about the nature of the cognitive mechanisms that underlie the social understanding of others (Davies and Stone 1995; Gordon 1986; Heal 2003). It posits that by simulating another person's actions and the stimuli they are experiencing using our own behavioral and stimulus processing mechanisms, humans can make predictions about the behaviors and mental states of others based on the mental states and behaviors that we would possess in their situation.
Meltzoff proposes that the way in which infants learn to simulate others is through imitative interactions. He hypothesizes that the human infant's ability to translate the perception of another's action into the production of their own action provides a basis for learning about self-other similarities, and for learning the connection between behaviors and the mental states producing them. For instance, a developmental milestone for 18-month-old infants is the ability to imitate the intended goal of unsuccessful actions.
Our approach is guided by the hypothesis that imitative interactions between infant and caregiver, starting with facial mimicry, are a significant stepping-stone to develop appropriate social behavior, to predict others' actions, and ultimately to understand people as social beings. We are investigating how Leonardo can bootstrap from its imitative ability to understand the actions of others in intentional terms. This is a critical skill for cooperative behavior such as teamwork.
Learning to Mimic Faces This work presents a biologically inspired implementation of early facial imitation based on the AIM model proposed by Meltzoff & Moore. Although there are competing theories to explain early facial imitation (such as an innate releasing mechanism model where fixed-action patterns are triggered by the demonstrator's behavior, or viewing it as a by-product of neonatal synesthesia where the infant confuses input from visual and proprioceptive modalities), Meltzoff presents a compelling account for the representational nature and goal-directedness of early facial imitation, and how this enables further social growth and understanding.
Much as infants' earliest social interactions involve imitating facial expressions, our first step towards creating a robot capable of social understanding is an implementation of facial mimicry. In order for a robot to imitate it must be able to translate between seeing and doing. Specifically, to solve the facial imitation task the robot must be able to:
- Locate and recognize the facial features of a demonstrator
- Find the correspondence between the perceived features and its own
- Identify a desired expression from this correspondence
- Move its features into the desired configuration
- Use the perceived configuration to judge its own success

Meltzoff and Moore (1997) proposed a descriptive model for how an infant might accomplish these tasks, known as the Active Intermodal Mapping Hypothesis (AIM). In general, the AIM model suggests that a combination of innate knowledge and specialized learning mechanisms underlie infants' ability to imitate in a cross-modal, goal-directed manner. Specifically, AIM presents three key components of the imitative process: motor babbling, organ identification, and the intermodal space. Taken together, this model suggests mechanisms for identifying and attending to key perceptual features of faces, mapping the model's face onto the imitator's, generating appropriate movements, and gauging the correspondence between produced and perceived expressions. We have used this model to guide our own implementation (with allowances made for the differing physical limitations of babies and robots).

The video explains the implementation.

Learning to Mimic Bodies This section describes the process of using Leo's perceptions of the human's movements to determine which motion from the robot's repertoire the human might be performing. The technique described here allows the joint angles of the human to be mapped to the geometry of the robot even if they have different morphologies, as long as the human has a consistent sense of how the mapping should be and is willing to go through a quick, imitation-inspired process to learn this body mapping. Once the perceived data is in the joint space of the robot, the robot tries to match the movement of the human to one of its own movements (or a weighted combination of prototype movements). Representing the human's movements as one of the robot's own movements is more useful for further inference using the goal-directed behavior system than a collection of joint angles.

Leonardo has the ability to physically imitate humans and to recognize and critique human movements. To imitate people, the robot needs to convert the 3D joint angle data it perceives about the human (wearing the motion capture suit or captured through an optical motion capture system) into its own 1D joint space (i.e., its actuators) in order to physically map the human's body onto its own. To analyze this movement, Leonardo must then convert this body mapping data into a representation that allows the robot to compare the observed human movement against prototype movements stored within the robot's movement repertoire. There may be a direct match between the observed movement and a prototype within the robot's repertoire, or the observed movement may be a weighted composition of several prototype movements within the robot's repertoire.
To learn a mapping between the human's body and Leonardo's body, an example set consisting of pairs of tracked human body poses matched to robot poses is needed. To acquire this data, the robot engages the human in an intuitive "do as I do" imitation game, inspired by early facial imitation whereby human infants learn how to imitate the facial expressions of their caregivers. In this scenario, the robot first takes the lead and moves through a series of poses as the human imitates. Because the human is imitating the robot, there is no need to manually label the tracking data -- the robot is able to self-label them according to its own pose. Because the structure of the interaction is punctuated rather than continuous, the robot is also able to eliminate noisy examples as it goes along, requiring fewer total examples to learn the mapping. This process is shown in the video.
We have found that it is often difficult for people to imitate the entire full-body pose of the robot at once. Rather, when imitating, people tend to focus on the aspect of the movement that they consider to be most relevant to the pose, such as a single limb. Furthermore, humans and Leonardo both exhibit a large degree of left-right symmetry. It does not make sense from a practical standpoint to force the human to perform identical game actions for each side of the robot. We therefore divide the robot's learning module into a collection of multiple learning "zones" that are known to exhibit symmetry, and can share appropriately transformed base mapping data in order to accelerate the learning process.
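The symmetry-zone idea can be sketched simply: an example recorded for one side of the body is mirrored into the other side's zone, halving the number of poses the human must demonstrate. The three-angle arm layout and sign-flipped roll component below are toy assumptions, not Leonardo's actual joint layout.

```python
# Sketch of symmetry sharing between learning "zones": each left-arm
# training pair yields a mirrored right-arm pair for free.

def mirror_arm(pose):
    """Mirror a (shoulder, elbow, roll) angle triple across the body's
    sagittal plane by negating the roll component (a toy convention)."""
    shoulder, elbow, roll = pose
    return (shoulder, elbow, -roll)

def expand_with_symmetry(left_examples):
    """left_examples: list of (human_pose, robot_pose) pairs for the
    left arm. Returns the mirrored examples for the right-arm zone."""
    return [(mirror_arm(h), mirror_arm(r)) for h, r in left_examples]
```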
Once the robot finishes acquiring its training set, it applies a radial basis function mapping technique to train the inter-body mapping. We have found that the RBF does a good job in capturing interdependencies between the joints, and provides a very faithful mapping in the vicinity of the example poses. However, its performance decays as the data moves away from the training examples. As a result, the robot must use a carefully selected set of training poses to sufficiently cover the movement space. We found that this technique was also sensitive to errors during the training process, particularly in the case of closely neighboring example poses, which could sometimes cause significant warping of the map.
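A minimal version of such an RBF mapping can be written directly: Gaussian kernels centered on the human training poses, with weights solved so the map exactly reproduces the robot poses at those examples. The pose dimensions and values here are toy placeholders, not Leonardo's joint configuration.

```python
import numpy as np

# Minimal RBF pose mapper: pairs of (human pose, robot pose) vectors
# define a smooth human-to-robot map via Gaussian kernels.

def gaussian(r, sigma=1.0):
    return np.exp(-(r / sigma) ** 2)

class RBFPoseMap:
    def __init__(self, human_poses, robot_poses, sigma=1.0):
        self.centers = np.asarray(human_poses, float)
        self.sigma = sigma
        # Pairwise kernel matrix between human training poses.
        K = gaussian(np.linalg.norm(
            self.centers[:, None] - self.centers[None, :], axis=-1), sigma)
        # One weight vector per robot joint, fit to reproduce the examples.
        self.weights = np.linalg.solve(K, np.asarray(robot_poses, float))

    def __call__(self, human_pose):
        k = gaussian(np.linalg.norm(
            self.centers - np.asarray(human_pose, float), axis=-1), self.sigma)
        return k @ self.weights
```

This sketch shares the behavior noted above: it is exact at the training poses, smooth nearby, and increasingly unreliable far from them, which is why the choice of training poses matters.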
When the robot relinquishes the lead to the human, the robot then tries to imitate the human's pose. The human can then verify whether the robot has learned a good mapping or whether a further iteration of the game is required. One advantage of this entire process is that it takes place within an intuitive, interactive game between the human and the robot (instead of taking the robot offline for a manual calibration procedure, which breaks the illusion of life) and requires no special training or background knowledge on the part of the human teacher.
Learning by Perspective Taking The ability to interpret demonstrations from the perspective of the teacher plays a critical role in human learning. This work addresses an important issue in building robots that can successfully learn from demonstrations that are provided by everyday people who do not have expertise in the learning algorithms used by the robot. As a result, the teacher may provide sensible demonstrations from a human's perspective; however, these same demonstrations may be insufficient, incomplete, ambiguous, or otherwise “flawed” in terms of providing a correct and sufficiently complete training set in order for the learning algorithm to generalize properly.
To address this issue, we believe that socially situated robots will need to be designed as socially cognitive learners that can infer the intention of the human's instruction, even if the teacher's demonstrations are less than perfect for the robot.
Our approach to endowing machines with socially-cognitive learning abilities is inspired by leading psychological theories and recent neuroscientific evidence for how human brains might infer the mental states of others. Specifically, Simulation Theory holds that certain parts of the brain have dual use; they are used to not only generate behavior and mental states, but also to predict and infer the same in others.
We have developed an integrated architecture (perspective taking) wherein the robot’s cognitive functionality is organized around the ability to understand the environment from the perspective of a social partner as well as its own.
We have evaluated the performance of this architecture against human learning performance in a novel study examining the importance of perspective taking in human learning. Perspective taking, both in humans and in our architecture, focuses the agent's attention on the subset of the problem space that is important to the teacher. This constrained attention allows the agent to overcome ambiguity and incompleteness that can often be present in human demonstrations and thus learn what the teacher intends to teach.
The video shows Virtual Leonardo learning the correct concept of “fill all the blocks” despite a visually occluded blue block. If Virtual Leonardo were not able to reason from the perspective of the human teacher, it would learn a task concept of “fill all but blue blocks”, for instance. However, because Virtual Leonardo is aware that the teacher cannot see the occluded block, it understands that it should discard the occluded block as part of the intended demonstration.
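The occluded-block example can be reduced to a small sketch: the learner induces the concept from the subset of the workspace the teacher can see, not from its own view. Function and field names here are illustrative assumptions, not the architecture's actual interfaces.

```python
# Toy perspective filtering: discard objects the teacher cannot see
# before inducing the task concept from her demonstration.

def teacher_visible(blocks, occluded_ids):
    """Return the subset of the workspace the teacher can actually see."""
    return [b for b in blocks if b["id"] not in occluded_ids]

def induce_concept(blocks, filled_ids, occluded_ids):
    """Induce the task concept from the teacher's visual perspective
    rather than the robot's own."""
    visible = teacher_visible(blocks, occluded_ids)
    if all(b["id"] in filled_ids for b in visible):
        return "fill all the blocks"
    return "fill some of the blocks"

blocks = [{"id": "red"}, {"id": "green"}, {"id": "blue"}]
# The teacher filled every block she could see; blue was occluded from her.
concept = induce_concept(blocks, filled_ids={"red", "green"},
                         occluded_ids={"blue"})
```

With the perspective filter, the unfilled-but-occluded blue block is not counted as negative evidence, so the learner recovers the intended concept.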
Learning by Social Referencing Social referencing is the tendency to use the emotional reaction of another to help form one's own affective appraisal of a novel situation, which is then used to guide subsequent behavior. It is an important form of emotional communication and is a developmental milestone for human infants in their ability to learn about their environment through social means. We have implemented a biologically-inspired computational model of social referencing for Leonardo. Our model consists of three interacting systems:
- an emotional empathy mechanism based on facial imitation,
- a shared attention mechanism,
- and an affective memory system.
These systems interact to enable the robot to demonstrate social referencing behavior similar to that of human infants. We argue that in addition to forming a basis for social learning in robots, our model presents opportunities for understanding how these mechanisms might interact to enable social referencing behavior in humans.
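How the three systems might interact can be sketched as follows. This is a toy illustration under our own assumptions (the class name, the valence scale in [-1, 1], and the blending rule are all invented for the example), not the implemented model.

```python
# Sketch of the affective-memory piece of social referencing: the robot
# tags a novel object with the valence it reads from the teacher's face,
# but only while attention to the object is shared.

class AffectiveMemory:
    def __init__(self):
        self.appraisals = {}  # object id -> running valence in [-1, 1]

    def reference(self, obj, teacher_valence, shared_attention):
        """Blend the teacher's read emotion into the object's appraisal,
        gated on shared attention -- otherwise the emotion may be about
        something else entirely."""
        if shared_attention:
            prior = self.appraisals.get(obj, 0.0)
            self.appraisals[obj] = 0.5 * prior + 0.5 * teacher_valence

    def approach_or_avoid(self, obj):
        """Unappraised objects default to neutral (approach)."""
        v = self.appraisals.get(obj, 0.0)
        return "approach" if v >= 0 else "avoid"
```

The shared-attention gate is the key design point: without it, a frightened expression directed at something off-screen would wrongly taint the appraisal of whatever the robot happened to be looking at.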
Social referencing represents a new channel of emotional communication between humans and robots, one in which the human plays a central role in shaping and guiding the robot's understanding of the objects in its environment. We contend that this work has important implications for designing robots that are able to acquire their own metrics of success to guide their own subsequent learning and behavior, rather than have these success metrics hardwired into the robot by a human machine learning specialist. In our approach, the human partner can shape these metrics of success in real-time through natural social interaction.
Papers M. Berlin, J. Gray, A. L. Thomaz, and C. Breazeal (2006). “Perspective Taking: An Organizing Principle for Learning in Human-Robot Interaction.” In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06).
C. Breazeal, M. Berlin, A. Brooks, J. Gray, and A. L. Thomaz (2006). “Using Perspective Taking to Learn from Ambiguous Demonstrations.” Robotics and Autonomous Systems (RAS) Special Issue on The Social Mechanisms of Robot Programming by Demonstration, 54(5), 385-393.
Papers C. Breazeal, A. Brooks, J. Gray, G. Hoffman, C. Kidd, H. Lee, J. Lieberman, A. Lockerd, and D. Chilongo (2004). “Tutelage and Collaboration for Humanoid Robots.” International Journal of Humanoid Robots, 1(2), 315—348.
C. Breazeal, G. Hoffman and A. Lockerd (2004). “Teaching and Working with Robots as a Collaboration.” In Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS). 1030-1037.
Papers A. L. Thomaz and C. Breazeal (2007). "Robot Learning via Socially Guided Exploration." In Proceedings of the 6th International Conference on Development and Learning (ICDL-07), Imperial College, London.
C. Breazeal and A. Thomaz (2008). "Learning from Human Teachers with Socially Guided Exploration." In Proceedings of the 2008 IEEE International Conference on Robotics and Automation (ICRA-08), Pasadena, CA.
C. Breazeal and A. Thomaz (2008) Experiments in socially guided exploration: Lessons learned in building robots that learn with and without human teachers. Connection Science, 20(2-3), 91-100.
Papers C. Breazeal, A. Brooks, J. Gray, G. Hoffman, C. Kidd, H. Lee, J. Lieberman, A. Lockerd, and D. Chilongo (2004). "Tutelage and Collaboration for Humanoid Robots." International Journal of Humanoid Robots, 1(2), 315-348.
C. Breazeal, G. Hoffman and A. Lockerd (2004). "Teaching and Working with Robots as a Collaboration." In Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS'04), 1028-1035, New York, NY.
A. Lockerd and C. Breazeal (2004). "Tutelage and Socially Guided Robot Learning." In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-04), Sendai, Japan.
Papers C. Breazeal, D. Buchsbaum, J. Gray, D. Gatenby, B. Blumberg (2005). "Learning from and about Others: Towards Using Imitation to Bootstrap the Social Understanding of Others by Robots," L. Rocha and F. Almedia e Costa (eds.), Artificial Life 11(1-2).
Papers A. Brooks, J. Gray, G. Hoffman, A. Lockerd, H. Lee and C. Breazeal (2004). "Robot's Play: Interactive Games with Sociable Machines." ACM Computers in Entertainment, 2(3), 1-18. Reprinted from Proceedings of the 2004 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology (ACE '04), Singapore, ACM Press, New York, NY, 74-83. Outstanding Paper Commendation.
Papers Thomaz, A. L., Berlin, M. and Breazeal, C. (2005). "Robot Science Meets Social Science: An Embodied Computational Model of Social Referencing." CogSci 2005 Workshop Toward Social Mechanisms of Android Science, Trento, Italy.
Thomaz, A. L., Berlin, M. and Breazeal, C. (2005). "An Embodied Computational Model of Social Referencing." In Proceedings of the Fourteenth IEEE Workshop on Robot and Human Interactive Communication (Ro-Man05), Nashville, TN.
Learning by Spatial Scaffolding Spatial scaffolding is a naturally occurring human teaching behavior, in which teachers use their bodies to spatially structure the learning environment to direct the attention of the learner. Robotic systems can take advantage of simple, highly reliable spatial scaffolding cues to learn from human teachers.
We have developed an integrated robotic architecture that combines social attention and machine learning components to learn tasks effectively from natural spatial scaffolding interactions with human teachers. We evaluated the performance of this architecture in comparison to human learning data drawn from a novel study of the use of embodied cues in human task learning and teaching behavior. This evaluation provided quantitative evidence for the utility of spatial scaffolding to learning systems.
Our evaluation also supported the construction of a novel, interactive demonstration of our humanoid robot, Leonardo, taking advantage of spatial scaffolding cues to learn from natural human teaching behavior.
The video shows off an interaction sequence between the robot and a human teacher. A mixed-reality workspace was created so that the robot and the human teacher could both interact gesturally with animated foam blocks on a virtual tabletop. The robot, instructed to build a sailboat figure, starts to construct the figure as the teacher watches. The teacher's goal is to guide the robot into using only blue and red blocks to construct the figure.
As the interaction proceeds, the robot tries to add a green rectangle to the figure. The teacher interrupts, pulling the block away from the robot. As the robot continues to build the figure, the teacher tries to help by sliding a blue block and a red block close to the robot's side of the screen. The teacher then watches as the robot completes the figure successfully. To demonstrate that the robot has indeed learned the constraints, the teacher walks away, and instructs the robot to build a new figure. Without any intervention from the teacher, the robot successfully constructs the figure, a smiley-face, using only red and blue blocks. Thus the robot has taken advantage of the teacher's nonverbal spatial cues to successfully learn the secret constraint and complete the task.
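The constraint inference in this demonstration can be caricatured in a few lines: blocks the teacher pulls away are negative evidence about their color, and blocks slid toward the robot are positive evidence. The cue labels and scoring rule are our own toy assumptions, not the implemented attention system.

```python
# Toy inference of the teacher's hidden color constraint from
# nonverbal spatial scaffolding cues.
from collections import defaultdict

def infer_allowed_colors(cues):
    """cues: list of (color, action) pairs, where action is 'pull_away'
    (teacher removes the block) or 'slide_close' (teacher offers it).
    Returns the colors that were emphasized and never vetoed."""
    emphasized = defaultdict(int)
    banned = set()
    for color, action in cues:
        if action == "pull_away":
            banned.add(color)
        elif action == "slide_close":
            emphasized[color] += 1
    return {c for c in emphasized if c not in banned}

# The interaction from the video, reduced to cues:
cues = [("green", "pull_away"),
        ("blue", "slide_close"),
        ("red", "slide_close")]
```

Even this crude rule recovers the "only red and blue" constraint from three cues, which hints at why such embodied signals are so valuable to a learning system.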