As robots become a mass consumer product, they will need to learn new skills by interacting with typical human users. However, the design of machines that learn by interacting with ordinary people is a relatively neglected topic in machine learning. To address this, we advocate a systems approach that integrates machine learning into a Human-Robot Interaction (HRI) framework.

Our first goal is to understand the nature of the teacher's input to adequately support how people want to teach.
Our second goal is to then incorporate these insights into standard machine learning frameworks to improve a robot's learning performance.

To contribute to each of these goals, we use a computer game framework to log and analyze the interactive training sessions that human teachers have with a Reinforcement Learning (RL) agent called Sophie. Although RL was originally formulated for agents that learn autonomously, without human supervision, we study it because of its popularity as a technique for teaching robots and game characters new skills by giving the human access to the agent's reward signal. However, we question the implicit assumption that people will only want to give the learner feedback on its past actions.
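As a concrete illustration (a minimal sketch, not the implementation used in these studies), the following shows how a human-delivered reward can be folded into a standard tabular Q-learning update. The environment interface and the `get_human_reward` polling call are assumptions for illustration only.

```python
import random
from collections import defaultdict

# Hyperparameters for a small tabular learner.
ALPHA, GAMMA, EPSILON = 0.3, 0.9, 0.1
Q = defaultdict(float)  # (state, action) -> estimated value

def choose_action(state, actions):
    """Epsilon-greedy action selection over the current Q-values."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    """Standard one-step Q-learning backup."""
    best_next = max(Q[(next_state, a)] for a in next_actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

def training_step(env):
    """One interactive step: the human's signal is summed with the
    environment's reward, so teaching amounts to reward shaping."""
    state = env.observe()
    action = choose_action(state, env.actions(state))
    next_state, env_reward = env.step(action)
    reward = env_reward + env.get_human_reward()  # assumed to be 0.0 when no input
    q_update(state, action, reward, next_state, env.actions(next_state))
```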

To explore this topic, we carried out two user studies.

First User Study
In the initial user study, people trained Sophie to perform a novel task within a reinforcement-based learning framework. Analysis of the data yields several important lessons about how humans approach the task of teaching an RL agent:

1. they want the ability to direct the agent's attention;
2. they communicate both instrumental and motivational intentions;
3. they beneficially tailor their instruction for the agent according to how it expresses its internal state; and
4. they use negative communication both as feedback for the previous action and as a suggestion for the next action (i.e., "do over"; see the sketch after this list).
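To make lesson 4 concrete, here is one hypothetical way (not the algorithm used in the studies) a learner could interpret a negative signal asymmetrically: it punishes the previous state-action pair as ordinary feedback, and it also acts as a "do over" suggestion by discouraging the agent from repeating that action from the same state.

```python
import random
from collections import defaultdict

ALPHA, EPSILON = 0.3, 0.1
Q = defaultdict(float)  # (state, action) -> estimated value

def handle_human_signal(value, last_state, last_action, do_over):
    """Asymmetric interpretation of a human signal (hypothetical sketch).

    Any signal acts as reward for the previous action; a negative signal
    is additionally read as a 'do over' suggestion, recording the punished
    pair so it can be avoided on the next selection."""
    Q[(last_state, last_action)] += ALPHA * (value - Q[(last_state, last_action)])
    if value < 0:
        do_over.add((last_state, last_action))

def choose_action(state, actions, do_over):
    """Epsilon-greedy selection that skips just-punished actions.

    In practice the do_over bias might decay or be cleared after the
    next step, so it only shapes the immediate retry."""
    allowed = [a for a in actions if (state, a) not in do_over] or actions
    if random.random() < EPSILON:
        return random.choice(allowed)
    return max(allowed, key=lambda a: Q[(state, a)])
```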

Second User Study
Given these findings, we made specific modifications to Sophie and to the game interface to improve the teaching/learning interaction. Our modifications included:

1. embellishing the channel of communication to distinguish between guidance, feedback, and motivational intents (see the sketch after this list);
2. endowing Sophie with transparency behaviors that reveal specific aspects of its learning process; and
3. providing Sophie with a more natural reaction to negative feedback.
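A hypothetical sketch of how such a multi-channel interface might be wired up: guidance messages bias action selection toward objects the teacher indicates rather than entering the reward signal, and a transparency behavior (here, a gaze toward the chosen object) previews the agent's intention before it commits. The `TeacherMessage` type and the `show_gaze` hook are illustrative assumptions, not Sophie's actual interface.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Any

Q = defaultdict(float)  # (state, action) -> estimated value
EPSILON = 0.1

@dataclass
class TeacherMessage:
    kind: str            # 'guidance', 'feedback', or 'motivation'
    value: float = 0.0   # reward payload for feedback/motivation
    target: Any = None   # object of attention for guidance

def select_action(state, actions, messages, show_gaze):
    """Action selection with a separate guidance channel.

    Guidance narrows the candidate set to actions involving the indicated
    object instead of entering the reward signal; the gaze behavior
    reveals the choice so the teacher can intervene before execution."""
    guided = {m.target for m in messages if m.kind == 'guidance'}
    candidates = [a for a in actions
                  if getattr(a, 'target', None) in guided] or actions
    if random.random() < EPSILON:
        choice = random.choice(candidates)
    else:
        choice = max(candidates, key=lambda a: Q[(state, a)])
    show_gaze(choice)  # transparency: preview the intended action
    return choice
```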

A second set of user studies shows that these empirically informed modifications yield improvements along several dimensions, including faster task learning, more efficient state exploration, a learning process that is more understandable to the human, and a significant drop in the number of failed trials encountered during learning (which makes the agent's exploration appear more sensible to the human).

HRI meets Machine Learning
This work demonstrates the importance of understanding the human-teacher/robot-learner system as a whole in order to design algorithms that support how people want to teach while simultaneously improving the robot's learning performance.

We present these user studies, lessons learned, and subsequent improvements to the learning agent and its game interface as empirical results that inform and ground the design of teachable agents, such as personal robots or interactive game characters. We believe these lessons and modifications generalize to the broader class of reinforcement-based learning agents and are not specific to the particular algorithm or character used in these studies. In doing so, we wish to contribute broadly to the creation of fun and engaging teachable robots (physical or virtual) that learn in real-time and in situ from humans.
Papers
A.L. Thomaz and C. Breazeal (in press). "Understanding Human Teaching Behavior to Build More Effective Robot Learners." Artificial Intelligence Journal (AIJ).

A.L. Thomaz, G. Hoffman, and C. Breazeal (2007). "Asymmetric Interpretations of Positive and Negative Human Feedback for a Social Learning Agent." In Proceedings of the 16th IEEE International Symposium on Robot and Human Interactive Communication (Ro-Man-07). Jeju Island, Korea.

A.L. Thomaz and C. Breazeal (2006). "Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance." In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06).

A.L. Thomaz and C. Breazeal (2006). "Teachable Characters: User Studies, Design Principles, and Learning Performance." In Proceedings of the 6th International Conference on Intelligent Virtual Agents (IVA-06). Marina Del Rey, CA. 395-406.

A.L. Thomaz and C. Breazeal (2006). "Transparency and Socially Guided Machine Learning." In Proceedings of the 5th International Conference on Development and Learning (ICDL-06).

A.L. Thomaz, G. Hoffman, and C. Breazeal (2006). "Reinforcement Learning with Human Teachers: Understanding How People Want to Teach Robots." In Proceedings of the 15th IEEE International Symposium on Robot and Human Interactive Communication (Ro-Man-06). 352-357.

A.L. Thomaz, G. Hoffman, and C. Breazeal (2006). "Experiments in Socially Guided Machine Learning: Understanding How Humans Teach." In Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction (HRI '06), Salt Lake City, Utah, March 2-3, 2006. ACM Press, New York, NY, 359-360. Best Poster/Short Paper.

A.L. Thomaz and C. Breazeal (2005). "Socially Guided Machine Learning: Designing an Algorithm to Learn from Real-Time Human Interaction." In NIPS 2005 Workshop on Robot Learning in Unstructured Environments.