Sophie’s Kitchen

    Sophie’s Kitchen is an interactive video game set in a virtual house. The goal is to help a virtual agent, Sophie, bake a cake by guiding her with reward. Users can connect online to play with Sophie in a training session. Sophie can turn left and right, and can pick up, put down, and use items in the kitchen. A slider on the left lets the human send Sophie a “reward,” a scalar in the range [-1, 1]. Traditional Reinforcement Learning algorithms make certain assumptions about the nature and meaning of the reward signal, but do those assumptions hold when the reward comes from a human?
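    To make the setup concrete, here is a minimal sketch of how a slider reward could feed a learning update. It assumes a standard tabular Q-learning rule, which the text above does not confirm as Sophie's actual algorithm. The action names, the class name, and the alpha and gamma values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical action names based on the abilities described above.
ACTIONS = ["turn_left", "turn_right", "pick_up", "put_down", "use"]


def clamp_reward(r):
    """Keep the slider value inside the stated range [-1, 1]."""
    return max(-1.0, min(1.0, r))


class HumanRewardQLearner:
    """Tabular Q-learning where the reward comes from a human slider.

    This is an assumed, textbook update rule, not the project's own code.
    """

    def __init__(self, alpha=0.3, gamma=0.9):
        self.alpha = alpha            # learning rate (illustrative value)
        self.gamma = gamma            # discount factor (illustrative value)
        self.q = defaultdict(float)   # maps (state, action) to a value

    def best_value(self, state):
        """Highest estimated value over all actions in the given state."""
        return max(self.q[(state, a)] for a in ACTIONS)

    def update(self, state, action, human_reward, next_state):
        """Apply one Q-learning step using the human's scalar reward."""
        r = clamp_reward(human_reward)
        target = r + self.gamma * self.best_value(next_state)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        return self.q[(state, action)]
```

    Note that this sketch treats every slider value as feedback on the action just taken, which is exactly the kind of assumption the study set out to examine.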

    Eighteen participants came to the lab to play with Sophie. Participants were told that they could not communicate with Sophie directly, but could instead give her periodic feedback about how she was doing. All but one participant successfully taught Sophie the task, but the specific strategies they used greatly influenced how long it took. The main insights into how humans tried to teach Sophie are described in the next section.

    Unlike traditional RL reward functions, humans seem to generate reward both as feedback (i.e., for past actions) and as guidance (to encourage future actions that seem likely to happen). Humans also exhibit a positive bias in their rewards: aggregated across all situations, they are more likely to give the agent positive feedback than negative feedback. Finally, we found that as humans refine their mental models of the agent, their reward strategy shifts accordingly. Thus, to elicit the best possible training from the human, an agent should make its own model and intentions transparent (e.g., by using gaze behavior to indicate future intended actions). Overall, we were able to show that an agent designed to take advantage of these human predispositions learns more quickly and effectively. Sophie’s Kitchen set the stage for future research into Socially Guided Machine Learning, including investigations of multiple channels for specialized feedback.
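    One way to picture the difference between feedback and guidance is to give guidance its own channel that influences which action is chosen next, rather than changing the value of a past action. The sketch below is an assumption for illustration only. The function name, the (verb, object) action format, and the idea of a guided object are all hypothetical and are not taken from the study.

```python
def choose_action(q_values, actions, guided_object=None):
    """Pick the highest-valued action, optionally narrowed by guidance.

    q_values: dict mapping (verb, object) tuples to estimated values.
    actions: list of (verb, object) tuples currently available.
    guided_object: hypothetical guidance signal naming an object the
        human wants the agent to attend to next, or None.
    """
    pool = actions
    if guided_object is not None:
        # Guidance narrows the choice to actions involving that object.
        focused = [a for a in actions if a[1] == guided_object]
        if focused:  # ignore guidance that matches no available action
            pool = focused
    # Unseen actions default to a value of 0.0.
    return max(pool, key=lambda a: q_values.get(a, 0.0))


actions = [("pick_up", "bowl"), ("use", "spoon"), ("pick_up", "flour")]
q_values = {("use", "spoon"): 0.8, ("pick_up", "flour"): 0.1}

unguided = choose_action(q_values, actions)
guided = choose_action(q_values, actions, guided_object="flour")
```

    Without guidance the agent simply follows its learned values. With guidance, the human's signal steers the next choice while leaving the values learned from feedback untouched.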

    Sophie’s Kitchen: systematically studying how people teach machines in order to build more effective learning robots