Social robots as learning companions

Social robots are new technologies with great potential to support children’s education as tutors and learning companions. It is therefore important to study the mechanisms by which children learn from social robots, and how children’s learning from robots resembles or differs from their learning from human partners.

When learning from human partners, infants and young children will pay attention to nonverbal signals, such as gaze and bodily orientation, to figure out what a person is looking at and why. They may follow gaze to determine what object or event triggered another’s emotion, or to learn about the goal of another’s ongoing action. They also follow gaze in language learning, using the speaker’s gaze to figure out what new objects are being referred to or named.

In the present study, we examined whether young children attend to the same social cues, specifically gaze and bodily orientation, from a robot as from a human partner during a word learning task.

We compared preschoolers’ learning of new words from a human and from a socially dynamic robot that behaved like a human when naming objects of interest: it turned toward, and selectively gazed at, the object being named.

Thirty-six children aged 2 to 5 years were shown images of two unfamiliar animals, and the interlocutor (human or robot) named one of the two animals. To identify which animal was the intended referent, children needed to monitor the interlocutor’s non-verbal cues, specifically gaze direction and bodily orientation. This task closely mirrors everyday language learning.

To assess the discriminability of the cues needed for selective learning, the images of the two animals were presented either close together or further apart. When the images were close together, the interlocutor’s gaze direction and bodily orientation were similar regardless of which animal was being attended to and named. When they were further apart, the interlocutor’s gaze direction and bodily orientation differed more visibly between the two animals.

When images were presented close together, children subsequently identified the correct animals at chance level, whether they had been named by the human or by the robot.

By contrast, when the two images were presented further apart, children identified the correct animals at better than chance level from both interlocutors. Thus, children learned equally well from the robot and the human, but in each case learning was constrained by the distinctiveness of the non-verbal cues to reference.
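To make "chance level" concrete: with two candidate animals, a child who is guessing picks the named one about half the time. The sketch below shows one common way to check whether a count of correct choices exceeds that 50% baseline, using an exact one-sided binomial test. It is an illustration only. The summary above does not say which statistical test was used in the study, the function name binomial_tail is our own, and the counts are hypothetical numbers chosen for the example, not data from the study.

```python
from math import comb


def binomial_tail(successes: int, trials: int, chance: float = 0.5) -> float:
    """Probability of at least `successes` correct choices in `trials`
    attempts if every choice were a guess with success rate `chance`."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts for illustration only (NOT results from the study).
p_close = binomial_tail(11, 20)  # 11 of 20 correct: close to guessing
p_apart = binomial_tail(16, 20)  # 16 of 20 correct: well above guessing

for label, p in [("close together", p_close), ("further apart", p_apart)]:
    verdict = "above chance" if p < 0.05 else "not distinguishable from chance"
    print(f"{label}: p = {p:.4f} ({verdict})")
```

A small p-value means the observed number of correct choices would be unlikely under pure guessing. The 0.05 cutoff is a conventional threshold assumed here, not one reported for the study.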


The following people collaborated with us on this project:

Paul Harris – Graduate School of Education, Harvard University
David DeSteno – Dept. of Psychology, Northeastern University
Leah Dickens – Dept. of Psychology, Northeastern University