Leonardo has 69 degrees of freedom — 32 of those are in the face alone. As a result, Leonardo is capable of near-human facial expression (constrained by its creature-like appearance). Although highly articulated, Leonardo is not designed to walk. Instead, its degrees of freedom were selected for their expressive and communicative functions. It can gesture and is able to manipulate objects in simple ways. Standing at about 2.5 feet tall, it is the most complex robot the studio has attempted (as of Fall 2001). Leonardo is the most expressive robot in the world today.
Unlike the vast majority of autonomous robots today, Leonardo has an organic appearance. It is a fanciful creature, clearly not trying to mimic any living creature today. This follows from our philosophy that robots are not and will never be dogs, cats, humans, etc. so there is no need to make them look as such. Rather, robots will be their own kind of creature and should be accepted, measured, and valued on those terms. We gave Leonardo a youthful appearance to encourage people to playfully interact with it much as one might with a young child.
Embedded Multi-Axis Motion Controller
Exploring human-robot interaction requires constructing increasingly versatile and sophisticated robots. Commercial motor-driver and motion-controller packages are designed with a completely different application in mind (specifically industrial robots with relatively small numbers of relatively powerful motors) and do not adapt well to complex interactive robots with a very large number of small motors controlling things like facial features. Leonardo, for instance, includes sixty-some motors in an extremely small volume. An enormous rack of industrial motion controllers would not be a practical means of controlling the robot; an embedded solution designed for this sort of application is required.
We have developed a motor control system to address the specific needs of many-axis interactive robots. It is based on a modular collection of motor control hardware that can drive a very large number of motors in a very small volume. Both 8-axis and 16-axis control packages have been developed.
These controllers support simultaneous absolute position and velocity feedback, allowing good dynamic performance without the need for a lengthy calibration phase at power-up. Example firmware has been developed which supports accurate position estimation and PD control to a continuously updated target position. The control system is highly flexible, allowing alternative control algorithms to be developed with ease.
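The PD control loop described above can be illustrated with a minimal sketch. This is not the firmware itself: the class name, gains, clamp limit, and the unit-inertia toy axis used to exercise it are all hypothetical. It assumes the velocity feedback is used directly for the derivative term, so the target may change on every call without producing a derivative spike.

```python
from dataclasses import dataclass


@dataclass
class PDController:
    """Proportional-derivative position controller for one motor axis."""
    kp: float                 # proportional gain (hypothetical, tuned per axis)
    kd: float                 # derivative gain (hypothetical, tuned per axis)
    max_output: float = 1.0   # command is clamped to [-max_output, +max_output]

    def update(self, target: float, position: float, velocity: float) -> float:
        """Return a clamped drive command from one feedback sample."""
        error = target - position
        # The derivative term uses measured velocity rather than a
        # differenced error, so a jump in the target does not cause a kick.
        command = self.kp * error - self.kd * velocity
        return max(-self.max_output, min(self.max_output, command))


def simulate(controller, targets, dt=0.001):
    """Drive a unit-inertia toy axis through a sequence of target positions."""
    position, velocity = 0.0, 0.0
    for target in targets:
        accel = controller.update(target, position, velocity)
        velocity += accel * dt
        position += velocity * dt
    return position
```

Calling `update` once per control tick with the latest target matches the "continuously updated target position" behaviour: the controller keeps no state of its own, so a different control law could be substituted behind the same call.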
A generic software library has also been developed to provide a clean interface between high-level control code and low-level motor hardware, as has a generic network protocol, known as the Intra-Robot Communications Protocol, which provides a simple and extensible framework for inter-module communication within a complex robot control system.
For instance, 4 of the 16-axis motor controller packages are used to control Leonardo. A single 8-axis package is used to control RoCo.
The motor drivers are standard FET H-bridges; recent advances in FET process technology permit surprisingly low RDS(on) losses, and switching at relatively low (1-10 kHz) frequencies reduces switching losses. Hence, the power silicon (and thus the package as a whole) can be reduced in size. The audible hum and interference due to the low switching frequency (which is completely unacceptable for an organic-looking robot) is eliminated by using a variable-mean spread-spectrum control signal, rather than traditional PWM. The sixteen channels each support current feedback, encoder feedback, and analog feedback, and the system is controlled by a custom SoC motion controller with an embedded soft processor core implemented in a Xilinx Virtex FPGA.
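The exact modulation scheme is not described here, so the following is only an illustration of the general idea, using randomized-period PWM as a stand-in. Each switching period is drawn at random from the 1-10 kHz range, which spreads the switching energy across a band instead of concentrating it in a single audible tone, while the on-time fraction within every period still equals the commanded duty. The function names and parameters are hypothetical.

```python
import random


def random_period_pwm(duty, n_periods, f_min=1_000.0, f_max=10_000.0, rng=None):
    """Return a list of (on_time, off_time) pairs in seconds.

    Stand-in for a spread-spectrum drive signal: the switching frequency of
    each period is drawn uniformly from [f_min, f_max], and the on-time is
    duty * period, so the mean level tracks the commanded duty.
    """
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    if rng is None:
        rng = random.Random()
    pulses = []
    for _ in range(n_periods):
        period = 1.0 / rng.uniform(f_min, f_max)
        on_time = duty * period
        pulses.append((on_time, period - on_time))
    return pulses


def mean_level(pulses):
    """Fraction of total time the output is on."""
    total_on = sum(on for on, _ in pulses)
    total = sum(on + off for on, off in pulses)
    return total_on / total
```

With fixed-frequency PWM every period has the same length, so the spectrum has a strong line at the switching frequency and its harmonics; varying the period is one common way to trade that line for a lower, broader noise floor.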
We have developed a real-time face recognition system for Leonardo that can be trained on the fly via a simple social interaction with the robot. The interaction allows people to introduce themselves and others to Leonardo, who tries to memorize their faces for use in subsequent interactions. Our face recognition technology is based on the appearance manifold approach described in Nayar, Nene, and Murase, “Real-Time 100 Object Recognition System,” 1996.
The system receives images from the camera mounted in Leo’s right eye. Faces are isolated from these images using data provided by a facial feature tracker developed by the Neven Vision corporation. Isolated face images are resampled into small (25 x 25 pixel) greyscale images and projected onto the first 40 principal components of the face image data set. These face image projections are matched against appearance manifold splines to produce a classification, retrieving the name associated with the given face.
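The projection step above can be sketched with NumPy as follows. This is an illustrative sketch, not the system's code: it assumes face images have already been isolated and resampled to 25 x 25 greyscale, and it substitutes a plain nearest-neighbour match in the projected space for the appearance manifold spline matching, which is not reproduced here. All function names and the `gallery` structure are hypothetical.

```python
import numpy as np


def fit_pca(images, n_components=40):
    """Fit PCA to a stack of face images; returns (mean, components).

    images: array of shape (n_samples, 25, 25) or (n_samples, 625).
    """
    data = np.asarray(images, dtype=float)
    data = data.reshape(len(data), -1)
    mean = data.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(image, mean, components):
    """Project one face image onto the retained principal components."""
    return components @ (np.asarray(image, dtype=float).ravel() - mean)


def classify(image, mean, components, gallery):
    """Return the name whose stored projections lie closest to the query.

    gallery: dict mapping name -> array of shape (n_views, n_components).
    Nearest-neighbour matching is a stand-in for spline manifold matching.
    """
    query = project(image, mean, components)
    best_name, best_dist = None, np.inf
    for name, views in gallery.items():
        dist = np.linalg.norm(views - query, axis=1).min()
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name
```

A 25 x 25 image has 625 pixel values, so keeping 40 components reduces each face to a 40-element vector before matching.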
In order to learn new faces, Leo keeps a buffer of up to 200 temporally-contiguous or near-contiguous views of the currently tracked face. This buffer is used to create a new face model whenever the person introduces themselves via speech. When a new model is created, principal component analysis (PCA) is performed on the entire face image data set, and a spline manifold is fitted to the images of the new face. The appearance manifold splines for the other face models are also recomputed at this time. This full model building process takes about 15 seconds. Since the face recognition module runs as a separate process from Leo’s other cognitive modules, the addition of a new face model can be done without stalling the robot or the interaction.
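A bounded buffer of recent views could be kept along the following lines. This is a sketch under assumptions: the class name, the timestamp argument, and the `max_gap` reset policy (used here to approximate "near-contiguous") are hypothetical and not taken from the actual system.

```python
from collections import deque


class FaceViewBuffer:
    """Holds up to `capacity` recent views of the currently tracked face."""

    def __init__(self, capacity=200, max_gap=0.5):
        self.views = deque(maxlen=capacity)  # oldest views drop off first
        self.max_gap = max_gap               # hypothetical limit, in seconds
        self._last_time = None

    def add(self, view, timestamp):
        # A long gap suggests tracking was lost, so start over to keep the
        # buffer limited to near-contiguous views of one face.
        if self._last_time is not None and timestamp - self._last_time > self.max_gap:
            self.views.clear()
        self.views.append(view)
        self._last_time = timestamp

    def snapshot(self):
        """Copy the buffered views, e.g. when a spoken introduction arrives."""
        return list(self.views)
```

Taking a snapshot when the introduction is heard lets model building run on a fixed set of views while the buffer continues to fill.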
The face recognition module receives speech input provided by the Sphinx-4 speech recognition system. The speech recognition system allows people to introduce themselves via simple phrases: “My name is Marc” or “Leo, this is Dan.” Speech input also allows us to test Leo’s recall: “Leo, can you find Andrea?”
The full system provides face recognition information at a rate of approximately 13 Hz, running on a dual 2 GHz G5 Macintosh.
The video below shows a single interaction wherein Leo is introduced to two new people for the first time. Leo learns both of their names and builds a model of each of their faces. Leo’s face recognition abilities are tested by asking him to find each of the people as they move to a few different locations in the scene. Leo scans the scene, and when he finds a face that matches the query, he points to it. When asked to find someone who is absent from the scene, Leo looks around for a while, then shrugs to indicate that he cannot find them.
A necessary sensory aptitude for a sociable robot is to know where people are and what they are doing. Hence, our sociable robot needs to be able to monitor humans in the environment and interpret their activities, such as gesture-based communication.
The robot must also understand aspects of the inanimate environment, such as how its toys behave as it plays with them. An important sensory modality for facilitating these kinds of observations is vision. The robot will need a collection of visual abilities, closely tied to the specific kind of information about the interactions that the robot needs to extract.
Towards this goal, we are developing a suite of visual capabilities as we investigate the use of Intel’s OpenCV library (supplementing the routines with the addition of Mac G4 AltiVec operations). This includes a collection of visual feature detectors for objects (e.g., color, shape, and motion) and people (e.g., skin tone, eye detection, and facial feature tracking), the ability to specify a target of attention and track it, and stereo depth estimation.
Active vision behaviors include the ability to saccade to the locus of attention, smooth pursuit of a moving object, establishing and maintaining eye contact, and vergence to objects of varying depth. The movie shows Leonardo tracking a red Elmo plush doll.
The human skin is the largest sensory organ of our body and of profound importance to how we interact with the world and with others. Yet despite its significance in living systems, the sense of touch is conspicuously rare if not absent in robots.
Giving the robot a sense of touch will be useful for detecting contact with objects, sensing unexpected collisions, and knowing when it is touching its own body. Other important tactile attributes relate to affective content: pleasure from a hug, a tickling gesture, or pain from someone grabbing the robot’s arm too hard, to name a few.
The goal of this project is to develop a synthetic skin capable of detecting temperature, proximity, and pressure with acceptable resolution over the entire body, while still retaining the look and feel of its organic counterpart. Toward this end, we are experimenting with layering silicone materials (such as those used for make-up effects in the special effects industry) over force sensitive resistors (FSR), quantum tunneling composites (QTC), temperature sensors, and capacitive sensing technologies.
In addition to developing the sensate skin technology, we are developing a distributed computational infrastructure to quickly read in a large number of sensors throughout the body.
We are also developing a computational somatosensory cortex with low level feature extractors and pattern recognition algorithms to implement these tactile perception abilities.
Working closely with the artists at Stan Winston Studio, we are developing a tactile sensing system where FSRs are placed over the robot’s core and under the silicone skin and fur.
Using the homunculus distribution of sensing resolution as a guide, we are varying the density of sensors so that the robot will have greater resolution in areas that are frequently in contact with objects or people.
A distributed network of tiny processing elements is also being developed to lie underneath the skin to acquire and process the sensory signals.
This movie shows Leonardo responding to touch. Capacitive sensing technology is used near the ear, and force resistive sensing is used in the hand.
Sensate Hand Design
We have created a new articulated hand for Leonardo consisting of integrated tactile sensor circuit boards on the palm, back, and side of the hand, as well as a PIC-based 64-channel A/D converter board housed inside the hand. We are in the process of developing algorithms which will begin to treat clusters of FSRs as receptive fields for higher-level processing. These “cortical-level” fields will be capable of processing motion, direction, and orientation as well as determining the centroid of an object placed on the skin. This framework is being designed using the hands as a test case but with the final design of a full-body sense of touch in mind.
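As an illustration of one such receptive-field computation, the centroid of a contact can be estimated as the force-weighted average of sensor positions within a cluster. This sketch is not the project's algorithm: the function name, the `(x, y, force)` reading format, and the noise threshold are assumptions made for the example.

```python
def pressure_centroid(readings, threshold=0.0):
    """Force-weighted centroid of one cluster of FSR readings.

    readings: iterable of (x, y, force) tuples, where x and y give the
    sensor's position on the skin and force is its calibrated reading.
    Readings at or below `threshold` are treated as noise and ignored.
    Returns (x, y), or None when nothing in the cluster is being pressed.
    """
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for x, y, force in readings:
        if force <= threshold:
            continue
        total += force
        sum_x += force * x
        sum_y += force * y
    if total == 0.0:
        return None
    return sum_x / total, sum_y / total
```

Tracking how this centroid moves between successive sensor scans is one simple route to the motion and direction estimates mentioned above.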
The movie below shows Leonardo’s new hand design, with FSRs responding to pressure applied by different objects.
- W. D. Stiehl, L. Lalla, and C. Breazeal (2004). “Applying a Somatic Alphabet Approach to Inferring Orientation, Motion and Direction in Clusters of Force Sensing Resistors.”
- W. D. Stiehl, L. Lalla, and C. Breazeal (2004). “A Somatic Alphabet Approach to Sensitive Skin for Robots.”
- W. D. Stiehl and C. Breazeal (2006). “A Sensitive Skin for Robotic Companions Featuring Temperature, Force and Electric Field Sensors.”
- W. D. Stiehl (2005). “Sensitive Skins and Somatic Processing for Affective and Sociable Robots Based upon a Somatic Alphabet Approach.” S.M., Media Arts and Sciences, May 2005.