Natural Tasking of Robots Based on Human Interaction Cues Brian Scassellati, Bryan Adams, Aaron Edsinger, Matthew Marjanovic MIT Artificial Intelligence Laboratory Current Research: Joint Reference.


Natural Tasking of Robots
Based on Human Interaction Cues
Brian Scassellati, Bryan Adams, Aaron Edsinger, Matthew Marjanovic
MIT Artificial Intelligence Laboratory
Current Research:
Joint Reference and Simple Mimicry
Goals
Our team at the MIT Artificial Intelligence lab is
building robotic systems that use natural social
conventions as an interface. We believe that
these systems will enable anyone to teach the
robot to perform simple tasks. The robot will be
usable without special training or programming
skills, and will be able to act in unique and
dynamic situations.
We originally outlined a sequence of behavioral
tasks, listed on the chart below, that will allow
our robots to learn new tasks from a human
instructor. In the chart, behaviors in bold text
have been completed; behaviors in italic text
have been partially implemented.
[Chart: sequence of behavioral tasks, organized into four tracks: Development of Social Interaction; Development of Sequencing; Development of Coordinated Body Actions; Development of Commonsense Knowledge.]
Behaviors shown in the chart: Speech Prosody; Face Finding; Vocal Cue Production; Eye Contact; Directing Instructor’s Attention; Gaze Following; Intentionality Detector; Gaze Direction; Facial Expression Recognition; Motion Detector; Familiar Face Recognition; Object Saliency; Object Segmentation; Attention System; Smooth Pursuit and Vergence; Kinesthetic Body Representation; Line-of-Sight Reaching; Task-Based Guided Perception; Schema Creation; Turn Taking; Object Permanence; Body Part Segmentation; Depth Perception; VOR/OKR; Recognizing Instructor’s Knowledge States; Arm and Face Gesture Recognition; Recognizing Pointing; Robot Teaching; Simple Grasping; Self-Motion Models; Reaching Around Obstacles; Expectation-Based Representations; Human Motion Models; Long-Term Knowledge Consolidation; Action Sequencing; Social Script Sequencing; Multi-Axis Orientation; Recognizing Beliefs, Desires, and Intentions; Instructional Sequencing; Mapping Robot Body to Human Body.
Our current research focuses on building the
perceptual and motor primitives that will
allow the robot to detect and respond to
natural social cues. In the past year, we have
developed systems that respond to human
attention states and that mimic the movement
of any animate object by tracing a similar
trajectory with the robot’s arm.
The system operates in a sequence of stages:
• Visual input is filtered pre-attentively.
• An attention mechanism selects salient targets in each image frame.
• Targets are linked together into trajectories by a motion correspondence procedure.
• The “theory of body” module (ToBY) looks for objects that are self-propelled (animate).
• Faces are located in animate stimuli.
• Features such as the eyes and mouth are extracted to provide head orientation.
• Animate visual trajectories are mapped to arm movements.
[Figure: system architecture. Labels: Visual Input; Pre-attentive filters (f); Skin; Saturation; Motion; Habituation; weights (w); Attention Activation; Visual Attention; Trajectory Formation; ToBY; Animate Objects; Face/Eye Finder; Gaze Direction; Arm Primitives; Reaching / Pointing.]
[Additional task chart labels: Tool Use; Object Manipulation; Active Object Exploration.]
Future Research

More Complex Mimicry
One future direction for our work is to
look at more complex forms of social
learning. We will explore a wider range
of tasks and ways to sequence learned
actions into more complex behaviors.
We will also work on building systems
that imitate: that is, systems that
follow the intent of an action rather
than its form.
Understanding Self
We are also exploring ideas about
how to build representations of the
robot’s own body and the actions
that it is capable of performing.
The robot should recognize its
own arm as it moves through the
world, and even be able to
recognize its own movements in a
mirror by temporal correlation.
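The poster does not say which signals would be correlated. As a minimal sketch, assuming the robot compares the speed it commands for its arm with the speed of a moving region it sees, a Pearson correlation over a short time window could serve as the test. The function names, the choice of speed as the signal, and the 0.8 threshold are all hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def looks_like_self(motor_speed, observed_speed, threshold=0.8):
    """Hypothetical test: observed motion that tracks the robot's own
    commanded motion closely over time is attributed to the robot itself."""
    return pearson(motor_speed, observed_speed) >= threshold


# Commanded arm speed over seven time steps (illustrative values).
motor = [0, 1, 2, 1, 0, 1, 2]
# A region in the mirror that speeds up and slows down with the arm.
mirror_region = [0.1, 1.1, 1.9, 1.0, 0.1, 0.9, 2.1]
# A region that moves independently of the arm.
other_region = [2, 0, 1, 2, 0, 1, 0]

self_in_mirror = looks_like_self(motor, mirror_region)
self_in_other = looks_like_self(motor, other_region)
```

A real system would also need to handle sensing delay, for example by testing several time lags, which this sketch leaves out.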
New Head and Hands
New Hands
Visual input is processed by a set of
parallel pre-attentive filters including
skin tone, color saturation, motion,
and disparity filters. The attention
system combines the filtered images
using weights that are influenced by
high-level task constraints. The
attention system also incorporates a
habituation mechanism and biases
the robot’s attention based on the
attention of the instructor.
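The combination step described above can be sketched as a weighted sum of feature maps with a habituation term subtracted. This is an illustration under stated assumptions, not the robot's implementation: the map values, the weights, the subtractive form of habituation, and the `gain` and `decay` parameters are all hypothetical.

```python
def combine_feature_maps(feature_maps, weights, habituation):
    """Weighted sum of the feature maps, minus a per-pixel habituation penalty."""
    rows = len(habituation)
    cols = len(habituation[0])
    activation = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = sum(weights[name] * fmap[r][c]
                        for name, fmap in feature_maps.items())
            activation[r][c] = total - habituation[r][c]
    return activation


def most_salient(activation):
    """Return (row, col) of the highest activation."""
    _, r, c = max((value, r, c)
                  for r, row in enumerate(activation)
                  for c, value in enumerate(row))
    return r, c


def update_habituation(habituation, target, gain=0.5, decay=0.9):
    """Decay habituation everywhere, then raise it at the attended location."""
    updated = [[value * decay for value in row] for row in habituation]
    r, c = target
    updated[r][c] += gain
    return updated


# Tiny 2x2 "images" standing in for the filter outputs (illustrative values).
maps = {
    "skin":       [[0.0, 0.2], [0.9, 0.1]],
    "saturation": [[0.1, 0.1], [0.1, 0.1]],
    "motion":     [[0.0, 0.8], [0.3, 0.0]],
    "disparity":  [[0.0, 0.0], [0.0, 0.0]],
}
# Weights would be set by the current task; these values are made up.
weights = {"skin": 1.0, "saturation": 0.5, "motion": 1.0, "disparity": 0.5}
habituation = [[0.0, 0.0], [0.0, 0.0]]

first_target = most_salient(combine_feature_maps(maps, weights, habituation))
habituation = update_habituation(habituation, first_target)
second_target = most_salient(combine_feature_maps(maps, weights, habituation))
```

In this example the skin-colored pixel wins first. After habituation builds up there, attention moves to the moving pixel. A bias toward the instructor's focus of attention could be added as one more weighted map.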
The attention system produces a set of target
points for each frame in the image sequence.
These points are connected across time by the
multi-hypothesis tracking algorithm developed
by Cox and Hingorani. The system maintains
multiple hypotheses for each possible trajectory,
which allows ambiguous data to be resolved
by later information.
[Figure: multi-hypothesis tracking loop. Labels: Feature Extraction; Matching; Generate k-best Hypotheses; Management (pruning, merging); Delay; Generate Predictions.]
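To illustrate what keeping several hypotheses means, the toy sketch below ranks every one-to-one assignment of new target points to predicted track positions and keeps the k cheapest. It is a brute-force simplification, not the Cox and Hingorani algorithm: it assumes equal numbers of tracks and detections, and it handles neither missed detections nor new tracks. The function name and the Euclidean distance cost are assumptions.

```python
from itertools import permutations
from math import dist


def k_best_hypotheses(predictions, detections, k=3):
    """Rank assignments of detections to predicted track positions.

    Each hypothesis is (total_cost, assignment), where assignment[i] is the
    index of the detection matched to track i. Brute-force enumeration is
    only practical for a handful of targets.
    """
    if len(predictions) != len(detections):
        raise ValueError("this sketch assumes one detection per track")
    hypotheses = []
    for assignment in permutations(range(len(detections))):
        cost = sum(dist(predictions[track], detections[det])
                   for track, det in enumerate(assignment))
        hypotheses.append((cost, assignment))
    hypotheses.sort()
    return hypotheses[:k]


# Two tracks with predicted positions, and two new target points.
predictions = [(0, 0), (10, 0)]
detections = [(9, 1), (1, 0)]

ranked = k_best_hypotheses(predictions, detections, k=2)
best_cost, best_assignment = ranked[0]
```

Keeping the second-ranked hypothesis, rather than committing to the best one, is what lets a later frame overturn an early mistake.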
The “theory of body” module (ToBY) is a set of agents, each of which incorporates a rule of naïve physics. These rules estimate how objects move under natural conditions. In the images shown above, trajectories that obey these rules are judged to be inanimate (shown in red), while those that display self-propelled movement (like the moving hand or the “animate” chair being pushed with a rod) are judged animate (green).
[Figure panels: Moving hand; Rolling chair; “Animate” chair.]
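The poster does not list the individual rules, so the sketch below uses two stand-in agents: one assumes an unpowered object does not speed up, the other that it keeps its heading. Both rules, their tolerances, and the policy of calling a trajectory animate when any rule is violated are assumptions made for illustration.

```python
from math import hypot


def displacements(trajectory):
    """Frame-to-frame displacement vectors of a list of (x, y) points."""
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:])]


def gains_speed(trajectory, tolerance=0.1):
    """Stand-in rule: an unpowered object should not speed up."""
    speeds = [hypot(dx, dy) for dx, dy in displacements(trajectory)]
    return any(later > earlier + tolerance
               for earlier, later in zip(speeds, speeds[1:]))


def changes_heading(trajectory, tolerance=0.2):
    """Stand-in rule: an unpowered object should keep moving in a straight line."""
    steps = displacements(trajectory)
    for (ax, ay), (bx, by) in zip(steps, steps[1:]):
        scale = hypot(ax, ay) * hypot(bx, by)
        # The normalized cross product is the sine of the turn angle.
        if scale > 0 and abs(ax * by - ay * bx) / scale > tolerance:
            return True
    return False


def judge_animacy(trajectory, agents=(gains_speed, changes_heading)):
    """Call a trajectory animate if any agent reports a violated rule."""
    if any(agent(trajectory) for agent in agents):
        return "animate"
    return "inanimate"


# A chair rolling to a stop: straight path, steadily losing speed.
rolling_chair = [(0, 0), (4, 0), (7, 0), (9, 0), (10, 0)]
# A hand: speeds up and turns on its own.
moving_hand = [(0, 0), (1, 0), (3, 1), (4, 4), (2, 6)]

chair_label = judge_animacy(rolling_chair)
hand_label = judge_animacy(moving_hand)
```

Keeping each rule as its own small agent mirrors the description above, and more rules can be added without changing the final judgment step.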
The attention of the
instructor is monitored
by a system that finds
faces (using a color filter
and shape metrics),
orients to the instructor,
and extracts salient
features at a distance of
20 feet.
[Figure: face and feature finding pipeline. Labels: Locate Target; Foveate Target; Apply Face Filter; Software Zoom; Feature Extraction. Timing labels: 300 msec; 66 msec.]
Trajectories are selected based on the inherent
object saliency, the instructor’s attentional
state, and the animacy judgment. These
trajectories are mapped from visual
coordinates to a set of primitive arm postures.
The trajectory can then be used to allow the
robot to perform object-centered actions (such
as pointing) or process-centered actions (such
as repeating the trajectory with its own arm).
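One simple way to picture the mapping from visual coordinates to primitive arm postures is bilinear interpolation between four postures, one assigned to each corner of the image. This is a sketch of that idea only. The corner layout, the two-joint postures, and the image size are hypothetical, and the poster does not say that the robot blends postures this way.

```python
def blend_postures(x, y, corners):
    """Bilinearly blend four corner postures.

    x and y are normalized image coordinates in [0, 1]; each posture is a
    list of joint angles of the same length.
    """
    blended = []
    for tl, tr, bl, br in zip(corners["top_left"], corners["top_right"],
                              corners["bottom_left"], corners["bottom_right"]):
        angle = ((1 - x) * (1 - y) * tl + x * (1 - y) * tr
                 + (1 - x) * y * bl + x * y * br)
        blended.append(angle)
    return blended


def trajectory_to_postures(trajectory, width, height, corners):
    """Map each (pixel_x, pixel_y) point of a visual trajectory to a posture."""
    return [blend_postures(px / width, py / height, corners)
            for px, py in trajectory]


# Hypothetical two-joint postures (degrees) for the four image corners.
corners = {
    "top_left": [0, 0],
    "top_right": [40, 0],
    "bottom_left": [0, 20],
    "bottom_right": [40, 20],
}

# A visual trajectory from the top-left corner to the image center.
visual_trajectory = [(0, 0), (320, 240)]
arm_postures = trajectory_to_postures(visual_trajectory, 640, 480, corners)
```

Playing the resulting postures back in order would repeat the trajectory with the arm (a process-centered action), while using only the final posture would point at the target (an object-centered action).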