Enhancing Reusability of Learned Skills for Robot Manipulation Via Gaze Information and Motion Bottlenecks
Ryo Takizawa, Izumi Karino, Koki Nakagawa, Yoshiyuki Ohmura, Yasuo Kuniyoshi
AI summary
Problem
Conventional deep imitation learning methods struggle to generalize learned manipulation skills to new object positions and initial end-effector poses, often requiring exhaustive demonstration collection or sacrificing control precision to improve generalization.
Approach
GazeBot leverages a gaze-centered 3D point cloud to create a position-robust visual representation and segments actions at a learned bottleneck pose, combining a goal-fixed reaching phase with a fully parametric, gaze-centered action policy.
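As a rough illustration of why a gaze-centered representation can be robust to object position, the sketch below translates a point cloud so that the 3D gaze point becomes the origin and keeps only nearby points. It is not the authors' implementation: the function name `gaze_center`, the `radius` crop, and the toy coordinates are all assumptions made for this example.

```python
import numpy as np

def gaze_center(points, gaze_xyz, radius=0.3):
    """Express points relative to the gaze point and crop to a sphere.

    Hypothetical helper for illustration; `radius` (in metres) is an
    assumed parameter, not a value taken from the paper.
    """
    points = np.asarray(points, dtype=float)
    gaze_xyz = np.asarray(gaze_xyz, dtype=float)
    centered = points - gaze_xyz                        # gaze point becomes the origin
    keep = np.linalg.norm(centered, axis=1) <= radius   # drop points far from the gaze
    return centered[keep]

# The same toy object placed at two different positions in the workspace.
obj = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [0.0, 0.05, 0.02]])
offset_a = np.array([0.4, 0.1, 0.0])
offset_b = np.array([0.7, -0.2, 0.1])

# Assume the gaze lands on the object's reference point in both scenes.
view_a = gaze_center(obj + offset_a, offset_a)
view_b = gaze_center(obj + offset_b, offset_b)

# The two gaze-centered views coincide even though the object moved.
same_view = np.allclose(view_a, view_b)
```

Under these assumptions, a policy that consumes `view_a` or `view_b` sees the same input regardless of where the object sits, which is the intuition behind the position-robust representation described above.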
Key results
- Higher success rates than state-of-the-art imitation learning methods on out-of-distribution object positions and end-effector poses
- Data-driven action segmentation using gaze-centered point cloud predictivity to identify motion bottlenecks
- Gaze-centered point cloud representation that maintains 3D structural consistency across varying object locations
- Fully parametric Transformer-based policy that preserves real-time dexterity and reactivity during manipulation
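To make the idea of data-driven segmentation more concrete, here is a minimal sketch of one way a demonstration could be split at a bottleneck. The summary does not state the actual criterion, so the rule used here is purely an assumption: the bottleneck is taken to be the first timestep from which a per-step prediction error stays below a threshold. The name `find_bottleneck_index`, the error values, and the threshold are hypothetical.

```python
import numpy as np

def find_bottleneck_index(errors, threshold):
    """Return the first index t such that errors[t:] are all <= threshold.

    Assumed rule for illustration only. Returns len(errors) when no such
    suffix exists, i.e. the whole trajectory is treated as reaching.
    """
    errors = np.asarray(errors, dtype=float)
    below = errors <= threshold
    # suffix_all[t] is True when every entry of below[t:] is True.
    suffix_all = np.flip(np.logical_and.accumulate(np.flip(below)))
    idx = np.flatnonzero(suffix_all)
    return int(idx[0]) if idx.size else len(errors)

# Made-up per-timestep errors of predicting the action from the
# gaze-centered point cloud (not data from the paper).
errors = [0.9, 0.8, 0.2, 0.7, 0.1, 0.05, 0.08]
t_star = find_bottleneck_index(errors, threshold=0.3)

trajectory = list(range(len(errors)))    # stand-in for recorded poses
reaching_phase = trajectory[:t_star]     # before the bottleneck
manipulation_phase = trajectory[t_star:] # from the bottleneck onward
```

With these toy values the split falls at index 4: the isolated low error at index 2 is ignored because the error rises again afterwards.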
Why it matters
Enables robots to rapidly adapt and reuse learned skills in dynamic environments without costly retraining or exhaustive demonstration collection.
Abstract
Autonomous agents capable of diverse object manipulations should be able to acquire a wide range of manipulation skills with high reusability. Although advances in deep learning have made it increasingly feasible to replicate the dexterity of human teleoperation in robots, generalizing these acquired skills to previously unseen scenarios remains a significant challenge. In this study, we propose a novel algorithm, Gaze-based Bottleneck-aware Robot Manipulation (GazeBot), which enables high reusability of learned motions without sacrificing dexterity or reactivity. By leveraging gaze information and motion bottlenecks, both crucial features for object manipulation, GazeBot achieves higher success rates than state-of-the-art imitation learning methods, particularly when the object positions and end-effector poses differ from those in the provided demonstrations. Furthermore, the training process of GazeBot is entirely data-driven once a demonstration dataset with gaze data is provided.