ICRA 2026

Enhancing Reusability of Learned Skills for Robot Manipulation Via Gaze Information and Motion Bottlenecks

Ryo Takizawa, Izumi Karino, Koki Nakagawa, Yoshiyuki Ohmura, Yasuo Kuniyoshi

AI summary

GazeBot enables robots to reuse learned manipulation skills across unseen object positions and end-effector poses without sacrificing dexterity or reactivity.

Imitation Learning, Skill Reusability, Gaze-Based Manipulation, Motion Bottlenecks, Robot Control, Point Cloud Processing

Problem

Conventional deep imitation learning methods struggle to generalize learned manipulation skills to new object positions and initial end-effector poses, often requiring exhaustive demonstration collection or sacrificing control precision to improve generalization.

Approach

GazeBot leverages a gaze-centered 3D point cloud to create a position-robust visual representation and segments actions at a learned bottleneck pose, combining a goal-fixed reaching phase with a fully parametric, gaze-centered action policy.
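
As a rough illustration of why a gaze-centered point cloud is robust to object position, the sketch below re-expresses each point relative to the gaze point and keeps only the points near it. This is not GazeBot's implementation: the function name `gaze_center`, the spherical crop, and the `radius` value are assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def gaze_center(points: List[Point], gaze: Point, radius: float) -> List[Point]:
    """Express points relative to the gaze point; keep those within `radius`.

    Hypothetical helper for illustration only (not from the paper).
    """
    gx, gy, gz = gaze
    centered = []
    for x, y, z in points:
        dx, dy, dz = x - gx, y - gy, z - gz
        # Spherical crop around the gaze point (an assumed design choice).
        if dx * dx + dy * dy + dz * dz <= radius * radius:
            centered.append((dx, dy, dz))
    return centered


def shift(points: List[Point], offset: Point) -> List[Point]:
    """Translate every point by `offset` (simulates moving the object)."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for x, y, z in points]


# A toy object, placed at two different table positions.
obj = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (0.0, 0.05, 0.02)]
pos_a = (0.40, 0.10, 0.0)
pos_b = (0.65, -0.20, 0.0)

# Assume the gaze lands on the object's origin in both scenes.
cloud_a = gaze_center(shift(obj, pos_a), pos_a, radius=0.2)
cloud_b = gaze_center(shift(obj, pos_b), pos_b, radius=0.2)

# Up to floating-point error, both clouds are identical.
same = all(
    abs(p - q) < 1e-9
    for a, b in zip(cloud_a, cloud_b)
    for p, q in zip(a, b)
)
print(same)
```

Because the policy input depends only on geometry relative to the gaze point, moving the object (and the gaze with it) leaves the input essentially unchanged, which is the intuition behind the position-robust representation described above.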

Key results

Higher success rates than state-of-the-art imitation learning methods on out-of-distribution object positions and end-effector poses
Data-driven action segmentation using gaze-centered point cloud predictivity to identify motion bottlenecks
Gaze-centered point cloud representation that maintains 3D structural consistency across varying object locations
Fully parametric Transformer-based policy that preserves real-time dexterity and reactivity during manipulation
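
The summary does not spell out the criterion used for data-driven action segmentation. As one hypothetical reading, suppose a model predicts the end-effector pose from the gaze-centered point cloud and we record its per-timestep error over a demonstration. The bottleneck could then be taken as the earliest timestep from which that error stays below a threshold. The function `find_bottleneck`, the error values, and the threshold below are all assumptions for illustration, not the paper's procedure.

```python
from typing import Optional, Sequence


def find_bottleneck(errors: Sequence[float], threshold: float) -> Optional[int]:
    """Return the earliest index from which every error is <= threshold.

    Returns None if the final timestep is itself above the threshold
    (or if `errors` is empty). Hypothetical criterion, for illustration.
    """
    start = None
    # Walk backwards from the end while the error stays small.
    for t in range(len(errors) - 1, -1, -1):
        if errors[t] <= threshold:
            start = t
        else:
            break
    return start


# Made-up per-timestep prediction errors for one demonstration.
errors = [0.30, 0.22, 0.15, 0.04, 0.03, 0.05, 0.02]
split = find_bottleneck(errors, threshold=0.05)
print(split)  # 3

# Timesteps before the split would form the reaching phase,
# timesteps from the split onward the gaze-centered manipulation phase.
reaching, manipulation = errors[:split], errors[split:]
```

Under this reading, the pose at index `split` plays the role of the bottleneck pose: the reaching phase targets it, and the learned gaze-centered policy takes over from there.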

Why it matters

Enables robots to rapidly adapt and reuse learned skills in dynamic environments without costly retraining or exhaustive demonstration collection.

Abstract

Autonomous agents capable of diverse object manipulations should be able to acquire a wide range of manipulation skills with high reusability. Although advances in deep learning have made it increasingly feasible to replicate the dexterity of human teleoperation in robots, generalizing these acquired skills to previously unseen scenarios remains a significant challenge. In this study, we propose a novel algorithm, Gaze-based Bottleneck-aware Robot Manipulation (GazeBot), which enables high reusability of learned motions without sacrificing dexterity or reactivity. By leveraging gaze information and motion bottlenecks, both crucial features for object manipulation, GazeBot achieves high success rates compared with state-of-the-art imitation learning methods, particularly when the object positions and end-effector poses differ from those in the provided demonstrations. Furthermore, the training process of GazeBot is entirely data-driven once a demonstration dataset with gaze data is provided.

Index terms

Imitation Learning, Perception for Grasping and Manipulation, Dual Arm Manipulation