Adaptor: Advancing Assistive Teleoperation with Few-Shot Learning and Cross-Operator Generalization
Yu Liu, Yihang Yin, Tianlv Huang, Fei Yan, Yuan Xu, Weinan Hong, Wei Han, Yue Cao, Xiangyu Chen, Zipei Fan, Xuan Song
AI summary
Problem
Inter-operator variability in teleoperation causes intent recognition instability and poor generalization, forcing systems to rely on fixed intent sets or costly per-operator retraining. Existing methods struggle to adapt to diverse user habits and skill levels without extensive data collection.
Approach
Adaptor models intent uncertainty by injecting stochastic noise into demonstration trajectories and extracting geometry-aware keyframes, then conditions a pre-trained Vision-Language-Action policy on the processed trajectories so that it adapts to new operators with minimal data.
Key results
- Achieves state-of-the-art success rates and efficiency across six simulation and real-world tasks
- Demonstrates robust cross-operator generalization with low variance across users of varying expertise
- Effectively mitigates covariate shift through trajectory perturbation and keyframe extraction
- Surpasses pure and assisted teleoperation baselines in completion time and user satisfaction
Why it matters
Enables scalable, personalized assistive teleoperation for diverse users without costly per-operator retraining, advancing practical human-robot collaboration.
Abstract
Assistive teleoperation enhances efficiency via shared control, yet inter-operator variability, stemming from diverse habits and expertise, induces highly heterogeneous trajectory distributions that undermine the stability of intent recognition. We present Adaptor, a few-shot framework for robust cross-operator intent recognition. Adaptor bridges the domain gap in two stages: (i) preprocessing, which models intent uncertainty by synthesizing trajectory perturbations via noise injection and performs geometry-aware keyframe extraction; and (ii) policy learning, which encodes the processed trajectories with an Intention Expert and fuses them with context from a pre-trained vision–language model to condition an Action Expert for action generation. Experiments on real-world and simulated benchmarks demonstrate that Adaptor achieves state-of-the-art performance, improving success rates and efficiency over baselines. Moreover, the method exhibits low variance across operators with varying expertise, demonstrating robust cross-operator generalization. The project homepage is available at: https://rainyrobo.github.io/Adaptor/.
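
To make the preprocessing stage more concrete, the following is a minimal sketch of the two steps it names: noise injection and geometry-aware keyframe extraction. The abstract does not specify the noise model or the keyframe criterion, so everything here is an assumption made for illustration: isotropic Gaussian noise stands in for the perturbation, and the Ramer-Douglas-Peucker simplification algorithm stands in for keyframe extraction. The function names `perturb_trajectory` and `extract_keyframes` and the parameters `sigma`, `n_samples`, and `epsilon` are hypothetical and are not taken from the paper.

```python
import numpy as np


def perturb_trajectory(traj, sigma=0.01, n_samples=4, seed=0):
    """Return n_samples noisy copies of a (T, D) trajectory.

    Assumption: isotropic Gaussian noise; the paper's noise model may differ.
    """
    rng = np.random.default_rng(seed)
    traj = np.asarray(traj, dtype=float)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + traj.shape)
    return traj[None] + noise  # shape: (n_samples, T, D)


def _point_segment_distance(points, start, end):
    """Euclidean distance from each row of points to the segment start-end."""
    seg = end - start
    seg_len_sq = float(seg @ seg)
    if seg_len_sq == 0.0:  # degenerate segment: both endpoints coincide
        return np.linalg.norm(points - start, axis=1)
    # Projection parameter along the segment, clamped to the endpoints.
    t = np.clip((points - start) @ seg / seg_len_sq, 0.0, 1.0)
    proj = start + t[:, None] * seg
    return np.linalg.norm(points - proj, axis=1)


def extract_keyframes(traj, epsilon=0.05):
    """Indices of waypoints kept by Ramer-Douglas-Peucker simplification.

    Assumption: RDP is a stand-in for the paper's geometry-aware criterion.
    """
    traj = np.asarray(traj, dtype=float)
    keep = {0, len(traj) - 1}
    stack = [(0, len(traj) - 1)]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2:  # no interior points left to test
            continue
        dist = _point_segment_distance(traj[lo + 1:hi], traj[lo], traj[hi])
        k = int(np.argmax(dist))
        if dist[k] > epsilon:
            mid = lo + 1 + k  # farthest interior point becomes a keyframe
            keep.add(mid)
            stack.extend([(lo, mid), (mid, hi)])
    return sorted(keep)


# An L-shaped 2D path: only the endpoints and the corner are kept.
path = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])
keyframes = extract_keyframes(path)      # [0, 2, 4]
noisy_copies = perturb_trajectory(path)  # array of shape (4, 5, 2)
```

In this sketch, `epsilon` controls how much geometric deviation is tolerated before a waypoint is kept, and `sigma` controls how widely the synthetic trajectories spread around the demonstration. Both would need tuning against the scale of the actual workspace.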