Uncertainty Comes for Free: Human-In-The-Loop Policies with Diffusion Models
Zhanpeng He, Yifeng Cao, Matei Ciocarlie
AI summary
Problem
Continuous human monitoring in human-in-the-loop robot deployment is labor-intensive and impractical for scaling, yet existing methods lack efficient, training-free uncertainty estimation to strategically trigger assistance at deployment time.
Approach
The method computes an uncertainty metric from the diffusion policy's denoising vector field to decide when to request teleoperation, without requiring any operator interaction during training. The intervention data collected this way can then be used to fine-tune the policy efficiently.
Key results
- Achieves 100% task success with fewer human intervention steps than baselines
- Requires no human interaction during training and adds minimal overhead at deployment
- Enables efficient policy fine-tuning using targeted teleoperation data
- Demonstrates robustness across distribution shift, partial observability, and multi-modality
Why it matters
It enables scalable, practical human-in-the-loop robot deployment by minimizing operator fatigue while maintaining high autonomy and success rates.
Abstract
Human-in-the-loop robot deployment has gained significant attention in both academia and industry as a semi-autonomous paradigm that enables human operators to intervene and adjust robot behaviors at deployment time, improving success rates. However, continuous human monitoring and intervention can be highly labor-intensive and impractical when deploying a large number of robots. To address this limitation, we propose a method that allows diffusion policies to actively seek human assistance only when necessary, reducing reliance on constant human oversight. To achieve this, we leverage the generative process of diffusion policies to compute an uncertainty-based metric that the autonomous agent uses to decide when to request operator assistance at deployment time, without requiring any operator interaction during training. Additionally, we show that the same method can be used to collect data efficiently for fine-tuning diffusion policies, improving their autonomous performance. Experimental results from simulated and real-world environments demonstrate that our approach enhances policy performance during deployment in a variety of scenarios.
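To make the idea of an uncertainty signal drawn from the generative process more concrete, here is a minimal sketch. It is an illustration under stated assumptions and not the authors' metric: the abstract does not specify how the uncertainty is computed, so the sketch substitutes a simple stand-in, the directional dispersion of denoising vectors across sampled noisy actions. The names `vector_field_uncertainty`, `should_request_help`, the `denoise_fn(obs, actions, t)` interface, the threshold value, and the two toy fields are all hypothetical.

```python
import numpy as np


def vector_field_uncertainty(denoise_fn, obs, action_dim, n_samples=256, t=0.5, rng=None):
    """Directional dispersion of denoising vectors (a stand-in metric, NOT the paper's).

    denoise_fn is a hypothetical callable (obs, actions, t) -> array of shape
    (n_samples, action_dim), giving the direction each noisy action is pushed.
    Returns a value in [0, 1]: near 0 when all vectors agree, near 1 when they scatter.
    """
    rng = np.random.default_rng() if rng is None else rng
    actions = rng.standard_normal((n_samples, action_dim))  # noisy action samples
    vectors = denoise_fn(obs, actions, t)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.maximum(norms, 1e-8)  # keep direction only
    return 1.0 - float(np.linalg.norm(unit.mean(axis=0)))


def should_request_help(uncertainty, threshold=0.5):
    """Assumed decision rule: ask the operator when uncertainty exceeds a threshold."""
    return uncertainty > threshold


# Toy vector fields standing in for a trained diffusion policy (assumptions).
def single_mode_field(obs, actions, t):
    # Every sample is pulled toward one target action.
    return np.array([10.0, 10.0]) - actions


def two_mode_field(obs, actions, t):
    # Each sample is pulled toward the nearer of two opposite target actions.
    targets = np.array([[-10.0, 0.0], [10.0, 0.0]])
    dists = np.linalg.norm(actions[:, None, :] - targets[None, :, :], axis=2)
    return targets[dists.argmin(axis=1)] - actions


rng = np.random.default_rng(0)
u_single = vector_field_uncertainty(single_mode_field, None, 2, rng=rng)
u_double = vector_field_uncertainty(two_mode_field, None, 2, rng=rng)
print("single mode:", u_single, "request help:", should_request_help(u_single))
print("two modes:  ", u_double, "request help:", should_request_help(u_double))
```

In this toy setting the single-target field yields low dispersion, so the agent continues on its own. The two-target field yields high dispersion and triggers a request for operator assistance. A real deployment would replace the toy fields with the trained policy's denoiser and tune the threshold on held-out rollouts.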