IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning
Ryan Hoque, Ajay Uday Mandlekar, Caelan Garrett, Ken Goldberg, Dieter Fox
Abstract
Imitation learning is a promising paradigm for training robot control policies, but these policies can suffer from distribution shift, where the conditions at evaluation time differ from those in the training data. A popular approach for increasing policy robustness to distribution shift is interactive imitation learning (i.e., DAgger and variants), where a human operator provides corrective interventions during policy rollouts. However, collecting a sufficient number of interventions to cover the distribution of policy mistakes can be burdensome for human operators. We propose IntervenGen (I-Gen), a novel data augmentation system for robot control that autonomously produces a large set of corrective interventions with rich coverage of the state space from a small number of human interventions. We apply I-Gen to 4 simulated environments and 1 physical environment with object pose estimation error and show that it can increase policy robustness by up to 39× with only 10 human interventions. Videos and more results are available at https://sites.google.com/view/intervengen2024.
I. INTRODUCTION
Imitation Learning (IL) from human demonstrations is a promising paradigm for training robot policies. One approach is to collect a set of offline task demonstrations via human teleoperation [1,2] and employ behavior cloning (BC) [3] to train robot policies via supervised learning, where the labels are robot actions. There have been recent efforts to scale this approach by collecting thousands of demonstrations using hundreds of human operator hours and training high-capacity neural networks on the large-scale data [4–8].
However, IL policies can suffer from distribution shift, where the conditions at evaluation time differ from those in the training data [9]. As an example, consider a policy that makes decisions based on object pose observations.
A common source of distribution shift in the real world is object pose estimation error, which can occur due to a wide range of factors such as sensor noise, occlusion, network delay, and model misspecification. This can cause inaccuracy in the robot’s belief of where critical objects are located in the environment, leading the robot to visit states outside the training distribution that result in poor policy performance.
One approach to addressing distribution shift is to collect a large set of demonstrations under diverse conditions and hope that agents trained on this data can generalize. However, human teleoperation data is notoriously difficult to collect due to the human time, effort, and financial cost required [4–8].
¹UC Berkeley, ²NVIDIA. *Equal contribution.
Fig. 1: Overview (panel labels: Human Intervention, IntervenGen, Synthetic Interventions). IntervenGen automatically generates corrective interventional data from a small number of human interventions, with coverage across both diverse scene configurations and policy mistake distributions. Here, the robot mistakenly believes the peg is at the position in red and requires demonstration of recovery behavior toward the true peg position.
An alternative approach is interactive IL, including DAgger [9] and its variants [10–12], in which humans can intervene during robot execution and demonstrate recovery behaviors to help the robot return to the support of the training distribution. Subsequent training on these corrections can increase policy robustness and performance both theoretically and in practice [9]. However, interactive IL imposes even more burden on the human supervisors than behavior cloning, as the human must continuously monitor robot task execution and intervene when they see fit, typically over multiple rounds of interleaved data collection and policy training. Moreover, a significant amount of recovery data may be required to adequately cover the distribution of mistakes the policy may make.
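To make the interaction between pose estimation error and corrective interventions concrete, the following is a minimal toy sketch, not the system proposed in this paper. It assumes a 2D point robot, Gaussian noise on the believed object position, a scripted stand-in for the human operator who takes over once the robot arrives at the wrong location, and hypothetical names (`step_toward`, `collect_interventions`) chosen only for illustration.

```python
import numpy as np


def step_toward(position, target, max_step=0.05):
    """Return a 2D displacement that moves `position` toward `target`."""
    delta = target - position
    dist = np.linalg.norm(delta)
    if dist <= max_step:
        return delta
    return delta / dist * max_step


def collect_interventions(num_episodes, pose_noise_std=0.1, tol=0.02,
                          horizon=200, seed=0):
    """Roll out a toy policy under pose error and record recovery data.

    All modeling choices here (2D world, Gaussian noise, scripted "human")
    are illustrative assumptions, not details taken from the paper.
    """
    rng = np.random.default_rng(seed)
    dataset = []  # list of (observation, corrective action) pairs
    for _ in range(num_episodes):
        true_pos = rng.uniform(-0.5, 0.5, size=2)
        # The robot only observes a noisy estimate of the object position.
        believed_pos = true_pos + rng.normal(0.0, pose_noise_std, size=2)
        robot = np.zeros(2)
        human_in_control = False
        for _ in range(horizon):
            if np.linalg.norm(robot - true_pos) <= tol:
                break  # the robot reached the true object position
            # Mistake: the robot arrived at the believed position, but the
            # object is not there, so the (scripted) human takes over.
            if not human_in_control and np.linalg.norm(robot - believed_pos) <= tol:
                human_in_control = True
            if human_in_control:
                action = step_toward(robot, true_pos)
                # The observation keeps the erroneous belief, since that is
                # what the policy would see at evaluation time.
                obs = np.concatenate([robot, believed_pos])
                dataset.append((obs, action))
            else:
                action = step_toward(robot, believed_pos)
            robot = robot + action
    return dataset


if __name__ == "__main__":
    data = collect_interventions(num_episodes=20)
    print(f"recorded {len(data)} corrective (observation, action) pairs")
```

In this sketch every recovery sample requires the operator to be present for that episode, which illustrates why covering a wide distribution of pose errors with human interventions alone can be costly.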
We raise the following question: do we actually need to have a human operator intervene every single time a policy makes a mistake? MimicGen [13], a recently proposed data generation system, raises an intriguing possibility: a large dataset of synthetically generated demonstrations derived from a small set of human demonstrations (typically 100× smaller or more) can produce performant robot policies. The system’s key insight is that similar object-centric manipulation behaviors can be applied in new contexts by appropriately transforming demonstrated behavior to the new object frame.
Inspired by this insight, we propose a data generation system for interventional data (see Fig. 1). With a small set of corrective interventions from a human operator, the system autonomously generates data with significantly higher coverage of the distribution of potential policy mistakes. Such a system has a broad range of applications such as improving policy success rates on a task of interest, making policies robust to errors in perception, and more broadly, acting as a domain randomization [14] procedure to aid in sim-to-real transfer of IL policies without requiring additional data collection from a human supervisor.
2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 14-18, 2024, Abu Dhabi, UAE. 979-8-3503-7769-9/24/$31.00 ©2024 IEEE
In this work, we focus