ICRA 2026

Beyond the Majority: Long-Tail Imitation Learning for Robotic Manipulation

Junhong Zhu, Ji Zhang, Jingkuan Song, Lianli Gao, Heng Tao Shen

AI summary

Data scarcity in long-tail robotic datasets impairs spatial reasoning on tail tasks, but grafting tail-task objects onto the approaching phases of head-task trajectories improves tail-task performance without external demonstrations.

Long-tail learning, Imitation learning, Robotic manipulation, Spatial reasoning, Data augmentation, Vision-language-action models

Problem

Generalist robot policies trained on imbalanced, long-tail demonstration datasets suffer severe performance drops on data-scarce tail tasks. Conventional re-sampling and augmentation techniques fail to close this gap because the data they produce lacks variational diversity or physical plausibility.
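For context, conventional task-balanced re-sampling weights each episode by the inverse of its task's demonstration count, so tail tasks are drawn as often as head tasks. The minimal sketch below (plain Python; the function names and the hypothetical 50-versus-2 split are illustrative assumptions, not taken from the paper) shows why this adds no variational diversity: the tail task is now seen often, but only ever as the same two demonstrations.

```python
import random
from collections import Counter


def task_balanced_weights(task_labels):
    """Per-episode sampling weights so that every task is drawn equally often.

    Each episode of task t gets weight 1 / count(t), so the summed weight of
    every task is 1 regardless of how many demonstrations it has.
    """
    counts = Counter(task_labels)
    return [1.0 / counts[t] for t in task_labels]


def resample(episodes, task_labels, k, seed=0):
    """Draw k episodes with replacement using task-balanced weights."""
    rng = random.Random(seed)
    weights = task_balanced_weights(task_labels)
    return rng.choices(episodes, weights=weights, k=k)


# Hypothetical long-tail split: 50 head demonstrations, 2 tail demonstrations.
episodes = [f"head_{i}" for i in range(50)] + ["tail_0", "tail_1"]
labels = ["head"] * 50 + ["tail"] * 2
batch = resample(episodes, labels, k=1000)

tail_draws = [e for e in batch if e.startswith("tail")]
print(len(tail_draws), len(set(tail_draws)))  # ~half the batch, at most 2 distinct
```

Roughly half of each batch now comes from the tail task, yet the policy still sees no new object placements or approach trajectories for it, which is the diversity gap the problem statement points to.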

Approach

The method, Approaching-Phase Augmentation (APA), isolates the target-approaching phase of trajectories from data-rich head tasks, grafts objects from data-scarce tail tasks into these trajectories, and co-trains the policy on the augmented dataset, restoring spatial reasoning on tail tasks without external demonstrations.
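The three steps above can be sketched on a symbolic trajectory representation. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Step`/`Episode` schema is hypothetical, the end of the approaching phase is assumed to be the first gripper closure, and "grafting" is modeled only as relabeling the target object (keeping the head target's pose and instruction template), whereas the paper operates on real observations.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Step:
    """One timestep of a demonstration (hypothetical schema)."""
    target: str            # name of the object being approached
    target_pose: tuple     # (x, y, z) of the target object
    action: tuple          # end-effector action at this step
    gripper_closed: bool   # True once the gripper has closed on an object


@dataclass(frozen=True)
class Episode:
    instruction: str
    steps: List[Step]


def approaching_phase(ep: Episode) -> List[Step]:
    """Return the steps before the gripper first closes.

    Assumption: the first gripper closure marks the end of target approaching.
    """
    for i, s in enumerate(ep.steps):
        if s.gripper_closed:
            return list(ep.steps[:i])
    return list(ep.steps)


def graft(ep: Episode, tail_object: str) -> Episode:
    """Relabel a head episode's approaching phase with a tail-task object.

    The tail object inherits the head target's pose, so the recorded
    approaching actions stay valid; steps after grasping are dropped.
    """
    if not ep.steps:
        return ep
    head_target = ep.steps[0].target
    steps = [replace(s, target=tail_object) for s in approaching_phase(ep)]
    return Episode(ep.instruction.replace(head_target, tail_object), steps)


def build_cotraining_set(head, tail, tail_objects):
    """Original head and tail demos plus grafted approaching segments."""
    augmented = [graft(ep, obj) for ep in head for obj in tail_objects]
    return list(head) + list(tail) + augmented


# Usage with toy demos: 5 approaching steps, then the gripper closes.
def _demo(target, n_approach, n_grasp):
    pose = (0.1, 0.2, 0.0)
    steps = [Step(target, pose, (0.0, 0.0, -0.01), False)] * n_approach
    steps += [Step(target, pose, (0.0, 0.0, 0.0), True)] * n_grasp
    return Episode(f"pick up the {target}", steps)


head = [_demo("bowl", 5, 3)]
tail = [_demo("ketchup", 4, 2)]
data = build_cotraining_set(head, tail, ["ketchup"])
print(data[-1].instruction, len(data[-1].steps))  # pick up the ketchup 5
```

Keeping only the approaching phase matches the summary's diagnosis that scarcity hurts spatial reasoning during target approaching; grasp and post-grasp behavior for the tail object still comes solely from its own few demonstrations.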

Key results

Long-tail data scarcity directly impairs spatial reasoning during target approaching
Diagnosis of phase-wise failure modes using a new LIBERO-based benchmark
Significant tail-task performance gains in simulation and real-world experiments
Conventional re-sampling strategies prove ineffective for robotic policy learning

Why it matters

Improves the reliability of generalist robot policies across diverse real-world manipulation tasks by mitigating a fundamental data-imbalance problem without requiring costly new demonstrations.

Abstract

While generalist robot policies hold significant promise for learning diverse manipulation skills through imitation, their performance is often hindered by the long-tail distribution of training demonstrations. Policies learned on such data, which is heavily skewed towards a few data-rich head tasks, frequently exhibit poor generalization when confronted with the vast number of data-scarce tail tasks. In this work, we conduct a comprehensive analysis of the pervasive long-tail challenge inherent in policy learning. Our analysis begins by demonstrating the inefficacy of conventional long-tail learning strategies (e.g., re-sampling) for improving the policy's performance on tail tasks. We then uncover the underlying mechanism for this failure, revealing that data scarcity on tail tasks directly impairs the policy's spatial reasoning capability. To overcome this, we introduce Approaching-Phase Augmentation (APA), a simple yet effective scheme that transfers knowledge from data-rich head tasks to data-scarce tail tasks without requiring external demonstrations. Extensive experiments in both simulation and real-world manipulation tasks demonstrate the effectiveness of APA. Our code and demos are publicly available at: https://mldxy.github.io/Project-VLA-long-tail/.

Index terms

Imitation Learning, Learning from Demonstration