ICRA 2026

SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning

Yu Zhang, Yuqi Xie, Huihan Liu, Rutav Shah, Michael Wan, Linxi Fan, Yuke Zhu

AI summary

SCIZOR automatically filters low-quality state-action pairs in large-scale robot datasets using self-supervised progress estimation and deduplication, boosting imitation learning and VLA model performance by an average of 15.4% with less data.
imitation learning, data curation, self-supervised learning, vision-language-action models, transition-level filtering, robot policy training

Problem

Large-scale imitation learning datasets contain noisy, suboptimal, and redundant state-action pairs that degrade policy performance, but existing curation methods rely on costly manual annotations or operate at coarse trajectory levels, missing fine-grained quality signals.

Approach

SCIZOR uses a self-supervised task progress predictor to identify and remove suboptimal transitions, combined with a joint state-action deduplication module to filter redundant patterns, enabling scalable, annotation-free transition-level data curation.
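
As a rough illustration of the suboptimal-transition filter described above, the sketch below assumes that some progress predictor has already assigned a scalar task-progress value to every timestep of a trajectory. The function name `filter_by_progress`, the `min_gain` threshold, and the simple "progress difference between consecutive steps" score are all hypothetical choices made for this example; the summary does not specify how SCIZOR scores or thresholds transitions.

```python
from typing import List, Sequence


def filter_by_progress(progress: Sequence[float], min_gain: float = 0.0) -> List[int]:
    """Return indices of transitions whose predicted progress gain exceeds min_gain.

    progress[t] is an assumed predicted task-progress value at timestep t, so a
    trajectory with T transitions has T + 1 progress values. Transition t goes
    from timestep t to timestep t + 1.
    """
    keep = []
    for t in range(len(progress) - 1):
        gain = progress[t + 1] - progress[t]
        # Hypothetical rule: drop transitions that stall or move backwards.
        if gain > min_gain:
            keep.append(t)
    return keep


# Toy trajectory: step 1 stalls and step 2 regresses, so both are dropped.
predicted = [0.0, 0.2, 0.2, 0.1, 0.5]
print(filter_by_progress(predicted))  # [0, 3]
```

In practice the threshold would presumably be tuned per dataset, and the progress values would come from a learned model rather than a hand-written list.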

Key results

  • First self-supervised transition-level curation framework for robotics
  • Filters suboptimal transitions via self-supervised task progress estimation
  • Removes redundant data using joint state-action similarity clustering
  • Achieves 15.4% average performance improvement across imitation learning and VLA benchmarks
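
The redundancy-removal item in the list above can be pictured with a small sketch. It is not the paper's clustering procedure: it swaps in a simple greedy threshold on cosine similarity, and it assumes that each sample's joint representation is just the concatenation of a state vector and an action vector. The names `cosine` and `deduplicate` and the `threshold` value are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    threshold: float = 0.99,
) -> List[int]:
    """Greedily keep samples whose joint state-action vector is not a near-duplicate.

    A sample is kept only if its similarity to every already-kept sample is
    below the threshold. This is a stand-in for similarity clustering.
    """
    kept_indices: List[int] = []
    kept_features: List[List[float]] = []
    for i, (state, action) in enumerate(zip(states, actions)):
        feature = list(state) + list(action)  # assumed joint representation
        if all(cosine(feature, other) < threshold for other in kept_features):
            kept_indices.append(i)
            kept_features.append(feature)
    return kept_indices


# Samples 0 and 1 are identical, so only the first of the pair survives.
states = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
actions = [[1.0], [1.0], [1.0]]
print(deduplicate(states, actions))  # [0, 2]
```

The greedy pass compares each sample against every kept sample, which is quadratic in the worst case; a large-scale pipeline would likely need clustering or approximate nearest-neighbor search instead.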

Why it matters

Enables scalable, annotation-free data curation that significantly boosts the efficiency and performance of large-scale robot learning and vision-language-action models.

Abstract

Imitation learning advances robot capabilities by enabling the acquisition of diverse behaviors from human demonstrations. However, large-scale datasets used for policy training often introduce substantial variability in quality, which can negatively impact performance. As a result, automatically curating datasets by filtering low-quality samples to improve quality becomes essential. Existing robotic curation approaches rely on costly manual annotations and perform curation at a coarse granularity, such as the dataset or trajectory level, failing to account for the quality of individual state-action pairs. To address this, we introduce SCIZOR, the first self-supervised transition-level curation framework that requires no annotations and scales to large-scale datasets to improve the performance of imitation learning policies and modern Vision-Language-Action (VLA) models. SCIZOR targets two complementary sources of low-quality data: suboptimal data, which hinders learning with undesirable actions, and redundant data, which dilutes training with repetitive patterns. SCIZOR leverages a self-supervised task progress predictor for suboptimal data to remove samples lacking task progression, and a deduplication module operating on joint state-action representation for samples with redundant patterns. Empirically, we show that SCIZOR enables imitation learning policies and modern VLA models to achieve higher performance with less data, yielding an average improvement of 15.4% across multiple benchmarks. More information is available at: https://scizor-icra2026.github.io

Index terms

Big Data in Robotics and Automation, Imitation Learning, Learning from Demonstration