ICRA 2026

CorrectManip: A Data-Driven Closed-Loop Framework for Autonomous Skill Learning with Failure Recovery

Shiwen Li, Zhen Yang, Zuofu Wang, Bin Ma, Ye Tengju, Shiyu Zhao, Junbo Chen, Kaicheng Yu


AI summary


CorrectManip improves the average success rate of robotic policies in unseen environments by 45.22% over baseline methods, using a closed-loop system that autonomously learns from execution failures and adaptively generates targeted training data.

Robotic skill learning, closed-loop framework, failure recovery, test-time optimization, sim-to-real transfer, autonomous generalization

Problem

Current simulation-based robotic skill learning relies on static or handcrafted scenarios and lacks mechanisms to analyze failures or iteratively refine training data, severely limiting generalization in open-world settings.

Approach

The framework closes the loop between policy execution and data generation with two modules: EvoGen, which generates failure-targeted training scenes, and TTO (test-time optimization), which analyzes test-time failures to adjust rewards and fine-tune policies online.
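As a rough illustration of the loop described above, the Python sketch below alternates a TTO-like failure analysis with EvoGen-like targeted scene generation and then fine-tunes on the generated scenes. Everything in it is hypothetical: the scene categories, `ToyPolicy`, `analyze_failures`, `generate_scenes`, the scene budget, and the update rule are stand-ins chosen for illustration, not the authors' implementation. Real TTO also adjusts rewards and fine-tunes a learned policy, which this toy only mimics with a scalar competence per category.

```python
from dataclasses import dataclass, field

# Hypothetical scene-variation categories (not taken from the paper).
CATEGORIES = ["object_pose", "lighting", "clutter"]


@dataclass
class ToyPolicy:
    # Per-category success probability: a stand-in for a learned manipulation policy.
    competence: dict = field(default_factory=lambda: {c: 0.2 for c in CATEGORIES})

    def train(self, category: str, n_scenes: int, lr: float = 0.02) -> None:
        # Each training scene closes a fixed fraction of the remaining gap to 1.0.
        for _ in range(n_scenes):
            self.competence[category] += lr * (1.0 - self.competence[category])

    def average_success(self) -> float:
        return sum(self.competence.values()) / len(self.competence)


def analyze_failures(policy: ToyPolicy, episodes: int = 100) -> dict:
    """TTO-like step (sketch): expected failures per category over `episodes` trials."""
    return {c: episodes * (1.0 - p) for c, p in policy.competence.items()}


def generate_scenes(failures: dict, budget: int = 60, floor: float = 0.1) -> dict:
    """EvoGen-like step (sketch): split the scene budget toward frequent failure modes.

    A uniform `floor` share keeps every category in the mix. Rounding means the
    allocation can differ from `budget` by a scene or two.
    """
    total = sum(failures.values()) or 1.0
    k = len(failures)
    shares = {c: floor / k + (1.0 - floor) * f / total for c, f in failures.items()}
    return {c: round(budget * s) for c, s in shares.items()}


def closed_loop(policy: ToyPolicy, rounds: int = 5) -> list:
    """Alternate failure analysis, targeted generation, and fine-tuning."""
    history = []
    for _ in range(rounds):
        failures = analyze_failures(policy)   # 1. execute and analyze failures
        plan = generate_scenes(failures)      # 2. generate failure-targeted scenes
        for category, n in plan.items():      # 3. fine-tune on the new scenes
            policy.train(category, n)
        history.append(policy.average_success())
    return history


if __name__ == "__main__":
    policy = ToyPolicy({"object_pose": 0.6, "lighting": 0.3, "clutter": 0.1})
    print(generate_scenes(analyze_failures(policy)))
    print(closed_loop(policy))
```

With an uneven starting policy, the first round sends the largest share of scenes to the weakest category (clutter), and the average success rate rises every round. The uniform floor is one possible design choice for keeping some coverage of categories that are not currently failing.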

Key results

45.22% improvement in average success rate over baseline methods in unseen environments
85.98% average success rate across diverse manipulation tasks
Successful sim-to-real transfer demonstrated on Unitree H1 and G1 robots
Ablation studies validate the complementary roles of EvoGen and TTO in driving continual policy improvement

Why it matters

Provides a scalable, human-free pathway for robots to autonomously adapt to complex real-world environments, bridging the gap between simulation training and reliable open-world deployment.

Abstract

Simulation-based training offers an efficient paradigm for robotic skill learning, providing scalable data generation while reducing reliance on costly hardware trials and manual data collection. However, existing methods that rely on handcrafted scenarios fail to cover the full complexity of open-world variations and neglect the critical insights offered by inevitable failures in unseen environments. As a result, current policies struggle to achieve robust generalization, hindering deployment in open-world settings. This highlights the need for a continuous learning framework that enables robots to reflect on failures and iteratively refine policies in a targeted way. In this paper, we propose CorrectManip, a novel data-driven closed-loop framework that enables the policy to continuously improve its performance in unseen environments by learning from failures. Existing methods remain confined to single-loop adaptation, either addressing policy errors in static environments or indiscriminately scaling data without targeting failure modes. In contrast, CorrectManip closes the loop at both the policy-recovery and environment-generation levels through two modules: EvoGen, a self-evolving generator, and TTO, a test-time optimization module. EvoGen adaptively generates training data to strengthen policy performance, while TTO analyzes execution failures to provide fine-grained optimization signals. Together, TTO exposes policy weaknesses and EvoGen converts them into task-relevant training data, forming a closed feedback loop that drives continual policy improvement and stronger generalization. Extensive experiments across diverse tasks demonstrate that CorrectManip improves the average success rate in unseen environments by 45.22% over baseline methods. These results validate the complementary roles of TTO and EvoGen in enhancing generalization.
Furthermore, we showcase sim-to-real transfer on the Unitree H1 and Unitree G1 robots. Demos are available here.

*Equal contribution. †Work done during their visit to Autolab, Westlake University. ‡Co-corresponding authors.
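The abstract describes TTO as turning execution failures into fine-grained optimization signals, and the Approach summary says those signals are used to adjust rewards. One simple way to picture that, assuming a reward built as a weighted sum of stage-wise terms, is to shift weight toward the stages where failed episodes stopped. The stage names, `reweight_rewards`, `shaped_reward`, and the `step` hyperparameter below are all hypothetical; the summary does not specify how TTO actually derives its signals.

```python
from collections import Counter


def reweight_rewards(weights: dict, failure_stages: list, step: float = 0.5) -> dict:
    """Sketch: shift reward weight toward stages where failed episodes stopped.

    weights: reward-term name -> positive weight (hypothetical terms)
    failure_stages: one stage label per failed episode, e.g. from failure analysis
    step: how strongly failures shift the weights (hypothetical hyperparameter)
    """
    if not failure_stages:
        return dict(weights)
    counts = Counter(failure_stages)
    n = len(failure_stages)  # labels with no matching reward term still count toward n
    # Boost each term in proportion to the fraction of failures attributed to it.
    raw = {k: w * (1.0 + step * counts[k] / n) for k, w in weights.items()}
    # Rescale so the total weight is unchanged and only the balance between terms moves.
    scale = sum(weights.values()) / sum(raw.values())
    return {k: v * scale for k, v in raw.items()}


def shaped_reward(terms: dict, weights: dict) -> float:
    """Weighted sum of per-term rewards for one step; missing terms count as 0."""
    return sum(w * terms.get(k, 0.0) for k, w in weights.items())


if __name__ == "__main__":
    weights = {"reach": 1.0, "grasp": 1.0, "place": 1.0}
    failures = ["grasp", "grasp", "place", "grasp"]
    new_weights = reweight_rewards(weights, failures)
    print(new_weights)
    print(shaped_reward({"reach": 0.5, "grasp": 1.0}, new_weights))
```

With four failed episodes, three of them ending at the grasp stage, the grasp term gains the most weight while the total stays at 3.0. Keeping the total fixed is a deliberate choice here, so that reweighting changes what the policy prioritizes without inflating the overall reward scale.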

Index terms

Autonomous Agents, Deep Learning Methods, Task Planning