CorrectManip: A Data-Driven Closed-Loop Framework for Autonomous Skill Learning with Failure Recovery
Shiwen Li, Zhen Yang, Zuofu Wang, Bin Ma, Ye Tengju, Shiyu Zhao, Junbo Chen, Kaicheng Yu
AI summary
Problem
Current simulation-based robotic skill learning relies on static or handcrafted scenarios and lacks mechanisms to analyze failures or iteratively refine training data, severely limiting generalization in open-world settings.
Approach
The framework closes the loop between policy execution and data generation with two components: EvoGen, a self-evolving generator that creates failure-targeted training scenes, and TTO, a test-time optimization module that analyzes test-time failures to adjust rewards and fine-tune policies online.
Key results
- 45.22% average success rate improvement over baseline methods in unseen environments
- 85.98% average success rate across diverse manipulation tasks
- Successful sim-to-real transfer demonstrated on Unitree H1 and G1 robots
- Ablation studies validate the complementary roles of EvoGen and TTO in driving continual policy improvement
Why it matters
Provides a scalable, human-free pathway for robots to autonomously adapt to complex real-world environments, bridging the gap between simulation training and reliable open-world deployment.
Abstract
Simulation-based training offers an efficient paradigm for robotic skill learning, providing scalable data generation while reducing reliance on costly hardware trials and manual data collection. However, existing methods that rely on handcrafted scenarios fail to cover the full complexity of open-world variations and neglect the critical insights offered by inevitable failures in unseen environments. As a result, current policies struggle to achieve robust generalization, hindering deployment in open-world settings. This highlights the need for a continuous learning framework that enables robots to reflect on failures and iteratively refine policies in a targeted way. In this paper, we propose CorrectManip, a novel data-driven closed-loop framework that enables the policy to continuously improve performance in unseen environments by learning from failures. Existing methods remain confined to single-loop adaptation, either addressing policy errors in static environments or indiscriminately scaling data without targeting failure modes. In contrast, CorrectManip closes the loop at both policy recovery and environment generation through two components: EvoGen, a self-evolving generator, and TTO, a test-time optimization module. EvoGen adaptively generates training data to strengthen policy performance, while TTO analyzes execution failures to provide fine-grained optimization signals. Together, TTO exposes policy weaknesses and EvoGen converts them into task-relevant training data, forming a closed feedback loop that drives continual policy improvement and stronger generalization. Extensive experiments across diverse tasks demonstrate that CorrectManip improves the average success rate in unseen environments by 45.22% over baseline methods. These results validate the complementary roles of TTO and EvoGen in enhancing generalization. Furthermore, we showcase sim-to-real transfer ability on Unitree H1 and Unitree G1. Demos are available here.
*Equal contribution. †Work done while visiting Autolab, Westlake University. ‡Co-corresponding authors.
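The feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the policy model, the scene types, and every name below (`ToyPolicy`, `analyze_failures`, `generate_targeted_scenes`, `closed_loop`) are hypothetical, and the real TTO and EvoGen modules are far richer. The sketch only shows the control flow, assuming failures are counted per scene type and new training scenes are allocated in proportion to those counts.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyPolicy:
    """Hypothetical stand-in for a manipulation policy: one success probability per scene type."""
    skill: dict = field(default_factory=dict)

    def success_prob(self, scene_type):
        # Unseen scene types start with a low (assumed) success probability.
        return self.skill.get(scene_type, 0.1)

    def train(self, scenes, step=0.2):
        # Each training scene nudges the matching skill upward, capped at 1.0.
        for scene_type in scenes:
            self.skill[scene_type] = min(1.0, self.success_prob(scene_type) + step)


def analyze_failures(policy, test_scenes, rng):
    """TTO-like step (assumed): roll out the policy and count failures per scene type."""
    failures = {}
    for scene_type in test_scenes:
        if rng.random() >= policy.success_prob(scene_type):
            failures[scene_type] = failures.get(scene_type, 0) + 1
    return failures


def generate_targeted_scenes(failures, budget):
    """EvoGen-like step (assumed): split the scene budget in proportion to failure counts."""
    total = sum(failures.values())
    if total == 0:
        return []
    scenes = []
    for scene_type, count in failures.items():
        scenes.extend([scene_type] * max(1, round(budget * count / total)))
    return scenes


def closed_loop(policy, test_scenes, rounds=5, budget=6, seed=0):
    """Alternate failure analysis and targeted data generation; return the success rate per round."""
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        failures = analyze_failures(policy, test_scenes, rng)
        history.append(1 - sum(failures.values()) / len(test_scenes))
        policy.train(generate_targeted_scenes(failures, budget))
    return history


if __name__ == "__main__":
    scenes = ["cluttered", "occluded", "novel_object"] * 10
    print(closed_loop(ToyPolicy(), scenes))
```

The design point the sketch tries to capture is that training data is not scaled uniformly: scene types that fail more often receive a larger share of the generation budget, which is one simple reading of "failure-targeted" data generation.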