ICRA 2026

Planning-Guided Diffusion Policy Learning for Contact-Rich Bimanual Object Reorientation

Xuanlin Li, Tong Zhao, Bo Ai, Xinghao Zhu, Jiuguang Wang, Tao Pang, Kuan Fang

AI summary

LIDE enables robust contact-rich bimanual manipulation of diverse objects by training a task-conditioned diffusion policy on scalable, planner-generated synthetic data.

Bimanual Manipulation · Diffusion Policy · Contact-Rich · Sim-to-Real · Motion Planning

Problem

Collecting high-quality demonstrations for complex bimanual tasks is costly in the real world, and existing model-based planners lack the generalization needed for unseen object geometries.

Approach

A model-based motion planner generates large-scale synthetic trajectories in simulation. These trajectories are then used to train a diffusion policy that takes point cloud observations as input and predicts delta joint actions.
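To make the delta joint action representation concrete, here is a minimal sketch of converting a planner's absolute joint trajectory into per-step deltas and integrating them back. The function names are hypothetical, and the sketch assumes each delta is taken relative to the previous waypoint; the summary does not say whether the paper uses that reference or the measured joint state.

```python
from typing import List, Sequence


def to_delta_actions(joint_traj: Sequence[Sequence[float]]) -> List[List[float]]:
    """Convert absolute joint positions q_0..q_T into deltas q_{t+1} - q_t.

    Assumption: deltas are relative to the previous waypoint.
    """
    return [
        [q_b - q_a for q_a, q_b in zip(q_t, q_next)]
        for q_t, q_next in zip(joint_traj[:-1], joint_traj[1:])
    ]


def integrate_deltas(
    q_start: Sequence[float], deltas: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Recover absolute joint targets by accumulating deltas from q_start."""
    traj = [list(q_start)]
    for delta in deltas:
        traj.append([q + dq for q, dq in zip(traj[-1], delta)])
    return traj


# Toy two-joint trajectory (values are illustrative, not from the paper).
planner_traj = [[0.0, 1.0], [0.5, 1.0], [1.0, 0.5]]
deltas = to_delta_actions(planner_traj)
recovered = integrate_deltas(planner_traj[0], deltas)
```

One commonly cited motivation for delta actions is that small relative commands are less sensitive to a constant offset between simulated and real joint states than absolute targets are.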

Key results

Successful reorientation of diverse objects in both simulation and real-world settings
Generalization to out-of-distribution objects including irregular containers and soft materials
Improved sim-to-real transfer via Flying Point Augmentation and delta joint action prediction
Higher success rates than the underlying motion planner in simulated tasks
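The summary does not define Flying Point Augmentation. One plausible reading, assumed here, is that spurious points are added to simulated point clouds to imitate the "flying pixel" artifacts that real depth sensors produce near object edges. The sketch below follows that assumption only; the function name, the parameters, and the uniform offset model are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def add_flying_points(
    cloud: Sequence[Point],
    num_flying: int,
    max_offset: float,
    rng: random.Random,
) -> List[Point]:
    """Append spurious points displaced from randomly chosen cloud points.

    Assumption: a flying point is an existing point shifted by a uniform
    offset in [-max_offset, max_offset] along each axis.
    """
    augmented = list(cloud)
    if not cloud or num_flying <= 0:
        return augmented
    for _ in range(num_flying):
        x, y, z = rng.choice(cloud)
        augmented.append(
            (
                x + rng.uniform(-max_offset, max_offset),
                y + rng.uniform(-max_offset, max_offset),
                z + rng.uniform(-max_offset, max_offset),
            )
        )
    return augmented


# Toy cloud with three points (coordinates in metres, illustrative only).
clean_cloud = [(0.0, 0.0, 0.5), (0.1, 0.0, 0.5), (0.0, 0.1, 0.5)]
noisy_cloud = add_flying_points(clean_cloud, num_flying=2, max_offset=0.05,
                                rng=random.Random(0))
```

Training on clouds corrupted this way is meant to keep the policy from relying on perfectly clean geometry that only exists in simulation.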

Why it matters

The approach lets robotic systems manipulate bulky or heavy objects that cannot be directly grasped, a capability relevant to warehouse logistics and home services.

Abstract

Contact-rich bimanual manipulation involves precise coordination of two arms to change object states through strategically selected contacts and motions. Due to the inherent complexity of these tasks, acquiring sufficient demonstration data and training policies that generalize to unseen scenarios remains a largely unresolved challenge. Building on recent advances in planning through contacts, we introduce Planning-Guided Diffusion Policy Learning (LIDE), an approach that effectively learns to solve contact-rich bimanual manipulation tasks by leveraging model-based motion planners to generate demonstration data in high-fidelity physics simulation. Through efficient planning in randomized environments, our approach generates large-scale and high-quality synthetic motion trajectories for tasks involving diverse objects and transformations. We then train a task-conditioned diffusion policy via behavior cloning using these demonstrations. To reduce the sim-to-real gap, we propose a set of designs in feature extraction, action prediction, and data augmentation that enable learning robust prediction of smooth action sequences and generalization to unseen scenarios. Through experiments in both simulation and the real world, we demonstrate that our approach can enable a bimanual robotic system to effectively manipulate objects of diverse geometries, dimensions, and physical properties.
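The abstract does not state which diffusion formulation is used for behavior cloning. As a rough illustration, the sketch below shows the standard DDPM noise-prediction objective that diffusion policies commonly use: a demonstrated action is corrupted with Gaussian noise, and a network (not shown) is trained to predict that noise. The linear beta schedule, its default values, and all function names are assumptions rather than details from the paper.

```python
import math
from typing import List, Sequence


def linear_alpha_bars(
    num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02
) -> List[float]:
    """Cumulative products of (1 - beta_k) for a linear beta schedule."""
    alpha_bars = []
    prod = 1.0
    for k in range(num_steps):
        frac = k / max(num_steps - 1, 1)
        beta = beta_start + frac * (beta_end - beta_start)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_actions(
    actions: Sequence[float], noise: Sequence[float], alpha_bar: float
) -> List[float]:
    """Forward process: x_k = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * a + noise_scale * e for a, e in zip(actions, noise)]


def denoising_loss(predicted_noise: Sequence[float], noise: Sequence[float]) -> float:
    """Mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted_noise, noise)) / len(noise)


# Illustrative values only; a real policy would condition the noise
# predictor on the observation and the task specification.
alpha_bars = linear_alpha_bars(10)
noised = noise_actions([1.0, -2.0], [0.0, 0.0], alpha_bar=0.25)
loss = denoising_loss([1.0, 1.0], [0.0, 2.0])
```

At inference time, a policy trained this way starts from pure noise and denoises it iteratively into an action sequence.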

Index terms

Bimanual Manipulation; Imitation Learning; Deep Learning in Grasping and Manipulation