Distributional Treatment of Real2Sim2Real for Object-Centric Agent Adaptation in Vision-Driven DLO Manipulation
Georgios Kamaras, Subramanian Ramamoorthy
AI summary
Problem
Adapting robotic policies to the specific physical properties of deformable linear objects is hindered by the simulation-reality gap and high-dimensional dynamics. Existing methods lack an integrated approach to calibrate simulators and train robust policies for direct real-world deployment.
Approach
The framework infers posterior distributions of DLO physical parameters from real-world visual trajectories using likelihood-free inference. These distributions guide domain randomization during simulation-based reinforcement learning, enabling zero-shot policy deployment in reality.
Key results
- Infers fine-grained physical properties (length, Young's modulus) from visual trajectories
- Object-specific posteriors improve domain randomization efficiency and policy convergence
- Achieves zero-shot sim-to-real transfer without real-world fine-tuning
- Demonstrates robust adaptation across DLOs with varying stiffness and length
Why it matters
Provides a scalable, data-efficient pipeline for robots to adapt to new deformable objects, accelerating deployment in surgery, manufacturing, and automation.
Abstract
We present an integrated (or end-to-end) framework for the Real2Sim2Real problem of manipulating deformable linear objects (DLOs) based on visual perception. Working with a parameterised set of DLOs, we use likelihood-free inference (LFI) to compute posterior distributions over the physical parameters with which we can approximately simulate the behaviour of each specific DLO. We use these posteriors for domain randomisation while training, in simulation, object-specific visuomotor policies (i.e. assuming only visual and proprioceptive sensing) for a DLO reaching task, using model-free reinforcement learning. We demonstrate the utility of this approach by deploying sim-trained DLO manipulation policies in the real world in a zero-shot manner, i.e. without any further fine-tuning. In this context, we evaluate the capacity of a prominent LFI method to perform fine classification over the parametric set of DLOs, using only visual and proprioceptive data obtained during a dynamic manipulation trajectory. We then study the implications of the resulting domain distributions for sim-based policy learning and real-world performance.
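The pipeline described in the abstract (infer a posterior over physical parameters from an observed trajectory, then draw simulation parameters from that posterior during policy training) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_simulator` is a hypothetical stand-in for a DLO simulator, the two parameters (length, stiffness) and their prior ranges are invented, and plain rejection ABC is swapped in as the simplest likelihood-free method. It is not the LFI method evaluated in the paper.

```python
import numpy as np


def toy_simulator(length, stiffness, rng, n_steps=50):
    """Hypothetical stand-in for a DLO simulator (not the paper's).

    Returns a noisy 1-D "tip" trajectory whose amplitude grows with
    length and whose oscillation frequency grows with stiffness.
    """
    t = np.linspace(0.0, 1.0, n_steps)
    freq = np.sqrt(stiffness) / length
    traj = length * np.sin(2.0 * np.pi * freq * t) * np.exp(-t)
    return traj + rng.normal(0.0, 0.01, n_steps)


def rejection_abc(observed, prior_low, prior_high, n_samples, keep_frac, rng):
    """Rejection ABC: sample the prior, simulate, keep the closest samples.

    Returns the kept parameter samples, sorted from closest to farthest.
    """
    thetas = rng.uniform(prior_low, prior_high, size=(n_samples, 2))
    dists = np.array(
        [np.linalg.norm(toy_simulator(l, s, rng) - observed) for l, s in thetas]
    )
    n_keep = max(1, int(keep_frac * n_samples))
    keep = np.argsort(dists)[:n_keep]
    return thetas[keep]


def sample_domain(posterior_samples, rng):
    """Domain randomisation: draw one parameter set from the posterior samples."""
    return posterior_samples[rng.integers(len(posterior_samples))]


rng = np.random.default_rng(0)

# "Real" observation, generated here from assumed ground-truth parameters.
true_theta = np.array([0.5, 4.0])  # (length, stiffness), illustrative units
observed = toy_simulator(true_theta[0], true_theta[1], rng)

# Assumed uniform prior ranges over (length, stiffness).
prior_low = np.array([0.2, 1.0])
prior_high = np.array([1.0, 9.0])

posterior = rejection_abc(observed, prior_low, prior_high,
                          n_samples=5000, keep_frac=0.02, rng=rng)

# One parameter draw per training episode, instead of sampling the full prior.
episode_params = [sample_domain(posterior, rng) for _ in range(5)]
```

In a real setup, each entry of `episode_params` would configure the simulator before a reinforcement-learning episode, so the policy sees variation concentrated around parameters consistent with the observed object rather than spread over the whole prior.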