DiSPo: Diffusion-SSM Based Policy Learning for Coarse-To-Fine Action Discretization
Nayoung Oh, Jaehyeong Jang, Moonkyeong Jung, Daehyung Park
AI summary
Problem
Traditional imitation learning methods struggle to learn and reproduce manipulation skills across varying temporal granularities, often requiring extensive fine-grained data or failing to adapt to user-intended control scales.
Approach
DiSPo integrates a diffusion process with a Mamba-based state space model, using a novel step-scaling mechanism to dynamically adjust discretization granularity and generate fine-grained actions from coarse demonstrations.
Key results
- Up to 81% improvement in task success rates on coarse-to-fine benchmarks
- Enhanced inference efficiency through on-demand coarse-to-fine action generation
- Successful real-world deployment on precision manipulation tasks
- Online granularity modulation via a learned step-scale factor predictor
Why it matters
It provides a scalable, data-efficient framework for robots to master precise manipulation skills from coarse demonstrations, advancing generalizable policy learning for real-world robotic applications.
Abstract
We aim to solve the problem of learning user-intended granular skills from multi-granularity demonstrations. Traditional learning-from-demonstration methods typically rely on extensive fine-grained data, interpolation techniques, or dynamics models, which are ineffective at encoding or decoding the diverse granularities inherent in skills. To overcome this limitation, we introduce a novel diffusion-state space model (SSM) based policy (DiSPo) that leverages an SSM, Mamba, to learn from diverse coarse demonstrations and generate multi-scale actions. Our proposed step-scaling mechanism in Mamba is a key innovation, enabling memory-efficient learning, flexible granularity adjustment, and robust representation of multi-granularity data. DiSPo outperforms state-of-the-art baselines on coarse-to-fine benchmarks, achieving up to an 81% improvement in success rates while enhancing inference efficiency by generating inexpensive coarse motions where applicable. We validate DiSPo’s scalability and effectiveness on real-world manipulation scenarios. Code and videos are available at https://robo-dispo.github.io.
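To make the idea of step scaling concrete, the sketch below shows how rescaling the discretization step of a state space model changes the temporal granularity of its output. It is an illustration under simplifying assumptions, not the DiSPo implementation: it uses a fixed diagonal linear SSM with zero-order-hold discretization, whereas Mamba makes its step size and projections input-dependent, and the names `discretize_zoh`, `rollout`, and `step_scale` are hypothetical.

```python
import numpy as np

def discretize_zoh(a_diag, b, delta, step_scale=1.0):
    """Zero-order-hold discretization of a diagonal SSM x' = a*x + b*u.

    step_scale is a hypothetical factor that rescales the step size:
    values below 1 give finer time steps, values above 1 coarser ones.
    Entries of a_diag are assumed to be nonzero.
    """
    scaled_delta = step_scale * delta
    a_bar = np.exp(scaled_delta * a_diag)
    b_bar = (a_bar - 1.0) / a_diag * b
    return a_bar, b_bar

def rollout(a_diag, b, c, inputs, delta, step_scale=1.0):
    """Run the discretized SSM over a 1-D input sequence and return y_k = c @ x_k."""
    a_bar, b_bar = discretize_zoh(a_diag, b, delta, step_scale)
    x = np.zeros_like(a_diag)
    outputs = []
    for u in inputs:
        x = a_bar * x + b_bar * u
        outputs.append(float(c @ x))
    return np.array(outputs)

# Toy parameters chosen only for illustration.
a_diag = np.array([-1.0, -0.5, -2.0])
b = np.array([1.0, 0.5, 2.0])
c = np.array([0.3, 0.3, 0.4])

# Same constant input, same total duration, two granularities.
coarse = rollout(a_diag, b, c, inputs=np.ones(4), delta=0.1, step_scale=1.0)
fine = rollout(a_diag, b, c, inputs=np.ones(8), delta=0.1, step_scale=0.5)

# Every second fine-grained output coincides with a coarse output.
print(np.allclose(coarse, fine[1::2]))
```

With a constant input, zero-order hold is exact, so halving the step scale yields a trajectory that passes through the coarse one while adding intermediate points. This is the sense in which a single set of continuous-time parameters can serve several granularities in this toy setting.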