ICRA 2026

DiSPo: Diffusion-SSM Based Policy Learning for Coarse-To-Fine Action Discretization

Nayoung Oh, Jaehyeong Jang, Moonkyeong Jung, Daehyung Park

AI summary

DiSPo enables robots to generate precise, multi-granularity actions from coarse demonstrations by combining diffusion models with a step-scaled state space model, achieving up to an 81% improvement in success rates over baselines.

diffusion policy, state space model, imitation learning, multi-granularity, robotic manipulation, coarse-to-fine discretization

Problem

Traditional imitation learning methods struggle to learn and reproduce manipulation skills across varying temporal granularities, often requiring extensive fine-grained data or failing to adapt to user-intended control scales.

Approach

DiSPo integrates a diffusion process with a Mamba-based state space model, using a novel step-scaling mechanism to dynamically adjust discretization granularity and generate fine-grained actions from coarse demonstrations.
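The step-scaling idea can be illustrated with the discretization step that Mamba-style SSMs rely on. A continuous-time system dx/dt = Ax + Bu is turned into a recurrence using a step size Δ, so multiplying Δ by a factor changes how much time each recurrence step covers. The sketch below is a simplified illustration under stated assumptions, not DiSPo's implementation. It uses a diagonal, time-invariant SSM with exact zero-order-hold (ZOH) discretization, whereas Mamba makes Δ, B, and C input-dependent. The `scale` argument and the function names are hypothetical.

```python
import numpy as np


def zoh_discretize(a_diag: np.ndarray, b: np.ndarray, delta: float):
    """Exact ZOH discretization of dx/dt = diag(a) x + b u (entries of a must be nonzero)."""
    a_bar = np.exp(delta * a_diag)          # A_bar = exp(delta * A)
    b_bar = (a_bar - 1.0) / a_diag * b      # B_bar = A^{-1} (exp(delta * A) - I) B
    return a_bar, b_bar


def run_ssm(a_diag, b, c, inputs, delta, scale=1.0):
    """Run the recurrence with the step size multiplied by a hypothetical scale factor."""
    a_bar, b_bar = zoh_discretize(a_diag, b, scale * delta)
    x = np.zeros_like(a_diag)
    outputs = []
    for u in inputs:
        x = a_bar * x + b_bar * u           # x_{k+1} = A_bar x_k + B_bar u_k
        outputs.append(float(c @ x))        # readout y = C x of the updated state
    return x, np.array(outputs)


a = np.array([-0.5, -1.0, -2.0])   # stable diagonal dynamics
b = np.array([1.0, 0.5, 0.25])
c = np.array([1.0, 1.0, 1.0])
base_delta = 0.1

# Fine granularity: four steps of size delta with a held (constant) input.
x_fine, y_fine = run_ssm(a, b, c, inputs=[1.0] * 4, delta=base_delta)
# Coarse granularity: one step with the step size scaled by 4.
x_coarse, y_coarse = run_ssm(a, b, c, inputs=[1.0], delta=base_delta, scale=4.0)

print(np.allclose(x_fine, x_coarse))  # True: both cover the same span of time
```

ZOH is exact for an input held constant over each step. As a result, one step with 4Δ lands on the same state as four steps with Δ. When the input changes between fine steps, the two no longer coincide exactly.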

Key results

Up to 81% improvement in task success rates on coarse-to-fine benchmarks
Enhanced inference efficiency through on-demand coarse-to-fine action generation
Successful real-world deployment on precision manipulation tasks
Online granularity modulation via a learned step-scale factor predictor
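One way to picture a learned step-scale factor predictor is as a small network that outputs one positive factor per step, which then stretches or shrinks that step's Δ. This sketch makes several assumptions. The predictor is a hypothetical linear map with a softplus output, and it runs on the same simplified diagonal SSM with ZOH discretization. Neither the predictor's form nor its feature inputs come from the paper.

```python
import numpy as np


def softplus(z: np.ndarray) -> np.ndarray:
    """Numerically stable softplus, log(1 + exp(z)), which keeps scales positive."""
    return np.logaddexp(0.0, z)


def predict_step_scale(features: np.ndarray, w: np.ndarray, bias: float) -> np.ndarray:
    """Hypothetical predictor: one positive step-scale factor per time step."""
    return softplus(features @ w + bias)


def scan_with_scales(a_diag, b, inputs, base_delta, scales):
    """Time-varying ZOH recurrence where step k uses delta_k = scales[k] * base_delta."""
    x = np.zeros_like(a_diag)
    elapsed = 0.0
    for u, s in zip(inputs, scales):
        delta_k = s * base_delta
        a_bar = np.exp(delta_k * a_diag)          # per-step A_bar
        b_bar = (a_bar - 1.0) / a_diag * b        # per-step B_bar
        x = a_bar * x + b_bar * u
        elapsed += delta_k                        # total continuous time covered
    return x, elapsed


rng = np.random.default_rng(0)
features = rng.normal(size=(5, 3))   # stand-in observation features for 5 steps
w = rng.normal(size=3)               # stand-in predictor weights (untrained)
scales = predict_step_scale(features, w, bias=0.0)

a = np.array([-0.5, -1.0])
b = np.array([1.0, 1.0])
x, elapsed = scan_with_scales(a, b, inputs=np.ones(5), base_delta=0.1, scales=scales)
print(scales.round(3), round(elapsed, 3))
```

A larger factor makes a step cover more time, which in this reading corresponds to a coarser action. The same horizon can therefore be traversed in fewer steps. With a held input, the final state depends only on the total elapsed time T and matches the closed form (exp(aT) - 1)/a · b.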

Why it matters

DiSPo provides a scalable, data-efficient framework for robots to master precise manipulation skills from coarse demonstrations, advancing generalizable policy learning for real-world robotic applications.

Abstract

We aim to solve the problem of learning user-intended granular skills from multi-granularity demonstrations. Traditional learning-from-demonstration methods typically rely on extensive fine-grained data, interpolation techniques, or dynamics models, which are ineffective at encoding or decoding the diverse granularities inherent in skills. To overcome this, we introduce a novel diffusion-state space model (SSM) based policy (DiSPo) that leverages an SSM, Mamba, to learn from diverse coarse demonstrations and generate multi-scale actions. Our proposed step-scaling mechanism in Mamba is a key innovation, enabling memory-efficient learning, flexible granularity adjustment, and robust representation of multi-granularity data. DiSPo outperforms state-of-the-art baselines on coarse-to-fine benchmarks, achieving up to an 81% improvement in success rates while enhancing inference efficiency by generating inexpensive coarse motions where applicable. We validate DiSPo’s scalability and effectiveness in real-world manipulation scenarios. Code and videos are available at https://robo-dispo.github.io.
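Diffusion policies generally produce an action chunk by iteratively denoising Gaussian noise. The sketch below shows standard DDPM ancestral sampling with a linear noise schedule. The schedule, the step count, and the `denoiser(x, t)` interface are assumptions. In DiSPo the noise predictor would presumably be the Mamba-based network conditioned on observations, and that network is not reproduced here. To make the sampler checkable, the usage plugs in an oracle noise predictor that knows the target action chunk. The oracle is a test harness, not a policy.

```python
import numpy as np


def make_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linear DDPM noise schedule (assumed here, not taken from DiSPo)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def ddpm_sample(denoiser, shape, num_steps=50, seed=0):
    """Ancestral DDPM sampling of an action chunk; denoiser(x, t) predicts the noise."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal(shape)                   # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = denoiser(x, t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])  # posterior mean estimate
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)  # sigma_t^2 = beta_t
        else:
            x = mean                                  # no noise on the final step
    return x


# Test harness only: an oracle that knows the clean action chunk it should recover.
target = np.array([[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]])   # 3 steps x 2 action dims
_, _, ALPHA_BARS = make_schedule(50)


def oracle_denoiser(x, t):
    return (x - np.sqrt(ALPHA_BARS[t]) * target) / np.sqrt(1.0 - ALPHA_BARS[t])


actions = ddpm_sample(oracle_denoiser, target.shape)
print(np.allclose(actions, target))  # True
```

With an exact noise prediction, the final denoising step at t = 0 returns the clean chunk exactly. A trained network only approximates this, so its samples land near, rather than on, the demonstrated actions.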

Index terms

Machine Learning for Robot Control, Industrial Robots, Assembly