PegasusFlow: Parallel Rolling-Denoising Score Sampling for Robot Diffusion Planner Flow Matching
Lei Ye, Haibo Gao, Peng Xu, Zhelin Zhang, Wei Zhang, Junqi Shan, Ao Zhang, Ruyi Zhou, Zongquan Deng, Liang Ding
AI summary
Problem
Diffusion-based robot planners currently rely on costly and impractical imitation learning from expert demonstrations, while existing direct score estimation methods lack the computational efficiency and parallel scalability needed for real-world deployment.
Approach
The authors introduce a parallel rolling-denoising framework that uses Weighted Basis Function Optimization (WBFO) and an asynchronous parallel simulation architecture to directly estimate trajectory score gradients from environmental interactions, completely bypassing expert data.
Key results
- Parallel score sampling framework enabling pure score-matching training without expert demonstrations
- Weighted Basis Function Optimization (WBFO) algorithm achieving faster convergence and superior sample efficiency over MPPI
- Structured noise sampling scheme combining Latin Hypercube Sampling, hierarchical ramp scheduling, and RL warm-start
- 100% success rate on a challenging barrier-crossing task, completing it 18% faster than the next-best method
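To make the Latin Hypercube Sampling ingredient above concrete, here is a minimal sketch of the generic technique in NumPy. It is not taken from the paper: the function name `latin_hypercube`, the sample counts, and the seed are illustrative assumptions, and the summary does not say how the authors combine LHS with ramp scheduling or the RL warm-start.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    """Return an (n_samples, n_dims) array in [0, 1).

    Each dimension is split into n_samples equal-width strata, and every
    stratum receives exactly one point (hypothetical helper, not the
    authors' sampler).
    """
    # Uniform jitter places each point somewhere inside its stratum.
    jitter = rng.random((n_samples, n_dims))
    # An independent random stratum order for every dimension.
    strata = np.stack(
        [rng.permutation(n_samples) for _ in range(n_dims)], axis=1
    )
    return (strata + jitter) / n_samples

rng = np.random.default_rng(0)
u = latin_hypercube(8, 3, rng)
```

Compared with independent uniform draws, each one-dimensional projection of `u` covers its range evenly, which is the usual reason for choosing stratified noise when the sample budget per update is small.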
Why it matters
Provides a scalable, data-efficient pathway for training diffusion-based robot planners, making complex terrain navigation and real-time control more accessible without costly expert datasets.
Abstract
Diffusion models offer powerful generative capabilities for robot trajectory planning, yet their practical deployment on robots is hindered by a critical bottleneck: reliance on imitation learning from expert demonstrations. This paradigm is problematic as it is often impractical to produce high quality data for specialized robots, and it creates an inefficient, theoretically suboptimal training pipeline. To overcome this, we introduce PegasusFlow, a parallel rolling-denoising framework that enables direct sampling of trajectory score gradients from environmental interaction, completely bypassing the need for expert data. Our core innovation is a sampling algorithm called Weighted Basis Function Optimization (WBFO), which leverages spline basis representations to achieve superior sample efficiency and faster convergence compared to traditional methods like MPPI. The framework is embedded within a scalable, asynchronous parallel simulation architecture that supports massively parallel rollouts for efficient data collection. Extensive experiments on trajectory optimization and robotic navigation tasks demonstrate that our approach, particularly Action-Value WBFO (AVWBFO) combined with a reinforcement learning warm-start, significantly outperforms baselines. In a challenging barrier-crossing task, our method achieved a 100% success rate and was 18% faster than the next-best method, validating its effectiveness for complex terrain locomotion planning. https://masteryip.github.io/pegasusflow.github.io/
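The abstract contrasts WBFO with MPPI and says it works on spline basis representations, but it does not give the update rule. The sketch below is therefore only an illustration of the general idea, not the authors' WBFO: it applies an MPPI-style softmax-weighted average to the coefficients of a linear B-spline (hat function) basis rather than to per-timestep controls. The names `hat_basis` and `weighted_coefficient_update`, the toy tracking cost, and all hyperparameters are assumptions.

```python
import numpy as np

def hat_basis(n_steps, n_knots):
    """Linear B-spline (hat function) basis matrix of shape (n_steps, n_knots)."""
    t = np.linspace(0.0, 1.0, n_steps)
    knots = np.linspace(0.0, 1.0, n_knots)
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(t[:, None] - knots[None, :]) / h, 0.0, None)

def weighted_coefficient_update(coeffs, basis, cost_fn, rng,
                                n_samples=256, sigma=0.2, temperature=1.0):
    """One MPPI-style update in coefficient space (illustrative assumption)."""
    # Perturb the basis coefficients instead of every timestep.
    noise = rng.standard_normal((n_samples, coeffs.size))
    candidates = coeffs + sigma * noise
    # Evaluate each candidate trajectory; a simulator rollout would go here.
    costs = np.array([cost_fn(basis @ c) for c in candidates])
    # Softmax weights: lower cost gives a larger weight.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    return weights @ candidates

rng = np.random.default_rng(0)
basis = hat_basis(n_steps=50, n_knots=8)
target = np.sin(2.0 * np.pi * np.linspace(0.0, 1.0, 50))  # toy reference trajectory

def cost_fn(traj):
    # Toy tracking cost standing in for an environment rollout cost.
    return float(np.sum((traj - target) ** 2))

coeffs = np.zeros(8)
initial_cost = cost_fn(basis @ coeffs)
for _ in range(30):
    coeffs = weighted_coefficient_update(coeffs, basis, cost_fn, rng)
final_cost = cost_fn(basis @ coeffs)
```

Searching over 8 coefficients instead of 50 per-timestep values shrinks the sampling space and keeps every candidate trajectory continuous, which is one plausible reading of why a basis representation could improve sample efficiency.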