DSPv2: Improved Dense Policy for Effective and Generalizable Whole-Body Mobile Manipulation
Yue Su, Chubin Zhang, Sijin Chen, Liufan Tan, Yansong Tang, Jianan Wang, Xihui Liu
AI summary
Problem
Whole-body mobile manipulation policies struggle with complex multi-view observations, poor generalization to unseen environments or objects, and error amplification across high-dimensional robot components.
Approach
The method aligns sparse 3D spatial features with dense multi-view 2D semantic features using a Q-former, then processes the fused representation through a dense autoregressive action head that predicts whole-body trajectories bidirectionally.
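The fusion step can be illustrated with a minimal single-head cross-attention sketch in NumPy. It assumes the sparse 3D features act as queries and the flattened multi-view 2D features act as keys and values, which is one plausible reading of a Q-former-style alignment; the summary does not specify the actual layer layout. All names, shapes, and the random weights below are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(points_feat, image_feat, w_q, w_k, w_v):
    """Fuse sparse 3D features with dense 2D features (illustrative only).

    points_feat: (N, d) sparse 3D spatial features, used as queries (assumption).
    image_feat:  (M, d) multi-view 2D semantic features, used as keys/values.
    Returns the fused (N, d) features and the (N, M) attention weights.
    """
    q = points_feat @ w_q
    k = image_feat @ w_k
    v = image_feat @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    # Residual connection keeps the original 3D information.
    return points_feat + attn @ v, attn

rng = np.random.default_rng(0)
d = 16                                        # hypothetical feature width
points_feat = rng.normal(size=(32, d))        # 32 sparse 3D tokens
image_feat = rng.normal(size=(2 * 49, d))     # two views, 7x7 patches each
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

fused, attn = cross_attention_fuse(points_feat, image_feat, w_q, w_k, w_v)
print(fused.shape)  # (32, 16)
```

Each 3D token ends up with a weighted mix of 2D semantic features from all views, so the output keeps the sparse 3D token count while carrying image-level semantics.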
Key results
- Surpasses existing whole-body policies in success rates across five real-world tasks
- Demonstrates strong generalization to unseen objects, lighting, layouts, and spatial arrangements
- Extends the Dense Policy paradigm to enable precise, coherent whole-body action generation
- Mitigates inter-component error amplification through bidirectional autoregressive prediction
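One way to picture bidirectional autoregressive prediction is coarse-to-fine expansion: start from a single seed action, repeatedly double the sequence length, and refine every step using both its left and right neighbours. The sketch below follows that reading as an assumption; the summary does not describe the exact decoding schedule. The `smooth_refine` function is a hypothetical placeholder for a learned network, and `ACTION_DIM` is an illustrative value rather than the robot's real action dimension.

```python
import numpy as np

def smooth_refine(seq):
    """Placeholder for a learned refiner: each step sees both neighbours."""
    padded = np.pad(seq, ((1, 1), (0, 0)), mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

def dense_decode(seed_action, horizon, refine=smooth_refine):
    """Expand one seed action into a trajectory of `horizon` steps.

    Returns the trajectory and the number of expansion levels used.
    """
    seq = np.asarray(seed_action, dtype=float)[None, :]
    levels = 0
    while len(seq) < horizon:
        seq = np.repeat(seq, 2, axis=0)  # upsample: length L -> 2L
        seq = refine(seq)                # refine with context from both sides
        levels += 1
    return seq[:horizon], levels

ACTION_DIM = 20  # hypothetical whole-body action size
trajectory, levels = dense_decode(np.zeros(ACTION_DIM), horizon=16)
print(trajectory.shape, levels)  # (16, 20) 4
```

Under this schedule the number of refinement passes grows logarithmically with the horizon, and every action dimension is refined jointly at each level rather than one component after another.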
Why it matters
Provides a generalizable policy framework for imitation-learned whole-body mobile manipulation, combining precise action generation with robustness to unseen objects and environments in real-world deployment.
Abstract
Learning whole-body mobile manipulation via imitation is essential for generalizing robotic skills to diverse environments and complex tasks. However, this goal is hindered by significant challenges, particularly in effectively processing complex observations, achieving robust generalization, and generating coherent actions. To address these issues, we propose DSPv2, a novel policy architecture. DSPv2 introduces an effective encoding scheme that aligns 3D spatial features with multi-view 2D semantic features. This fusion enables the policy to achieve broad generalization while retaining the fine-grained perception necessary for precise control. Furthermore, we extend the Dense Policy paradigm to the whole-body mobile manipulation domain, demonstrating its effectiveness in generating coherent and precise actions for the whole-body robotic platform. Extensive experiments show that our method significantly outperforms existing approaches in both task performance and generalization ability. The project page is available at: https://selen-suyue.github.io/DSPv2Net/.