Factorizing Diffusion Policies for Observation Modality Prioritization
Omkar Deepak Patil, Prabin Kumar Rath, Kartikay Milind Pangaonkar, Eric Rosen, Nakul Gopalan
AI summary
Problem
Standard diffusion policies jointly condition on all sensory inputs, ignoring their task-dependent relevance and becoming brittle when specific modalities are noisy or shifted.
Approach
FDP splits learning into a base policy trained on prioritized modalities and a residual policy that captures the effect of all other modalities, enabling explicit prioritization orders during training.
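The base/residual split described above can be illustrated with a toy sketch. This is not the authors' implementation: the additive combination of the two predictions, the random linear maps standing in for learned networks, and every name here (`base_net`, `residual_net`, `predict_noise`, the dimension constants) are assumptions made for illustration. The block-wise residual architecture mentioned under the key results is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: action, prioritized modality, all other modalities.
ACTION_DIM, PRIOR_DIM, OTHER_DIM = 2, 3, 4


def make_linear(in_dim, out_dim, scale=0.1):
    """Stand-in for a learned network: a fixed random linear map (assumption)."""
    weights = rng.normal(scale=scale, size=(in_dim, out_dim))
    return lambda x: x @ weights


# Base noise predictor: sees the noisy action, the timestep,
# and only the prioritized modality.
base_net = make_linear(ACTION_DIM + 1 + PRIOR_DIM, ACTION_DIM)

# Residual noise predictor: additionally sees the remaining modalities.
residual_net = make_linear(ACTION_DIM + 1 + PRIOR_DIM + OTHER_DIM, ACTION_DIM)


def predict_noise(noisy_action, t, obs_prior, obs_other, use_residual=True):
    """Return the base noise estimate, optionally plus the residual correction.

    The additive composition is an assumption of this sketch.
    """
    t_feat = np.array([t], dtype=float)
    base_in = np.concatenate([noisy_action, t_feat, obs_prior])
    eps = base_net(base_in)
    if use_residual:
        res_in = np.concatenate([base_in, obs_other])
        eps = eps + residual_net(res_in)
    return eps


noisy_action = np.zeros(ACTION_DIM)
obs_prior = np.ones(PRIOR_DIM)   # e.g. proprioception, if prioritized
obs_other = np.ones(OTHER_DIM)   # e.g. vision features
eps_full = predict_noise(noisy_action, 0.5, obs_prior, obs_other)
eps_base = predict_noise(noisy_action, 0.5, obs_prior, obs_other, use_residual=False)
```

In this sketch, the base-only estimate does not depend on `obs_other` at all, so corrupting the non-prioritized modalities can only affect the residual term. That is one plausible reading of why prioritization would help under distribution shift.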
Key results
- 15% absolute success rate improvement in low-data regimes across four simulated benchmarks
- 40% higher absolute success rate under distribution shifts such as visual distractors and camera occlusions
- Mathematical derivation of a factorized diffusion framework with a novel block-wise residual architecture
- Flexible modality prioritization as a tunable hyperparameter without manual weight tuning
Why it matters
Enables safer and more sample-efficient deployment of diffusion-based robot policies in real-world settings where sensory data is unreliable or shifts over time.
Abstract
Diffusion models have been extensively leveraged for learning robot skills from demonstrations. These policies are conditioned on several observational modalities such as proprioception, vision, and tactile sensing. However, observational modalities have varying levels of influence for different tasks, which diffusion policies fail to capture. In this work, we propose ‘Factorized Diffusion Policies’ (FDP), a novel policy formulation that enables observational modalities to have differing influence on the action diffusion process by design. This results in learning policies where certain observation modalities can be prioritized over others, such as vision > tactile or proprioception > vision. FDP achieves modality prioritization by factorizing the observational conditioning for the diffusion process, resulting in more performant and robust policies. Our factored approach shows strong performance improvements in low-data regimes, with a 15% absolute improvement in success rate on several simulated benchmarks when compared to a standard diffusion policy that jointly conditions on all input modalities. Moreover, our benchmark and real-world experiments show that factored policies are naturally more robust, with a 40% higher absolute success rate across several visuomotor tasks under distribution shifts such as visual distractors or camera occlusions, where existing diffusion policies fail catastrophically. FDP thus offers a safer and more robust alternative to standard diffusion policies for real-world deployment. Code and videos are available at https://fdp-policy.github.io/fdp-policy/.