ICRA 2026

PPGuide: Steering Diffusion Policies with Performance Predictive Guidance

Zixing Wang, Devesh Jha, Ahmed H. Qureshi, Diego Romeres

AI summary

PPGuide significantly improves the robustness and success rates of pre-trained diffusion policies at inference time by using a lightweight, self-supervised classifier to steer actions away from failure modes.

Diffusion policies, Policy steering, Multiple instance learning, Robotic manipulation, Inference-time guidance, Self-supervised learning

Problem

Diffusion policies for robotic manipulation often suffer from compounding errors over long horizons, leading to task failure, while existing correction methods require expensive dense rewards, world models, or extensive expert data.

Approach

PPGuide uses an attention-based multiple instance learning model to automatically identify success- and failure-relevant action chunks from sparse trajectory outcomes, then trains a lightweight classifier to provide real-time gradient guidance during the diffusion denoising process.
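The attention-based multiple instance learning step can be illustrated with a short NumPy sketch. It treats one rollout as a bag of K chunk embeddings, scores each chunk with the standard attention pooling of Ilse et al. (2018), a_k ∝ exp(wᵀ tanh(V h_k)), and predicts the rollout outcome from the attention-weighted embedding. The chunk embeddings, the random parameters `V`, `w`, `c`, and the top-k labeling rule in the hypothetical helper `self_label_chunks` are all stand-ins; the summary does not say how PPGuide encodes chunks or turns attention weights into labels.

```python
import numpy as np

def attention_mil_pool(H, V, w):
    """Attention-based MIL pooling over one rollout (the "bag").

    H: (K, d) embeddings of the K observation-action chunks.
    V: (m, d), w: (m,) attention parameters (random placeholders here).
    Returns the pooled bag embedding z (d,) and attention weights a (K,).
    """
    scores = np.tanh(H @ V.T) @ w      # (K,) unnormalized scores w^T tanh(V h_k)
    scores = scores - scores.max()     # shift for a numerically stable softmax
    a = np.exp(scores) / np.exp(scores).sum()
    z = a @ H                          # attention-weighted sum of chunk embeddings
    return z, a

def bag_success_prob(z, c, b):
    """Bag-level head: predicted probability that the whole rollout succeeded."""
    return 1.0 / (1.0 + np.exp(-(z @ c + b)))

def self_label_chunks(a, bag_label, top_k):
    """Hypothetical rule: the top_k most-attended chunks inherit the rollout's
    outcome (1 = success-relevant, 0 = failure-relevant); the rest stay -1."""
    labels = np.full(a.shape, -1)
    labels[np.argsort(a)[::-1][:top_k]] = bag_label
    return labels

# Toy usage: one failed rollout with K = 8 chunks of dimension d = 16.
rng = np.random.default_rng(0)
K, d, m = 8, 16, 32
H = rng.normal(size=(K, d))
V = 0.1 * rng.normal(size=(m, d))
w = rng.normal(size=m)
z, a = attention_mil_pool(H, V, w)
p = bag_success_prob(z, rng.normal(size=d), 0.0)
labels = self_label_chunks(a, bag_label=0, top_k=2)
print("attention:", a.round(3), "p(success):", round(float(p), 3), "labels:", labels)
```

In this sketch only the chunks that dominate the attention inherit the rollout's sparse binary outcome, which is one simple way to turn a trajectory-level label into chunk-level training data without manual annotation.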

Key results

Novel self-supervised MIL framework for localizing critical action chunks without manual annotation
Consistent success rate improvements across diverse Robomimic and MimicGen manipulation tasks
Eliminates need for dense rewards or auxiliary world models by relying solely on sparse binary outcomes
Minimal inference-time computational overhead via an alternating gradient guidance schedule
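Gradient guidance of this kind is commonly implemented as classifier guidance: at denoising step t, the posterior mean is shifted by s·σ_t²·∇ log p(success | x_t). The sketch below applies that shift only on every `guide_every`-th step, as one possible reading of an alternating schedule. The DDPM noise schedule, the toy noise predictor `toy_eps` (exact when clean actions are unit Gaussian), and the linear predictor weights `theta` are assumptions for illustration, not PPGuide's actual components.

```python
import numpy as np

# Hypothetical DDPM noise schedule with T denoising steps.
T = 50
BETAS = np.linspace(1e-4, 0.02, T)
ALPHAS = 1.0 - BETAS
ALPHA_BARS = np.cumprod(ALPHAS)

def toy_eps(x, t):
    """Exact noise predictor when clean actions are N(0, I): E[eps | x_t] = sqrt(1 - abar_t) x_t."""
    return np.sqrt(1.0 - ALPHA_BARS[t]) * x

def log_success_grad(x, theta, b):
    """Gradient of log sigmoid(theta . x + b) w.r.t. x for a linear predictor."""
    p = 1.0 / (1.0 + np.exp(-(x @ theta + b)))
    return (1.0 - p) * theta

def guided_sample(eps_model, theta, b, scale, guide_every=2, seed=0):
    """DDPM ancestral sampling with classifier guidance on every guide_every-th step."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=theta.shape)
    n_guided = 0
    for t in reversed(range(T)):
        eps = eps_model(x, t)
        # Unguided posterior mean of x_{t-1} given x_t.
        mean = (x - BETAS[t] / np.sqrt(1.0 - ALPHA_BARS[t]) * eps) / np.sqrt(ALPHAS[t])
        if t % guide_every == 0:
            # Classifier guidance: shift the mean by scale * sigma_t^2 * grad log p(success | x_t).
            mean = mean + scale * BETAS[t] * log_success_grad(x, theta, b)
            n_guided += 1
        noise = rng.normal(size=x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(BETAS[t]) * noise
    return x, n_guided

theta = np.array([1.0, -0.5, 0.25, 0.0])   # hypothetical predictor weights
x_guided, n_guided = guided_sample(toy_eps, theta, 0.0, scale=50.0)
x_plain, _ = guided_sample(toy_eps, theta, 0.0, scale=0.0)
print("guided steps:", n_guided, "of", T)
print("predictor logit, guided vs plain:", x_guided @ theta, x_plain @ theta)
```

Both runs share a seed, so they see identical noise and differ only by the guidance shifts, which push the sampled action toward a higher predicted success logit. Skipping guidance on alternate steps halves the number of predictor gradient evaluations in this sketch; the schedule PPGuide actually uses may differ.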

Why it matters

It enables more robust and reliable robotic manipulation using pre-trained diffusion policies without costly data collection or complex auxiliary models, benefiting researchers and practitioners in robot learning.

Abstract

Diffusion policies have been shown to be very efficient at learning complex, multi-modal behaviors for robotic manipulation. However, errors in generated action sequences can compound over time, potentially leading to failure. Some approaches mitigate this by augmenting datasets with expert demonstrations or by learning predictive world models, which can be computationally expensive. We introduce Performance Predictive Guidance (PPGuide), a lightweight, classifier-based framework that steers a pre-trained diffusion policy away from failure modes at inference time. PPGuide relies on a novel self-supervised process: it uses attention-based multiple instance learning to automatically estimate which observation-action chunks from the policy's rollouts are relevant to success or failure. We then train a performance predictor on this self-labeled data. During inference, this predictor provides a real-time gradient that guides the policy toward more robust actions. We validated PPGuide across a diverse set of tasks from the Robomimic and MimicGen benchmarks, demonstrating consistent improvements in performance.

1 Department of Computer Science, Purdue University, IN 47907, USA. {wang5389, ahqureshi}@purdue.edu
2 Work conducted while the author was at Mitsubishi Electric Research Laboratories. devesh.dkj@gmail.com
3 Mitsubishi Electric Research Laboratories, Cambridge, MA 02139, USA. romeres@merl.com
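Once chunks carry self-assigned labels, training the performance predictor reduces to ordinary binary classification. The sketch below uses a logistic-regression stand-in trained by full-batch gradient descent on binary cross-entropy. The flattened chunk features, the synthetic data, and the convention that unlabeled chunks carry the label -1 are assumptions; the abstract only says the predictor is lightweight and trained on the self-labeled data.

```python
import numpy as np

def train_performance_predictor(X, y, lr=0.5, epochs=500):
    """Fit p(success | chunk) = sigmoid(X @ theta + b) by full-batch gradient
    descent on binary cross-entropy. Rows with y == -1 (unlabeled) are ignored."""
    mask = y >= 0
    X, y = X[mask], y[mask].astype(float)
    theta = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ theta + b)))
        grad = p - y                          # d(BCE)/d(logit) per sample
        theta -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return theta, b

# Synthetic chunk features: success-relevant, failure-relevant, and unlabeled.
rng = np.random.default_rng(0)
d = 6
X_succ = rng.normal(loc=1.0, size=(100, d))
X_fail = rng.normal(loc=-1.0, size=(100, d))
X_unl = rng.normal(size=(50, d))
X = np.vstack([X_succ, X_fail, X_unl])
y = np.concatenate([np.ones(100, int), np.zeros(100, int), -np.ones(50, int)])

theta, b = train_performance_predictor(X, y)
labeled = y >= 0
pred = (X[labeled] @ theta + b > 0).astype(int)
acc = (pred == y[labeled]).mean()
print("training accuracy on labeled chunks:", acc)
```

A linear model keeps the gradient ∇ log p(success | chunk) cheap to evaluate at inference time; a small neural network trained with the same loss would be the natural next step if a linear boundary proves too restrictive.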

Index terms

Imitation Learning, Sensorimotor Learning, Representation Learning