ICRA 2026

ACG: Action Coherence Guidance for Flow-Based Vision-Language-Action Models

Minho Park, Kinam Kim, Junha Hyung, Hyojin Jang, Hoiyeong Jin, Jooyeol Yun, Hojoon Lee, Jaegul Choo

AI summary

A training-free test-time guidance method that significantly boosts success rates in robotic manipulation by steering flow-based VLA models away from temporally incoherent action distributions.
Action Coherence, Flow Matching, Vision-Language-Action Models, Test-Time Guidance, Robotic Manipulation, Inference-Time Guidance

Problem

Flow-based Vision-Language-Action models trained via imitation learning memorize noise from human demonstrations, causing unstable actions and trajectory drift that degrade performance in fine-grained manipulation.
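To make "unstable actions" concrete, one simple way to quantify incoherence in an action chunk is the total variation of consecutive actions. This is an illustrative sketch only: the function name `total_variation`, the toy trajectories, and the jitter magnitude are assumptions, and the summary does not say this is the metric the paper uses.

```python
import numpy as np

def total_variation(actions):
    """Sum of absolute step-to-step changes over an action chunk of shape (T, D)."""
    actions = np.asarray(actions, dtype=float)
    return float(np.abs(np.diff(actions, axis=0)).sum())

# A steady ramp standing in for a clean demonstration (hypothetical data).
smooth = np.linspace(0.0, 1.0, 11).reshape(-1, 1)

# The same ramp with an offset on every other step, standing in for jitter.
jittery = smooth.copy()
jittery[1::2] += 0.2

print("smooth :", total_variation(smooth))
print("jittery:", total_variation(jittery))
```

Both trajectories start and end at the same values, but the jittery one accumulates more step-to-step change, which is the kind of signal a policy with high generative capacity can end up memorizing.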

Approach

ACG replaces the model's self-attention maps with identity matrices, so that each action token attends only to itself, to produce a deliberately incoherent vector field. It then steers sampling away from that field, which enforces temporal consistency without retraining.
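The idea can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: `self_attention`, `toy_velocity`, `guided_velocity`, `sample`, and the `scale` parameter are hypothetical names, the velocity model is a stand-in for a real VLA action head, and the extrapolation form `v + scale * (v - v_incoherent)` is borrowed from classifier-free-guidance-style updates because the summary does not give the exact formula.

```python
import numpy as np

def self_attention(q, k, v, identity=False):
    """Single-head attention over T action tokens, each of shape (d,)."""
    if identity:
        # Identity attention map: every token attends only to itself,
        # so no information is shared across time steps.
        attn = np.eye(q.shape[0])
    else:
        scores = q @ k.T / np.sqrt(q.shape[-1])
        scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
        attn = np.exp(scores)
        attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v

def toy_velocity(x, t, identity=False):
    """Hypothetical stand-in for a flow model's velocity prediction."""
    mixed = self_attention(x, x, x, identity=identity)
    return mixed - x

def guided_velocity(v_coherent, v_incoherent, scale):
    """Push the sample away from the incoherent field (assumed guidance form)."""
    return v_coherent + scale * (v_coherent - v_incoherent)

def sample(velocity_fn, x0, steps=10, scale=1.0):
    """Euler integration from t=0 to t=1 using the guided velocity."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t, identity=False)    # normal forward pass
        v_ic = velocity_fn(x, t, identity=True)  # identity-attention pass
        x = x + dt * guided_velocity(v, v_ic, scale)
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal((8, 4))  # 8 action tokens, 4 action dimensions
actions = sample(toy_velocity, noise, steps=10, scale=1.0)
print(actions.shape)
```

Setting `scale` to 0 recovers plain sampling in this sketch. Note that each step needs two forward passes, one with the normal attention maps and one with identity maps, so guidance of this kind costs extra compute at inference time.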

Key results

  • Consistently improves success rates across RoboCasa, DexMimicGen, and real-world SO-101 benchmarks
  • Delivers substantial gains on fine manipulation tasks (+23.1% button pressing, +11.8% insertion, +28.8% real-world pick-and-place)
  • Outperforms vanilla models, action smoothing, ensembling, and classifier-free guidance without additional training
  • Reduces trajectory drift and action instability during critical manipulation moments

Why it matters

Enables reliable, precise robotic manipulation with existing flow-based VLA policies through a simple, plug-and-play inference-time enhancement.

Abstract

Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet, when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations: jerks, pauses, and jitter which reduce action coherence. Reduced action coherence causes instability and trajectory drift during deployment, failures that are catastrophic in fine-grained manipulation where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks.

Index terms

Imitation Learning, Learning from Demonstration, Machine Learning for Robot Control
