Exploiting Vulnerabilities: Universal Adversarial Attacks on Vision-Language-Action Models in Robotics
Songhua Yang, Ziyu Liu, Yuanwei Liu, Xuetao Li, Xuanye Fei, He Huang, Zheng WANG, Miao Li
AI summary
Problem
VLA models are increasingly used for direct physical control, yet their security has not been systematically evaluated, and they remain highly vulnerable to adversarial attacks that traditional methods cannot address.
Approach
The authors propose a Universal Adversarial Object (UAO) generated via a black-box, multi-level optimization framework that simultaneously disrupts trajectory planning, task execution, and motion control to create a physically realizable, viewpoint-robust attack.
Key results
- Reduces average task success rates by 31.2%-39.9% across 13 tasks for Pi0 and RDT models
- Drives success rates to near zero (1.9%-2.1%) in complex environments
- Achieves 81.4%-82.2% sim-to-real transfer rate on a physical dual-arm robot
- Maintains effectiveness even when visible in only a single camera view
Why it matters
Highlights critical security vulnerabilities in emerging VLA robotics systems, emphasizing the urgent need for robust adversarial training and safety protocols before real-world deployment.
Abstract
Recently, Vision-Language-Action (VLA) models have revolutionized robotic manipulation by seamlessly integrating visual perception, language understanding, and action generation in an end-to-end learning framework. However, since these models are designed to interact directly with the physical world and humans, their security is critical, and even small vulnerabilities can lead to catastrophic failures. In this work, we propose the Universal Adversarial Object, a sphere with optimized surface texture that significantly degrades task success rates when placed within the robot’s field of view. Specifically, our approach introduces a multi-level attack framework that jointly disrupts trajectory planning, task execution, and action control. We validate our method in both simulated and real-world robotic settings. Experimental results demonstrate that the adversarial object reduces the average task success rates by 31.2%-39.9% for two representative VLA models (Pi0 and RDT), with success rates dropping to near zero in complex scenarios.
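To make the idea of a multi-level, query-only attack more concrete, the following is a minimal illustrative sketch and not the authors' implementation. Everything in it is an assumption made for illustration: `toy_policy` is a hypothetical stand-in for a VLA model, the texture is a six-number vector instead of a sphere surface, the three loss terms are simple proxies for trajectory planning, task execution, and action control, and plain random search is swapped in for whatever optimizer the paper actually uses.

```python
import random

# Hypothetical "clean" per-step actions the toy policy outputs with no object in view.
CLEAN_ACTIONS = [0.0, 0.1, 0.2, 0.3]
TEXTURE_DIM = 6  # assumed toy texture size, not from the paper


def toy_policy(texture=None):
    """Hypothetical stand-in for a VLA model; treated as a black box by the optimizer."""
    if texture is None:
        return list(CLEAN_ACTIONS)
    actions = []
    for k, base in enumerate(CLEAN_ACTIONS):
        # Fixed, arbitrary linear response of each action to the texture values.
        shift = sum((((k + 1) * (j + 2)) % 5 - 2) / 10 * t
                    for j, t in enumerate(texture))
        actions.append(base + shift)
    return actions


def cumulative(actions):
    """Integrate per-step actions into a 1-D path."""
    total, path = 0.0, []
    for a in actions:
        total += a
        path.append(total)
    return path


def attack_score(texture, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three proxy terms, one per level named in the abstract."""
    clean, attacked = toy_policy(), toy_policy(texture)
    clean_path, attacked_path = cumulative(clean), cumulative(attacked)
    trajectory = sum(abs(p - q) for p, q in zip(attacked_path, clean_path))
    task = abs(attacked_path[-1] - clean_path[-1])  # final-state error
    action = max(abs(a - b) for a, b in zip(attacked, clean))
    return weights[0] * trajectory + weights[1] * task + weights[2] * action


def optimize_texture(steps=200, sigma=0.1, seed=0):
    """Random search: uses only policy outputs, never gradients."""
    rng = random.Random(seed)
    best = [0.5] * TEXTURE_DIM
    initial_score = best_score = attack_score(best)
    for _ in range(steps):
        # Perturb the current best texture and clip to the valid range [0, 1].
        candidate = [min(1.0, max(0.0, t + rng.gauss(0.0, sigma))) for t in best]
        score = attack_score(candidate)
        if score > best_score:  # keep only improvements
            best, best_score = candidate, score
    return best, best_score, initial_score


if __name__ == "__main__":
    texture, final_score, initial_score = optimize_texture()
    print("initial score:", initial_score)
    print("final score:", final_score)
```

The sketch keeps the three terms separate so that their weights can be varied independently. A realistic version would replace `toy_policy` with queries to a simulated robot and would average the score over tasks and camera viewpoints, which is what the terms "universal" and "viewpoint-robust" suggest.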