EveryDayVLA: A Vision-Language-Action Model for Affordable Robotic Manipulation
Samarth Chopra, Alexander McMoil, Benjamin Carnovale, Evan Sokolson, Rajkumar Kubendran, Samuel Dickerson
AI summary
Problem
State-of-the-art robotic manipulators are prohibitively expensive and complex, while existing vision-language-action models struggle to generalize reliably in novel or cluttered real-world environments.
Approach
The authors fine-tune a 7B vision-language model to jointly predict discrete and continuous action chunks, using a novel ensembler that monitors prediction disagreement to dynamically adjust planning horizons and trigger safe replanning.
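The disagreement-driven horizon idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `adaptive_horizon`, the mean-absolute-difference disagreement measure, the `threshold` value, and the `min_steps` floor are all assumptions chosen for illustration. It assumes the model returns two action chunks of the same shape (steps × action dimensions), one decoded from discrete tokens and one from the continuous head, and it executes only the leading steps on which the two agree before replanning.

```python
import numpy as np

def adaptive_horizon(continuous, discrete, threshold=0.05, min_steps=1):
    """Return how many leading steps of an action chunk to execute.

    Hypothetical sketch: `threshold` and `min_steps` are illustrative
    parameters, not values taken from the paper.
    """
    continuous = np.asarray(continuous, dtype=float)
    discrete = np.asarray(discrete, dtype=float)
    if continuous.shape != discrete.shape:
        raise ValueError("both chunks must have shape (steps, action_dims)")

    # Per-step disagreement: mean absolute difference across action dimensions.
    disagreement = np.abs(continuous - discrete).mean(axis=1)

    # Index of the first step whose disagreement exceeds the threshold.
    over = np.flatnonzero(disagreement > threshold)
    horizon = int(over[0]) if over.size else len(disagreement)

    # Always execute at least `min_steps`, never more than the chunk length.
    return min(max(horizon, min_steps), len(disagreement))


# Example: the two heads agree on steps 0 and 1, then diverge at step 2.
cont = np.zeros((4, 2))
disc = np.zeros((4, 2))
disc[2] = 0.2
steps_to_run = adaptive_horizon(cont, disc)
```

In this sketch a short returned horizon means the controller executes fewer actions and queries the model again sooner, which is one simple way to realize replanning under uncertainty.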
Key results
- A $300 open-source 6-DOF manipulator with 0.2 kg payload and 10 mm repeatability
- AdaHorizon ensembler that dynamically adjusts action planning based on discrete-continuous prediction disagreement
- Real-world success rates 49% higher in-distribution and 34.9% higher out-of-distribution than prior methods
- Competitive performance on the LIBERO simulation benchmark with inference rates up to 108.4 Hz
Why it matters
Democratizes access to robotic foundation models for home users, educators, and researchers by drastically reducing hardware costs without sacrificing real-world performance.
Abstract
While Vision–Language–Action (VLA) models map visual inputs and language instructions directly to robot actions, they often rely on costly hardware and struggle in novel or cluttered scenes. We introduce EverydayVLA, a 6-DOF manipulator that can be assembled for $300, capable of modest payloads and workspaces. A single unified model jointly outputs discrete and continuous actions, and our adaptive-horizon ensembler monitors motion uncertainty to trigger on-the-fly replanning for safe, reliable operation. On LIBERO, EverydayVLA matches state-of-the-art success rates, and in real-world tests it outperforms prior methods by 49% in-distribution and 34.9% out-of-distribution. By combining a state-of-the-art VLA with cost-effective hardware, EverydayVLA democratizes access to a robotic foundation model and paves the way for economical use in homes and research labs alike. Experiment videos and more details can be found on our project page: https://everydayvla.github.io/