ICRA 2026

EveryDayVLA: A Vision-Language-Action Model for Affordable Robotic Manipulation

Samarth Chopra, Alexander McMoil, Benjamin Carnovale, Evan Sokolson, Rajkumar Kubendran, Samuel Dickerson


AI summary


EveryDayVLA combines a $300 manipulator with an adaptive-horizon vision-language-action model to achieve state-of-the-art real-world manipulation performance at a fraction of traditional costs.

Vision-Language-Action, Low-cost robotics, Adaptive planning, Robotic manipulation, Foundation models, Real-world generalization

Problem

State-of-the-art robotic manipulators are prohibitively expensive and complex, while existing vision-language-action models struggle to generalize reliably in novel or cluttered real-world environments.

Approach

The authors fine-tune a 7B vision-language model to jointly predict discrete and continuous action chunks, using a novel ensembler that monitors prediction disagreement to dynamically adjust planning horizons and trigger safe replanning.
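The disagreement-driven horizon idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `adaptive_horizon`, the mean-absolute-difference disagreement metric, and the `threshold` and `min_horizon` parameters are all assumptions made for illustration, since the summary does not specify them. The sketch executes the leading steps of an action chunk for as long as the discrete and continuous predictions agree, and replans once they diverge.

```python
from typing import Sequence


def adaptive_horizon(
    discrete_chunk: Sequence[Sequence[float]],
    continuous_chunk: Sequence[Sequence[float]],
    threshold: float = 0.05,   # hypothetical disagreement tolerance
    min_horizon: int = 1,      # hypothetical floor: always execute at least one step
) -> int:
    """Return how many leading steps of the chunk to execute before replanning.

    Both chunks are lists of per-step action vectors of equal shape.
    The disagreement metric (mean absolute difference) is an assumption.
    """
    if len(discrete_chunk) == 0:
        raise ValueError("chunks must contain at least one step")
    if len(discrete_chunk) != len(continuous_chunk):
        raise ValueError("chunks must have the same number of steps")

    horizon = 0
    for d_step, c_step in zip(discrete_chunk, continuous_chunk):
        # Mean absolute difference across action dimensions at this step.
        disagreement = sum(abs(d - c) for d, c in zip(d_step, c_step)) / len(d_step)
        if disagreement > threshold:
            break  # predictions diverge: stop here and replan
        horizon += 1

    # Never return less than min_horizon or more than the chunk length.
    return max(min_horizon, min(horizon, len(discrete_chunk)))


if __name__ == "__main__":
    discrete = [[0.0, 0.0], [0.1, 0.1], [0.5, 0.5]]
    continuous = [[0.0, 0.01], [0.1, 0.12], [0.1, 0.1]]
    print(adaptive_horizon(discrete, continuous))
```

In this example the two heads agree closely on the first two steps and diverge on the third, so only the first two actions would be executed before the model is queried again.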

Key results

A $300 open-source 6-DOF manipulator with 0.2 kg payload and 10 mm repeatability
AdaHorizon ensembler that dynamically adjusts action planning based on discrete-continuous prediction disagreement
Real-world success rates that exceed prior methods by 49% in-distribution and 34.9% out-of-distribution
Competitive performance on the LIBERO simulation benchmark with inference rates up to 108.4 Hz

Why it matters

Democratizes access to robotic foundation models for home users, educators, and researchers by drastically reducing hardware costs without sacrificing real-world performance.

Abstract

While Vision–Language–Action (VLA) models map visual inputs and language instructions directly to robot actions, they often rely on costly hardware and struggle in novel or cluttered scenes. We introduce EverydayVLA, a 6-DOF manipulator that can be assembled for $300, capable of modest payloads and workspaces. A single unified model jointly outputs discrete and continuous actions, and our adaptive-horizon ensembler monitors motion uncertainty to trigger on-the-fly replanning for safe, reliable operation. On LIBERO, EverydayVLA matches state-of-the-art success rates, and in real-world tests it outperforms prior methods by 49% in-distribution and 34.9% out-of-distribution. By combining a state-of-the-art VLA with cost-effective hardware, EverydayVLA democratizes access to a robotic foundation model, and paves the way for economical use in homes and research labs alike. Experiment videos and more details can be found on our project page: https://everydayvla.github.io/

Index terms

AI-Enabled Robotics, Deep Learning Methods, Learning from Demonstration