Robot Control Stack: A Lean Ecosystem for Robot Learning at Scale
Tobias Thomas Jülg, Pierre Krack, Seongjin Bien, Yannik Blei, Khaled Gamal, Ken Nakahara, Johannes Hechtl, Roberto Calandra, Wolfram Burgard, Florian Walter
AI summary
Problem
Traditional robotics frameworks and simulators lack the flexibility and scalability needed for modern, data-centric robot learning workflows, creating bottlenecks for training and transferring Vision-Language-Action models to physical robots.
Approach
The authors present RCS, a lean, wrapper-based ecosystem that unifies hardware and MuJoCo simulation under a single Gymnasium-compatible Python API and a high-performance C++ backend to streamline policy training and deployment.
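As a rough illustration of what a wrapper-based design with a Gymnasium-style interface can look like, the sketch below stacks two wrappers around a toy environment. It is not RCS code: `SimArm`, `Wrapper`, `ClipAction`, and `StepLimit` are hypothetical names invented for this example, and only the `reset`/`step` signatures follow the Gymnasium convention. To stay self-contained, it does not import Gymnasium itself.

```python
class SimArm:
    """Hypothetical stand-in for a simulated 3-joint robot backend."""

    def __init__(self):
        self.joints = [0.0, 0.0, 0.0]

    def reset(self, *, seed=None, options=None):
        self.joints = [0.0, 0.0, 0.0]
        return {"joints": list(self.joints)}, {}

    def step(self, action):
        # Treat the action as a per-joint position delta.
        self.joints = [q + a for q, a in zip(self.joints, action)]
        obs = {"joints": list(self.joints)}
        # Gymnasium-style 5-tuple: obs, reward, terminated, truncated, info
        return obs, 0.0, False, False, {}


class Wrapper:
    """Forwards reset/step to the wrapped environment unchanged."""

    def __init__(self, env):
        self.env = env

    def reset(self, *, seed=None, options=None):
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        return self.env.step(action)


class ClipAction(Wrapper):
    """Limits each joint delta to [-max_delta, max_delta]."""

    def __init__(self, env, max_delta):
        super().__init__(env)
        self.max_delta = max_delta

    def step(self, action):
        clipped = [max(-self.max_delta, min(self.max_delta, a)) for a in action]
        return self.env.step(clipped)


class StepLimit(Wrapper):
    """Sets truncated=True once max_steps steps have been taken."""

    def __init__(self, env, max_steps):
        super().__init__(env)
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, *, seed=None, options=None):
        self.steps = 0
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.steps += 1
        truncated = truncated or self.steps >= self.max_steps
        return obs, reward, terminated, truncated, info


# The same wrapper stack could sit on top of a simulated or a physical backend,
# as long as both expose the same reset/step interface.
env = StepLimit(ClipAction(SimArm(), max_delta=0.1), max_steps=2)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step([1.0, -1.0, 0.05])
```

Because each wrapper only depends on the `reset`/`step` interface, behavior such as safety limits or episode bookkeeping can be added or removed without touching the backend.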
Key results
- Modular wrapper-based architecture supporting Python and C++ robot control
- Cross-embodiment evaluation across four physical robot setups and matched MuJoCo simulations
- Extensive benchmarking of Octo, OpenVLA, and π0 policies on a standardized picking task
- Demonstration that mixing synthetic and real-world data significantly boosts real-world policy performance
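One common way to mix synthetic and real-world data during fine-tuning is to draw each training batch from both sources at a fixed ratio. The sketch below shows that idea only; the function name `mixed_batches`, the `sim_fraction` parameter, and the toy datasets are assumptions made for illustration, and the summary above does not state which mixing scheme or ratio the authors used.

```python
import random


def mixed_batches(real, sim, batch_size, sim_fraction, seed=0):
    """Yield batches with a fixed share of simulated samples (hypothetical helper)."""
    rng = random.Random(seed)
    n_sim = round(batch_size * sim_fraction)
    n_real = batch_size - n_sim
    while True:
        # Sample with replacement so the smaller dataset is never exhausted.
        batch = rng.choices(sim, k=n_sim) + rng.choices(real, k=n_real)
        rng.shuffle(batch)
        yield batch


# Toy datasets: each sample is tagged with its origin.
real_data = [("real", i) for i in range(10)]
sim_data = [("sim", i) for i in range(100)]

batch = next(mixed_batches(real_data, sim_data, batch_size=8, sim_fraction=0.25))
```

Sampling with replacement keeps the ratio stable even when one source is much smaller than the other, at the cost of repeating samples from the smaller set more often.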
Why it matters
It provides robotics researchers with a scalable, low-overhead toolchain that accelerates the development, training, and deployment of foundation-model-based robot policies.
Abstract
Vision-Language-Action models (VLAs) mark a major shift in robot learning. They replace specialized architectures and task-tailored components of expert policies with large-scale data collection and setup-specific fine-tuning. In this machine learning-focused workflow that is centered around models and scalable training, traditional robotics software frameworks become a bottleneck, while robot simulations offer only limited support for transitioning from and to real-world experiments. In this work, we close this gap by introducing Robot Control Stack (RCS), a lean ecosystem designed from the ground up to support research in robot learning with large-scale generalist policies. At its core, RCS features a modular and easily extensible layered architecture with a unified interface for simulated and physical robots, facilitating sim-to-real transfer. Despite its minimal footprint and dependencies, it offers a complete feature set, enabling both real-world experiments and large-scale training in simulation. Our contribution is twofold: First, we introduce the architecture of RCS and explain its design principles. Second, we evaluate its usability and performance along the development cycle of VLA and RL policies. Our experiments also provide an extensive evaluation of Octo, OpenVLA, and π0 on multiple robots and shed light on how simulation data can improve real-world policy performance. Our code, datasets and videos are available at https://robotcontrolstack.github.io