ICRA 2026

COBALT: Crowdsourcing Robot Learning Via Cloud-Based Teleoperation with Smartphones

Ayush Agarwal, Ansh Gandhi, Jeremy Collins, Omar Rayyan, Aryan Sarswat, Ranjani Koushik, Masoud Moghani, Ajay Uday Mandlekar, Animesh Garg

AI summary

Cloud-based teleoperation using off-the-shelf smartphones can scale to hundreds of concurrent users with low latency, and a structured user training curriculum significantly improves data quality for robot learning.
Teleoperation, Robot Learning, Crowdsourcing, Imitation Learning, Cloud Computing, Smartphone Control

Problem

Scaling imitation learning for robotics is bottlenecked by the scarcity of large-scale, high-quality demonstration data, as physical data collection is costly and existing teleoperation platforms lack scalability and accessibility.

Approach

COBALT is a cloud-based teleoperation platform that uses vectorized simulations and low-latency streaming to enable concurrent remote control via off-the-shelf devices, augmented by real-time quality metrics and a structured training curriculum.
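To make the idea of sharing one vectorized simulation among many operators concrete, here is a minimal sketch in Python with NumPy. It is not COBALT's implementation: the class ToyVectorEnv, its slot bookkeeping, and the point-mass dynamics are all hypothetical, chosen only to show how each connected user can own one row of a batched state that is advanced in a single step per control tick.

```python
import numpy as np


class ToyVectorEnv:
    """Hypothetical stand-in for a vectorized simulator: one row per operator."""

    def __init__(self, num_slots: int, action_dim: int = 3):
        self.positions = np.zeros((num_slots, action_dim))
        self.free_slots = list(range(num_slots))
        self.slot_of_user = {}

    def connect(self, user_id: str) -> int:
        """Assign the next free environment slot to a new operator."""
        slot = self.free_slots.pop(0)
        self.slot_of_user[user_id] = slot
        self.positions[slot] = 0.0
        return slot

    def step(self, actions_by_user: dict, dt: float = 0.05) -> dict:
        """Advance every slot by one tick (dt = 0.05 s corresponds to 20 Hz)."""
        # Users with no fresh command this tick get a zero action (hold still).
        actions = np.zeros_like(self.positions)
        for user_id, action in actions_by_user.items():
            actions[self.slot_of_user[user_id]] = action
        # One batched update advances all operators' environments at once.
        self.positions += dt * actions
        return {u: self.positions[s].copy() for u, s in self.slot_of_user.items()}


env = ToyVectorEnv(num_slots=4)
env.connect("alice")
env.connect("bob")
obs = env.step({"alice": [1.0, 0.0, 0.0]})
```

In this toy version the per-tick cost is one array operation regardless of how many slots are occupied, which is the property that lets several users share a single device. A real system would replace the point-mass update with a GPU physics step and stream rendered frames back to each client.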

Key results

  • Demonstrates concurrent support for 256 clients across 8 GPUs, and sustains dozens of concurrent users at 20 Hz with sub-100 ms end-to-end latency
  • Phone-based teleoperation performs comparably to or better than specialized hardware
  • Structured training curriculum significantly reduces task reset rates and execution times
  • Crowdsourced pilot dataset of 7500+ demonstrations successfully trains state-of-the-art imitation learning policies

Why it matters

It democratizes robot learning by enabling scalable, low-cost data collection from global crowds, accelerating the development of capable manipulation robots.

Abstract

The scarcity of large-scale, high-quality demonstration data remains a bottleneck in scaling imitation learning for robotic manipulation. We present COBALT, a teleoperation platform designed to democratize robot learning at scale both in simulation and in the real world. By leveraging vectorized environments, our scalable, load-balanced infrastructure supports concurrent teleoperation by multiple users on a single GPU, yielding a significant reduction in teleoperation cost. Operators can connect from nearly anywhere on Earth using commonly available devices, including single or dual smartphones, VR headsets, 3D mice, and keyboards. An in-memory data cache and efficient video streaming keep control and rendering synchronous, sustaining dozens of concurrent users at 20 Hz with sub-100 ms end-to-end latency. We demonstrate concurrent support for 256 clients across 8 GPUs, underscoring the system’s ability to scale across hardware and within individual servers. We perform a comprehensive user study showing that phone-based teleoperation performs comparably to or better than specialized hardware, enabling faster, more ergonomic data collection. To ensure data quality, COBALT logs a suite of real-time metrics to automatically filter suboptimal demonstrations. We further demonstrate that a structured user training curriculum significantly improves data collection quality. Guided by insights from our user study, we crowdsource the collection of a large-scale, high-quality pilot dataset with 7500+ demonstrations (50+ hours) collected with smartphones across nine countries over five days. We validate the dataset’s quality by training state-of-the-art imitation learning algorithms. Please visit cobalt-teleop.github.io for more details.
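The figures quoted in the abstract can be put in proportion with some quick arithmetic. The sketch below is a back-of-envelope calculation, not a result from the paper: it assumes clients are spread evenly across GPUs (the abstract does not say so) and treats the "7500+" and "50+" lower bounds as exact, so the per-demonstration duration is only a rough estimate.

```python
# Figures taken from the abstract.
clients, gpus = 256, 8
control_rate_hz = 20
latency_bound_ms = 100
demos, hours, days = 7500, 50, 5

# Assumption: load is balanced evenly across GPUs.
clients_per_gpu = clients / gpus

# Time between control ticks at the stated rate.
control_period_ms = 1000 / control_rate_hz

# The latency bound expressed in control periods.
latency_in_periods = latency_bound_ms / control_period_ms

# Rough averages, treating the "+" lower bounds as exact values.
seconds_per_demo = hours * 3600 / demos
demos_per_day = demos / days

print(f"{clients_per_gpu:.0f} clients per GPU")
print(f"{control_period_ms:.0f} ms control period")
print(f"latency bound = {latency_in_periods:.0f} control periods")
print(f"~{seconds_per_demo:.0f} s per demonstration")
print(f"~{demos_per_day:.0f} demonstrations per day")
```

Under these assumptions the sub-100 ms latency bound is under two control periods at 20 Hz, and the pilot dataset averages roughly 24 seconds per demonstration.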

Index terms

Telerobotics and Teleoperation; Data Sets for Robot Learning; Imitation Learning
