AutoPercep: A Pipeline for Onboard Neighbor Position Estimation Toward Large-Scale Swarm Robotics
Ruiheng Wu, Simay Atasoy Bingöl, Oliver Deussen, Heiko Hamann, Iain D. Couzin, Andreagiovanni Reina, Liang Li
AI summary
Problem
Centralized coordination and traditional localization methods struggle with scalability, computational load, and cost for large robot swarms, while existing vision-based approaches require expensive, large-scale labeled datasets and heavy processing hardware.
Approach
The pipeline automatically synchronizes onboard camera footage with motion-capture ground truth to generate a large labeled dataset in minutes, then trains lightweight neural networks to predict relative neighbor positions directly from camera images.
Key results
- Collected 10,000+ labeled image-position pairs in under 10 minutes without manual annotation
- ResNet-18 achieved high accuracy (MSE: 0.095 m, angular error: 2.24°) running at ~10 FPS on a Raspberry Pi 4B
- Network trained on five robots successfully generalized to seven-robot deployments without retraining
- Validated in sequential leader-follower tasks, demonstrating real-time coordination feasibility
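The summary does not say how the two reported error figures are computed. The sketch below shows one plausible definition, offered as an assumption rather than the authors' evaluation code: a mean squared Euclidean error over predicted relative positions, and a mean absolute bearing error in degrees. The function names and the `(x, y)` tuple layout are hypothetical.

```python
import math

def position_mse(pred, true):
    """Mean squared Euclidean error between predicted and true (x, y) pairs.

    Assumed definition; under it the result is in squared metres.
    """
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, true):
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / len(pred)

def mean_angular_error_deg(pred, true):
    """Mean absolute difference in bearing (degrees) between pred and true."""
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, true):
        diff = math.atan2(py, px) - math.atan2(ty, tx)
        # Wrap the difference into [-pi, pi] so 359 deg vs 1 deg counts as 2 deg.
        diff = math.atan2(math.sin(diff), math.cos(diff))
        total += abs(diff)
    return math.degrees(total / len(pred))
```

Under this definition the position error is in squared metres. The figure above is reported in metres, so the source's exact definition may differ (for example, a root-mean-square or mean Euclidean distance).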
Why it matters
Provides a scalable, low-cost, and computationally efficient solution for onboard perception, enabling large-scale swarm robotics to operate autonomously beyond controlled laboratory environments.
Abstract
Autonomous mobile robots must know each other’s positions to coordinate their actions and motion. Beyond collision avoidance, relative position estimation is essential for spatial coordination tasks such as collective motion, leader-follower dynamics, or formation control. To overcome the scalability and resilience issues of centralized orchestrators that transmit real-time positional information to every robot, we study mechanisms of onboard vision sensing. Conventional localization methods, such as SLAM, are typically too computationally demanding for real-time use on small, resource-constrained mobile robots. Vision-based neural networks offer a promising alternative but often require large, high-quality datasets that are expensive to collect. We present AutoPercep, a pipeline that automatically generates training data and trains a lightweight neural network to estimate neighbor positions. Robots capture camera images that are automatically labeled using ground-truth data from a motion-capture system. In our experiments, AutoPercep collected over 10,000 high-quality images within 10 minutes and trained a neural network in about 1 hour, which could be deployed on Raspberry Pi 4B–based robots for onboard neighbor detection. Moreover, we show that a network trained on five robots generalizes to seven-robot deployments. We finally evaluate the trained model in a sequential leader-follower case study. Our end-to-end pipeline demonstrates the feasibility and low cost of onboard, vision-based neighbor perception, supporting scalability to large robot swarms and opening opportunities for deployment beyond laboratory settings. The code for training and evaluation is available at https://github.com/preon7/autopercep
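The automatic labeling step can be illustrated with a minimal sketch. It assumes each camera frame and each motion-capture sample carries a timestamp on a shared clock, that poses are planar `(x, y, heading)` tuples, and that a frame is discarded when no motion-capture sample lies within `max_gap` seconds. These names, the 2D pose layout, and the threshold are assumptions for illustration, not details taken from the AutoPercep repository.

```python
import bisect
import math

def nearest_index(sorted_times, t):
    """Index of the timestamp in sorted_times closest to t."""
    i = bisect.bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    # Pick whichever neighbor (before or after) is closer in time.
    return i if sorted_times[i] - t < t - sorted_times[i - 1] else i - 1

def relative_position(observer_pose, neighbor_xy):
    """Express a neighbor's world position in the observer's body frame.

    observer_pose is (x, y, heading in radians); the body x-axis points forward.
    """
    x, y, theta = observer_pose
    dx, dy = neighbor_xy[0] - x, neighbor_xy[1] - y
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the world-frame offset by -theta.
    return (c * dx + s * dy, -s * dx + c * dy)

def label_frames(frame_times, mocap_times, observer_poses, neighbor_positions,
                 max_gap=0.02):
    """Pair each frame index with a relative-position label.

    Frames with no motion-capture sample within max_gap seconds are skipped.
    """
    labels = []
    for k, t in enumerate(frame_times):
        i = nearest_index(mocap_times, t)
        if abs(mocap_times[i] - t) > max_gap:
            continue
        labels.append((k, relative_position(observer_poses[i],
                                            neighbor_positions[i])))
    return labels
```

For example, an observer at the origin facing along the world y-axis (heading π/2) with a neighbor at `(0, 1)` gets a label of roughly `(1, 0)`: the neighbor is one metre straight ahead.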