ICRA 2026

Learning Visuomotor Policy for Multi-Robot Laser Tag Game

Kai Li, Shiyu Zhao

AI summary

An end-to-end visuomotor policy trained via privileged imitation learning outperforms classic modular approaches in multi-robot laser tag by directly mapping images to actions without explicit state estimation or depth mapping.

Visuomotor policy, Multi-robot coordination, Privileged imitation learning, End-to-end control, Autonomous laser tag, Real-world deployment

Problem

Classic modular approaches to robot shooting games suffer from limited observability and depend on depth mapping, global localization, and inter-robot communication, all of which limit scalability and real-world deployment.

Approach

The authors train a privileged state-based teacher policy using multi-agent reinforcement learning and distill it into a vision-based student policy that directly maps monocular images and depth heatmaps to velocity commands, enhanced by a permutation-invariant feature extractor.
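The teacher-to-student distillation step can be illustrated with a minimal sketch. It assumes a simple behavior-cloning-style regression in which the student sees only an observation of the state and is fit to the teacher's actions with a mean-squared-error loss. The names `teacher_policy`, `observe`, and `distill`, the dimensions, and the linear models are hypothetical stand-ins chosen for illustration; the paper's student consumes images and depth heatmaps, and its actual training procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: privileged state, student observation, velocity command.
STATE_DIM, OBS_DIM, ACT_DIM = 6, 8, 2

# Stand-in teacher: a fixed map from privileged state to a bounded command.
# In the paper the teacher is trained with multi-agent RL; here it is frozen.
W_teacher = rng.normal(size=(STATE_DIM, ACT_DIM))

def teacher_policy(state):
    return np.tanh(state @ W_teacher)

# Stand-in sensor model: the student never sees the state directly,
# only a fixed linear projection of it (a placeholder for camera input).
W_obs = rng.normal(size=(STATE_DIM, OBS_DIM))

def observe(state):
    return state @ W_obs

def distill(num_steps=500, batch_size=64, lr=0.01):
    """Fit a linear student to the teacher's actions by gradient descent on MSE."""
    W_student = np.zeros((OBS_DIM, ACT_DIM))
    losses = []
    for _ in range(num_steps):
        states = rng.normal(size=(batch_size, STATE_DIM))
        obs = observe(states)             # what the student is allowed to see
        target = teacher_policy(states)   # teacher label from privileged state
        err = obs @ W_student - target
        losses.append(float(np.mean(err ** 2)))
        # Gradient of the mean squared error with respect to W_student.
        grad = 2.0 * obs.T @ err / (batch_size * ACT_DIM)
        W_student -= lr * grad
    return W_student, losses

W_student, losses = distill()
print(f"imitation loss: first step {losses[0]:.3f}, last 50 steps {np.mean(losses[-50:]):.3f}")
```

The point of the sketch is the data flow: the teacher labels each sample using information the student cannot access, and the student learns to reproduce those labels from its own observations.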

Key results

Decentralized end-to-end policy eliminates reliance on explicit state estimation, global localization, and inter-robot communication
Outperforms classic modular methods by 16.7% in hit accuracy and 6% in collision avoidance
Permutation-invariant feature extractor and depth-heatmap input improve performance over standard architectures
Successfully deployed on real-world multi-robot systems with limited onboard computational resources
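To make the idea of a permutation-invariant feature extractor concrete, here is a minimal sketch assuming a common construction: a shared per-entity encoder followed by max pooling over entities. The function `make_extractor`, the layer sizes, and the random weights are hypothetical; the source does not describe the authors' architecture at this level of detail.

```python
import numpy as np

def make_extractor(in_dim, hidden_dim, seed=0):
    """Build a toy permutation-invariant extractor (shared encoder + max pool)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(in_dim, hidden_dim))
    b = np.zeros(hidden_dim)

    def extract(entities):
        # entities: array of shape (num_entities, in_dim), one row per robot.
        # The same weights encode every row, so no row is treated specially.
        per_entity = np.maximum(entities @ W + b, 0.0)  # ReLU
        # Max over the entity axis discards ordering and fixes the output size.
        return per_entity.max(axis=0)

    return extract

extract = make_extractor(in_dim=4, hidden_dim=16)
robots = np.random.default_rng(1).normal(size=(5, 4))
shuffled = robots[[3, 0, 4, 1, 2]]

features = extract(robots)
features_shuffled = extract(shuffled)
print(np.allclose(features, features_shuffled))
```

Because pooling runs over the entity axis, the output has the same shape whether three or five robots are passed in, which is one reason this kind of design is attractive when the number of visible robots varies.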

Why it matters

Provides a scalable, hardware-efficient framework for decentralized multi-robot coordination that can be adapted to real-world applications like autonomous drone interception and dynamic combat scenarios.

Abstract

In this paper, we study multi-robot laser tag, a simplified yet practical shooting-game-style task. Classic modular approaches on these tasks face challenges such as limited observability and reliance on depth mapping and inter-robot communication. To overcome these issues, we present an end-to-end visuomotor policy that maps images directly to robot actions. We train a high-performing teacher policy with multi-agent reinforcement learning and distill its knowledge into a vision-based student policy. Technical designs, including a permutation-invariant feature extractor and depth–heatmap input, improve performance over standard architectures. Our policy outperforms classic methods by 16.7% in hitting accuracy and 6% in collision avoidance, and is successfully deployed on real robots. Code will be released publicly.

Index terms

Sensor-based Control, Cooperating Robots, Sensorimotor Learning