ICRA 2026

A MARL Approach for Connectivity-Aware Search and Rescue in Urban Environments

Andrés Meseguer Valenzuela

AI summary

Key figure (auto-extracted from paper)

A closed-loop simulation framework successfully coordinates UGV navigation and UAV relay positioning to maintain low-latency 5G connectivity while traversing occluded urban environments.

Connectivity-aware autonomy, Multi-agent reinforcement learning, Urban search and rescue, UAV/UGV coordination, 5G network simulation, Closed-loop robotics

Problem

Dense urban environments cause frequent line-of-sight to non-line-of-sight transitions that degrade 5G connectivity, threatening the reliable telemetry and control needed for heterogeneous search-and-rescue missions.

Approach

The authors integrate a physics-based urban digital twin with ROS2 orchestration, a PPO multi-agent reinforcement learning controller, and a 5G link evaluation pipeline to jointly optimize robot mobility and network performance.

Key results

End-to-end closed-loop simulation framework coupling robotics middleware and 5G KPIs
PPO multi-agent policy coordinates two UGVs and three UAVs for navigation and relay support
UGVs navigate toward the target, reducing the minimum distance to the hazard-region center from 27.9 m to 1.55 m (final 1.80 m)
Connectivity maintained with 4.88 ms mean latency (p95 8.17 ms), mean packet loss of 3.5% and 2.2% for the two UGVs, and sparse outages (101 of 9000 steps) in building-dense areas
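As a rough illustration of how episode-level figures like these could be derived from per-step link traces, the sketch below aggregates latency, packet loss, and outage flags. The function names, the trace layout, and the nearest-rank percentile definition are assumptions made for illustration; the paper does not specify how its KPIs are aggregated, and the sample trace is invented.

```python
import math
from statistics import mean


def percentile(values, q):
    """Nearest-rank percentile of a non-empty sequence, with q in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize_link(latency_ms, packet_loss, outage):
    """Aggregate per-step link traces into episode-level KPIs (hypothetical layout)."""
    outage_steps = sum(1 for flag in outage if flag)
    return {
        "latency_mean_ms": mean(latency_ms),
        "latency_p95_ms": percentile(latency_ms, 95),
        "packet_loss_mean": mean(packet_loss),
        "outage_steps": outage_steps,
        "outage_fraction": outage_steps / len(outage),
    }


# Invented four-step trace, for illustration only (not data from the paper).
latency_ms = [4.0, 5.0, 6.0, 5.0]
packet_loss = [0.0, 0.04, 0.0, 0.04]
outage = [False, False, True, False]
summary = summarize_link(latency_ms, packet_loss, outage)

# The outage count reported in the abstract, expressed as a fraction of steps.
reported_outage_fraction = 101 / 9000  # roughly 1.1% of steps
```

Under this definition, the 101 outage steps over 9000 reported in the abstract correspond to roughly 1.1% of the episode.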

Why it matters

Enables systematic diagnosis and optimization of mobility-communication coupling for future urban robotic teams operating under strict connectivity constraints.

Abstract

This work presents a closed-loop experimental framework for connectivity-aware urban search and rescue (SAR) using heterogeneous unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs). The setup couples a physics-based urban digital twin in NVIDIA Isaac Sim with Robot Operating System 2 (ROS2) orchestration, a Proximal Policy Optimization (PPO) multi-agent reinforcement learning (MARL) controller, and a fifth-generation (5G) link evaluation pipeline based on ns-3/5G-LENA key performance indicators (KPIs). Two UGVs execute mission-directed navigation toward a hazard region, while two UAV relays and a gNB-like aerial anchor adapt their positions to sustain end-to-end service under line-of-sight and non-line-of-sight transitions induced by urban occlusions. Preliminary simulation results validate end-to-end operability and provide quantitative evidence of simultaneous mission progress and network continuity. Across a representative episode, the minimum distance to the hazard-region center decreases from 27.9 m to 1.55 m (final 1.80 m), while latency remains in a low regime (mean 4.88 ms, p95 8.17 ms). Packet loss is bounded (mean 3.5% and 2.2% for the two UGVs), and outages are sparse (101 steps over 9000), even during partial traversal of building-dense areas. The platform enables systematic diagnosis of mobility–connectivity coupling and supports transfer-oriented refinement of relay control and coordination policies.

MOTIVATION AND CONTRIBUTION

Search-and-rescue (SAR) missions increasingly rely on heterogeneous robotic teams under tight time constraints [1]. Unmanned ground vehicles (UGVs) perform inspection and mapping, while unmanned aerial vehicles (UAVs) provide sensing and communication support. In dense urban environments, building occlusions and “urban canyon” propagation trigger frequent transitions between line-of-sight (LoS) and non-line-of-sight (NLoS) conditions [2], causing throughput drops, bursty outages, and latency inflation.
Mission execution therefore depends on timely delivery of camera streams, odometry, and map updates, not solely on local control. This article contributes an end-to-end, closed-loop simulation framework that makes this coupling measurable and actionable. A city-scale simulation runs in NVIDIA Isaac Sim and is orchestrated via Robot Operating System 2 (ROS2), while a Proximal Policy Optimization (PPO) multi-agent reinforcement learning (MARL) policy coordinates two UGV explorers and UAV communication assets (relays and a gNB-like anchor). Network key performance indicators (KPIs) are evaluated alongside mobility, enabling systematic diagnosis of mobility–connectivity failure modes.

Andrés Meseguer-Valenzuela is with Instituto Tecnológico de Informática (ITI), Spain (e-mail: ameseguer@iti.es)

SYSTEM AND SIMULATION OVERVIEW

The experimental setup is conceived as a closed-loop system that couples physics-based urban simulation, middleware orchestration, multi-agent decision making, and radio-performance instrumentation to study mobility–connectivity coupling in SAR-style operations. A city-scale three-dimensional (3D) simulation is executed in NVIDIA Isaac Sim, where two UGVs act as mission executors and three UAVs provide connectivity support through two relay platforms and a gNB-like aerial anchor. The simulated environment is defined as a stage with structured road corridors and building blocks that induce frequent LoS and NLoS transitions, thereby creating mobility-driven fluctuations in link quality representative of dense urban deployments.

The scenario addressed by the system is a target-reaching SAR navigation problem under communication constraints. At the start of each episode, both UGVs are initialized within a designated start region, depicted as a green zone in Fig. 1, and are required to traverse the urban street network toward a predefined target region that represents the mission objective (e.g., a hazard or inspection area).
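To make the occlusion-driven LoS/NLoS transitions concrete, the following minimal sketch classifies a link as LoS or NLoS by testing whether the straight segment between two nodes crosses any building footprint. It is a deliberately simplified 2D ground-plane model with axis-aligned rectangular buildings, which ignores UAV altitude and building height; the function names and geometry are assumptions for illustration and do not reflect how Isaac Sim or ns-3/5G-LENA determine channel conditions.

```python
def segment_hits_box(p, q, box):
    """Return True if segment p->q touches the axis-aligned box (xmin, ymin, xmax, ymax)."""
    (x0, y0), (x1, y1) = p, q
    xmin, ymin, xmax, ymax = box
    t_enter, t_exit = 0.0, 1.0  # parametric range of the segment still inside all slabs
    for start, delta, lo, hi in ((x0, x1 - x0, xmin, xmax), (y0, y1 - y0, ymin, ymax)):
        if delta == 0.0:
            if start < lo or start > hi:
                return False  # parallel to this slab and outside it
            continue
        t_a, t_b = (lo - start) / delta, (hi - start) / delta
        t_enter = max(t_enter, min(t_a, t_b))
        t_exit = min(t_exit, max(t_a, t_b))
        if t_enter > t_exit:
            return False  # slab intervals do not overlap: no intersection
    return True


def link_state(tx, rx, buildings):
    """Classify a link as 'NLoS' if any building footprint blocks the segment tx->rx."""
    return "NLoS" if any(segment_hits_box(tx, rx, b) for b in buildings) else "LoS"


# Invented layout: one building block between two street corridors.
buildings = [(2.0, 2.0, 4.0, 4.0)]
blocked = link_state((0.0, 0.0), (6.0, 6.0), buildings)   # segment crosses the block
clear = link_state((0.0, 0.0), (6.0, 0.0), buildings)     # segment runs along a street
```

Evaluating `link_state` at each simulation step for every UGV-relay pair would yield a sequence of LoS/NLoS labels whose switches mark the transitions discussed above.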
Mission progress is therefore expressed in terms of the UGVs’ convergence to, and eventual entry into, the target zone while operating in the presence of occlusions that intermittently degrade end-to-end connectivity.

Figure 1. Simulation environment

The UAVs are tasked with supporting this progression by adapting their positions to maintain network service as the UGVs move deeper into occluded areas, effectively shaping a relay-assisted aerial backbone. ROS 2 provides the middleware layer for time-aligned state streaming and command dispatch during live episodes, enabling continuous interaction between simulation, control, and measurement components. Decision making is implemented via a PPO multi-agent reinforcement learning policy, complemented by a supervisory layer that enforces safety constraints and limits physically implausible behaviors in early-stage trials. Connectivity is assessed through a fifth-generation (5G) link evaluation pipeline based on ns-3 and 5G-LENA simulations, which

ICRA2026 Late Breaking Results Poster presented at 2026 IEEE International Conference on Robotics and Automation (ICRA 2026), June 1-5, 2026, Vienna, Austria
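The mission-progress measure described above, UGV convergence to the target zone, can be sketched as a per-step minimum distance between any UGV and the target-region center, summarized by its initial, minimum, and final values. The function names, the 2D positions, and the sample paths are assumptions for illustration only; the paper reports the resulting numbers (27.9 m, 1.55 m, and 1.80 m) but not the code used to obtain them.

```python
import math


def min_distance_trace(ugv_paths, target_center):
    """Per-step minimum distance (m) from any UGV to the target center."""
    return [
        min(math.dist(position, target_center) for position in positions_at_step)
        for positions_at_step in zip(*ugv_paths)
    ]


def progress_summary(trace):
    """Initial, minimum, and final values of a distance trace."""
    return {"initial": trace[0], "minimum": min(trace), "final": trace[-1]}


# Invented 2D paths for two UGVs, one (x, y) position per step.
target_center = (3.0, 4.0)
ugv_paths = [
    [(0.0, 0.0), (3.0, 0.0), (3.0, 3.0), (3.0, 2.0)],
    [(0.0, 4.0), (1.0, 4.0), (0.0, 4.0), (0.0, 4.0)],
]
trace = min_distance_trace(ugv_paths, target_center)
progress = progress_summary(trace)
```

Reporting the minimum separately from the final value matters when a vehicle approaches the target and then drifts slightly away, as in the invented paths here and in the episode reported in the abstract.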

Index terms

Multi-Robot Systems, Cooperating Robots, Networked Robots