ICRA 2026

Co-Optimizing Reconfigurable Environments and Policies for Decentralized Multi-Agent Navigation

Zhan Gao, Guang Yang, Amanda Prorok

AI summary

Jointly optimizing agent navigation policies and reconfigurable environment layouts improves multi-agent navigation performance, because the optimized environment can structurally guide and de-conflict agents.

Co-optimization, multi-agent navigation, reconfigurable environments, decentralized control, reinforcement learning, environment design

Problem

Classical multi-agent navigation treats the environment as a fixed constraint, which can lead to deadlocks and inefficiencies. This paper instead treats both agent policies and environment configurations as decision variables and optimizes them jointly to improve system-wide navigation performance.

Approach

The authors propose a model-free coordinated framework that alternates between training a decentralized navigation policy via reinforcement learning and optimizing a reconfigurable obstacle layout via unsupervised learning.
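The alternating idea can be illustrated with a deliberately tiny sketch. This is not the authors' algorithm: it assumes a scalar policy parameter `theta`, a scalar layout parameter `phi`, and a hypothetical black-box cost `rollout_cost` standing in for a navigation rollout. Both steps use a score-function (policy-gradient style) estimate, so only cost evaluations are needed and no model of the cost is used.

```python
import random


def rollout_cost(action, layout):
    """Hypothetical black-box cost standing in for a navigation rollout.

    The agent does best when its action matches the layout, and the
    layout itself is best at 1.0. The optimizer below never looks inside.
    """
    return (action - layout) ** 2 + 0.5 * (layout - 1.0) ** 2


def score_function_grad(mean, cost_of_sample, rng, sigma=0.2, n=64):
    """Estimate d/d(mean) of E[cost(x)] for x ~ N(mean, sigma^2) from samples."""
    xs = [rng.gauss(mean, sigma) for _ in range(n)]
    costs = [cost_of_sample(x) for x in xs]
    baseline = sum(costs) / n  # mean-cost baseline to reduce variance
    weighted = sum((c - baseline) * (x - mean) for c, x in zip(costs, xs))
    return weighted / (n * sigma ** 2)


def co_optimize(theta=0.0, phi=0.0, rounds=400, lr=0.05, seed=0,
                optimize_layout=True):
    """Alternate a policy step (layout fixed) and a layout step (policy fixed)."""
    rng = random.Random(seed)
    for _ in range(rounds):
        # Policy step: perturb the action, keep the current layout fixed.
        theta -= lr * score_function_grad(
            theta, lambda a: rollout_cost(a, phi), rng)
        if optimize_layout:
            # Environment step: perturb the layout, keep the policy fixed.
            phi -= lr * score_function_grad(
                phi, lambda e: rollout_cost(theta, e), rng)
    return theta, phi


if __name__ == "__main__":
    theta, phi = co_optimize()
    theta_fixed, phi_fixed = co_optimize(optimize_layout=False)
    print("co-optimized cost:", rollout_cost(theta, phi))
    print("fixed-layout cost:", rollout_cost(theta_fixed, phi_fixed))
```

In this toy, the fixed-layout run can never get below a cost of 0.5, because the layout term is stuck at its initial value, while the alternating run can reduce both terms. The real problem involves many agents, decentralized policies, and a high-dimensional obstacle layout, so this only conveys the structure of the alternation.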

Key results

Formulation of a system-level agent-environment co-optimization problem
Development of an alternating coordinated algorithm using RL and unsupervised learning
Formal convergence analysis showing that the coordinated algorithm tracks the local minimum solution of an associated time-varying non-convex optimization problem
Simulation and real-world experiments showing optimized environments structurally guide and de-conflict agents
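
One generic way to write a co-optimization problem of this kind, together with alternating updates, is sketched below. The notation is assumed here for illustration and is not taken from the paper: $\theta$ denotes policy parameters, $\mathcal{E}$ the environment configuration (or the parameters of a generator that outputs it), $C(\tau)$ the cost of a trajectory $\tau$, $\alpha$ and $\beta$ step sizes, and $\widehat{\nabla}$ a sampled (for example, policy-gradient) estimate.

```latex
\begin{aligned}
&\min_{\theta,\,\mathcal{E}} \; J(\theta, \mathcal{E})
   = \mathbb{E}_{\tau \sim (\pi_\theta,\, \mathcal{E})}\big[\, C(\tau) \,\big], \\
&\theta_{k+1} = \theta_k - \alpha\, \widehat{\nabla}_{\theta} J(\theta_k, \mathcal{E}_k), \\
&\mathcal{E}_{k+1} = \mathcal{E}_k - \beta\, \widehat{\nabla}_{\mathcal{E}} J(\theta_{k+1}, \mathcal{E}_k).
\end{aligned}
```

Under this reading, each sub-problem faces an objective that shifts whenever the other variable is updated: the policy step minimizes $J(\cdot, \mathcal{E}_k)$, which changes with $k$. That is one way to see why the analysis is phrased as tracking a local minimum of a time-varying non-convex problem rather than converging to a fixed one.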

Why it matters

The framework is relevant to settings such as adaptive logistics, search-and-rescue, and urban planning: it suggests that, where the layout can be reconfigured, environment design can matter alongside policy design for efficient multi-agent coordination.

Abstract

This work views the multi-agent system and its surrounding environment as a co-evolving system, where the behavior of one affects the other. The goal is to take both agent actions and environment configurations as decision variables, and optimize these two components in a coordinated manner to improve some measure of interest. Towards this end, we consider the problem of decentralized multi-agent navigation in a cluttered environment, where we assume that the layout of the environment is reconfigurable. By introducing two sub-objectives, multi-agent navigation and environment optimization, we propose an agent-environment co-optimization problem and develop a coordinated algorithm that alternates between these sub-objectives to search for an optimal synthesis of agent actions and environment configurations, ultimately improving the navigation performance. Due to the challenge of explicitly modeling the relation between the agents, the environment and their performance therein, we leverage policy gradient to formulate a model-free learning mechanism within the coordinated framework. A formal convergence analysis shows that our coordinated algorithm tracks the local minimum solution of an associated time-varying non-convex optimization problem. Experiments corroborate theoretical findings and show the benefits of co-optimization. Interestingly, the results also indicate that optimized environments can offer structural guidance to de-conflict agents in motion.

Index terms

Multi-Robot Systems, Distributed Robot Systems, Optimization and Optimal Control, Co-Design