Co-Optimizing Reconfigurable Environments and Policies for Decentralized Multi-Agent Navigation
Zhan Gao, Guang Yang, Amanda Prorok
AI summary
Problem
Classical multi-agent navigation treats the environment as a fixed constraint, which can lead to deadlocks and inefficiencies. This paper instead treats both agent policies and environment configurations as decision variables and optimizes them jointly to improve system-wide navigation performance.
Approach
The authors propose a model-free coordinated framework that alternates between training a decentralized navigation policy via reinforcement learning and optimizing a reconfigurable obstacle layout via unsupervised learning.
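The alternating structure described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the scalar `policy_param` and `env_param`, the quadratic `cost`, and the analytic gradient steps are all assumptions standing in for the reinforcement-learning and unsupervised-learning phases, chosen only to show how two coupled sets of variables can be updated in turn.

```python
def cost(policy_param, env_param):
    # Hypothetical coupled objective (an assumption, not the paper's cost):
    # the policy term depends on the environment, and the environment
    # has its own preferred configuration at 2.0.
    return (policy_param - env_param) ** 2 + (env_param - 2.0) ** 2


def alternate(rounds=500, lr=0.1, policy_param=0.0, env_param=5.0):
    """Alternate gradient steps on the two blocks of variables."""
    for _ in range(rounds):
        # Policy phase: the environment is held fixed.
        policy_grad = 2.0 * (policy_param - env_param)
        policy_param -= lr * policy_grad

        # Environment phase: the freshly updated policy is held fixed.
        env_grad = -2.0 * (policy_param - env_param) + 2.0 * (env_param - 2.0)
        env_param -= lr * env_grad
    return policy_param, env_param


p, e = alternate()
print(p, e, cost(p, e))
```

In this toy problem both variables settle at the joint minimizer (2.0, 2.0). In the paper's setting each phase would instead be a learning update driven by sampled navigation performance rather than a closed-form gradient.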
Key results
- Formulation of a system-level agent-environment co-optimization problem
- Development of an alternating coordinated algorithm using RL and unsupervised learning
- Formal convergence analysis showing that the algorithm tracks a local minimum of an associated time-varying non-convex optimization problem
- Simulation and real-world experiments showing optimized environments structurally guide and de-conflict agents
Why it matters
The results suggest that the environment configuration can be treated as a design variable alongside agent policies when coordinating multiple agents, with potential relevance to adaptive logistics, search-and-rescue, and urban planning.
Abstract
This work views the multi-agent system and its surrounding environment as a co-evolving system, where the behavior of one affects the other. The goal is to take both agent actions and environment configurations as decision variables, and optimize these two components in a coordinated manner to improve some measure of interest. Towards this end, we consider the problem of decentralized multi-agent navigation in a cluttered environment, where we assume that the layout of the environment is reconfigurable. By introducing two sub-objectives (multi-agent navigation and environment optimization), we propose an agent-environment co-optimization problem and develop a coordinated algorithm that alternates between these sub-objectives to search for an optimal synthesis of agent actions and environment configurations, ultimately improving the navigation performance. Due to the challenge of explicitly modeling the relation between the agents, the environment and their performance therein, we leverage policy gradient to formulate a model-free learning mechanism within the coordinated framework. A formal convergence analysis shows that our coordinated algorithm tracks the local minimum solution of an associated time-varying non-convex optimization problem. Experiments corroborate theoretical findings and show the benefits of co-optimization. Interestingly, the results also indicate that optimized environments can offer structural guidance to de-conflict agents in motion.
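To make the phrase "model-free" concrete, the following is a minimal sketch of a score-function (policy-gradient style) estimator, which adjusts a parameter using only sampled costs and never differentiates the cost itself. The Gaussian sampling distribution, the quadratic `sampled_cost`, and all names and hyperparameters are assumptions for illustration, not details taken from the paper.

```python
import random


def sampled_cost(action):
    # Hypothetical black-box cost: the learner only observes its value,
    # never its gradient. The minimizer at 3.0 is an assumption.
    return (action - 3.0) ** 2


def score_function_descent(steps=400, batch=64, lr=0.05, sigma=0.5, seed=0):
    """Minimize the expected cost of a Gaussian sampler over its mean."""
    rng = random.Random(seed)
    mean = 0.0
    for _ in range(steps):
        actions = [rng.gauss(mean, sigma) for _ in range(batch)]
        costs = [sampled_cost(a) for a in actions]
        # Subtracting the batch-mean cost is a common variance-reduction baseline.
        baseline = sum(costs) / batch
        # Score of a Gaussian with respect to its mean: (a - mean) / sigma^2.
        grad = sum(
            (c - baseline) * (a - mean) / sigma ** 2
            for a, c in zip(actions, costs)
        ) / batch
        mean -= lr * grad
    return mean


print(score_function_descent())
```

With these settings the learned mean ends up close to the cost minimizer at 3.0, even though the update only ever sees sampled cost values.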