Highly Flexible Task Planner for Robots in Dynamic Environments
Miguel Guzman-Merino, Jörn Plönnigs
AI summary
Problem
Construction sites are highly non-deterministic with scarce or inaccurate information, making it difficult to automate complex multi-agent tasks. Existing planning systems fail in real-world settings because they depend on controlled environments or unavailable data.
Approach
The authors propose a system based on Multi-Agent Proximal Policy Optimization (MAPPO) that trains diverse robot teams across varying dynamic scenarios. By combining local observations with a centralized critic that uses augmented states, the framework is intended to learn to command agents independently of the constraints imposed by the available environmental information.
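As a rough illustration of the augmented state idea, the sketch below concatenates each agent's local observation with a shared global task state to form the input of a centralized critic. The function name, the vector sizes, and the meaning of the task-state entries are assumptions made for this example; the source does not specify how the augmented state is encoded.

```python
from typing import List, Sequence


def build_augmented_state(
    local_obs: Sequence[float], global_task_state: Sequence[float]
) -> List[float]:
    """Concatenate one agent's local observation with the shared task state.

    Hypothetical layout: [local observation | global task state].
    """
    return list(local_obs) + list(global_task_state)


# Hypothetical data: two agents with 3-dimensional local observations.
local_observations = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
# Hypothetical global task state, e.g. one completion flag per subtask.
task_state = [1.0, 0.0, 0.0, 1.0]

# One critic input per agent. Each actor would see only its own local
# observation, while the centralized critic sees the augmented state.
critic_inputs = [
    build_augmented_state(obs, task_state) for obs in local_observations
]

for agent_index, state in enumerate(critic_inputs):
    print(agent_index, len(state), state)
```

Keeping the global task state out of the actor input is what would let each robot act on local information alone at execution time, while the critic still benefits from the global view during training.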
Key results
- MAPPO-based task planning framework for dynamic construction environments
- Training methodology using variable agent teams and diverse scenarios
- Augmented state design integrating local observations and global task states
- Roadmap for ROS/Gazebo simulation validation and low-level action integration
Why it matters
Provides a pathway to deploy adaptable multi-robot systems on real construction sites where conditions change rapidly and data is unreliable.
Abstract
Construction sites are highly non-deterministic environments that change constantly. The tasks required in these environments are usually complex, and multi-agent systems have proven to be a flexible way to solve them. Nevertheless, uncertainty in the environment often makes the available information inaccurate, incomplete, or difficult to integrate into multi-agent systems. To automate complex processes in construction environments successfully, it is necessary to overcome the barrier imposed by the lack of accurate information. The research challenge presented here is the coordination of multi-agent systems in non-deterministic environments. This proposal uses Multi-Agent Proximal Policy Optimization (MAPPO) to create the necessary flexible framework. Separate policy networks, each associated with a different type of agent, are trained over different scenarios, and the composition of the agent teams is also varied during training. The intended result is a framework able to command different teams of agents independently of the constraints imposed by the information available about the environment.
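The training setup described in the abstract, with one policy network per agent type and teams and scenarios that vary during training, could be organized along the following lines. This is only a sketch under stated assumptions: the agent types, scenario names, team-size bounds, and policy identifiers are hypothetical placeholders, and the actual PPO update is omitted.

```python
import random
from typing import List, Tuple

# Hypothetical agent types and scenarios; the source does not name them.
AGENT_TYPES = ["crane", "transporter", "drone"]
SCENARIOS = ["open_site", "cluttered_site", "moving_obstacles"]


def sample_episode_setup(
    rng: random.Random, min_team: int = 2, max_team: int = 5
) -> Tuple[str, List[str]]:
    """Draw a scenario and a team composition for one training episode."""
    scenario = rng.choice(SCENARIOS)
    team_size = rng.randint(min_team, max_team)  # bounds are inclusive
    team = [rng.choice(AGENT_TYPES) for _ in range(team_size)]
    return scenario, team


def assign_policies(team: List[str]) -> List[str]:
    """Map each agent to the policy network shared by its agent type."""
    return [f"policy_{agent_type}" for agent_type in team]


rng = random.Random(0)  # fixed seed so the sampled setups are repeatable
for episode in range(3):
    scenario, team = sample_episode_setup(rng)
    print(episode, scenario, team, assign_policies(team))
```

Sharing one policy per agent type keeps the number of networks fixed while the team size changes, which is one way to let the same trained policies command teams of different compositions.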