Risk-Aware Routing for a Robot in a Shared Dynamic Environment
Elena Stracca, Giorgio Grioli, Lucia Pallottino, Paolo Salaris
AI summary
Problem
Mobile robots navigating dynamic, human-shared spaces are often delayed by uncertain human presence, but existing planners do not adapt routes optimally to real-time observations.
Approach
The problem is modeled as a stochastic shortest path with recourse and solved as a Markov Decision Process (MDP). The resulting offline policy accounts for human distribution probabilities and encounter severity, and an online adaptation mechanism then prevents routing loops.
Key results
- MDP policy explicitly integrates re-planning options, visibility, and encounter severity
- Generated routes outperform reactive and advanced state-of-the-art planners
- Low computational complexity enables scalability to large environments
- Online adaptation mechanism successfully prevents cyclic routing behaviors
Why it matters
Enables safer and more efficient deployment of autonomous mobile robots in dynamic human-robot shared spaces like warehouses and hospitals.
Abstract
This paper explores the challenge of optimal routing for a mobile robot navigating a dynamic and shared human environment. The primary goal is to minimize the risk of performance degradation during motion, such as delays in completing tasks due to the need for safe or acceptable human-robot encounters. The problem is formulated as a graph whose edge costs become progressively known only as the robot moves through the environment. We model this problem as a Markov Decision Process (MDP), enabling an offline evaluation of the expected cost of alternative routes based on statistical information about human spatial distributions and possible observations at each intersection. This compact state representation scales linearly with the number of intersections in the map. Since the memoryless property of the MDP may induce loops during online execution, we compute an offline policy and introduce an online policy adaptation mechanism to prevent cyclic behaviors. Extensive simulations across environments of different complexity, and using data collected from real-world experiments, demonstrate that our approach outperforms reactive and advanced state-of-the-art planners in terms of either performance or scalability.
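The offline evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation. It assumes that each corridor (edge) has a fixed traversal cost, an independent probability of a human encounter, and a fixed delay when an encounter occurs, and that on arriving at an intersection the robot observes all outgoing corridors before choosing one. The names `value_iteration` and `EDGES`, the toy map, and all numbers are hypothetical. The state is just the current intersection, so the value table grows linearly with the number of intersections.

```python
from itertools import product

def value_iteration(edges, goal, tol=1e-9, max_sweeps=1000):
    """Expected cost-to-go per intersection, with recourse at each node.

    edges[node] is a list of (next_node, base_cost, p_human, delay).
    Assumption: every non-goal node has at least one outgoing edge.
    """
    values = {node: 0.0 for node in edges}
    for _ in range(max_sweeps):
        new_values = {}
        for node, outgoing in edges.items():
            if node == goal:
                new_values[node] = 0.0
                continue
            expected = 0.0
            # Enumerate every joint observation (1 = human seen) of the
            # outgoing corridors; edges are assumed independent.
            for obs in product((0, 1), repeat=len(outgoing)):
                prob = 1.0
                for seen, (_, _, p_human, _) in zip(obs, outgoing):
                    prob *= p_human if seen else 1.0 - p_human
                if prob == 0.0:
                    continue
                # Recourse: pick the best corridor after observing.
                best = min(
                    base + seen * delay + values[nxt]
                    for seen, (nxt, base, _, delay) in zip(obs, outgoing)
                )
                expected += prob * best
            new_values[node] = expected
        change = max(abs(new_values[n] - values[n]) for n in values)
        values = new_values
        if change < tol:
            break
    return values

# Hypothetical toy map: S = start, G = goal.
EDGES = {
    "S": [("A", 1.0, 0.0, 0.0), ("B", 3.0, 0.0, 0.0)],
    "A": [("G", 1.0, 0.5, 10.0), ("B", 1.0, 0.0, 0.0)],
    "B": [("G", 3.0, 0.0, 0.0)],
    "G": [],
}

values = value_iteration(EDGES, "G")
print(values["S"], values["A"], values["B"])  # 3.5 2.5 3.0
```

In this toy map a planner that commits to a route using only expected edge costs would value the start at 5.0, since the direct corridor A to G costs 6.0 in expectation and the detour through B costs 4.0. Observing the corridor at A and detouring only when a human is present lowers the expected cost from S to 3.5, which is the benefit of recourse that the abstract refers to. The sketch does not cover the online loop-prevention mechanism.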