IROS 2024

Cross-Observability Learning for Vehicle Routing Problems

Ruifan Liu, Hyo-Sang Shin, Antonios Tsourdos

Abstract

This study aims to improve the understanding of multi-vehicle routing problems (VRPs) under restricted observability. Unlike most prior research, which assumes full knowledge of tasks and vehicles, this paper addresses VRPs in which each vehicle’s observation is confined to its k-nearest neighbourhood. Vehicles make decisions in a decentralized manner based on localized policies. We theoretically demonstrate that, for the imitation policy, the upper bound of the optimality gap diminishes as the neighbourhood range expands. We then employ a multi-agent cross-observability policy optimization (MACOPO) algorithm to solve VRPs with restricted observability. The algorithm optimizes a cross-entropy term, leveraging a fully observable expert to guide training. Empirical results support both the theoretical findings and the effectiveness of the multi-agent learning algorithm.
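
The two ingredients named in the abstract, a k-nearest observation and a cross-entropy term against a fully observable expert, can be illustrated with a minimal sketch. This is not the authors' MACOPO implementation: the function names, the use of Euclidean distance, the masked softmax, and the direction of the cross-entropy (expert distribution as the target) are all assumptions made for illustration.

```python
import numpy as np

def k_nearest_mask(vehicle_pos, task_pos, k):
    """Boolean mask marking the k tasks closest to one vehicle (assumed Euclidean)."""
    dists = np.linalg.norm(task_pos - vehicle_pos, axis=1)
    nearest = np.argsort(dists)[:k]
    mask = np.zeros(len(task_pos), dtype=bool)
    mask[nearest] = True
    return mask

def masked_softmax(scores, mask):
    """Softmax over observable tasks only; unobserved tasks get probability 0."""
    shifted = np.where(mask, scores - scores[mask].max(), -np.inf)
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy(expert_probs, local_probs, eps=1e-12):
    """H(expert, local) = -sum_a expert(a) * log(local(a)); eps avoids log(0)."""
    return float(-np.sum(expert_probs * np.log(local_probs + eps)))

# Hypothetical toy instance: one vehicle at the origin, four tasks, k = 2.
task_pos = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.5, 0.5]])
vehicle_pos = np.array([0.0, 0.0])
mask = k_nearest_mask(vehicle_pos, task_pos, k=2)

# Local policy with uniform scores over its neighbourhood (placeholder for a network).
local_probs = masked_softmax(np.zeros(len(task_pos)), mask)

# Fully observable expert that picks task 0 with certainty (placeholder).
expert_probs = np.array([1.0, 0.0, 0.0, 0.0])
loss = cross_entropy(expert_probs, local_probs)
```

In this sketch the loss becomes very large whenever the expert prefers a task outside the vehicle's neighbourhood, which gives one intuition for why a wider neighbourhood can shrink the gap to the expert.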

Index terms

Planning, Scheduling and Coordination; Reinforcement Learning; Intelligent Transportation Systems