Cross-Observability Learning for Vehicle Routing Problems
Ruifan Liu, Hyo-Sang Shin, Antonios Tsourdos
Abstract
This study aims to improve the understanding of multi-vehicle routing problems (VRPs) under restricted observability. Unlike most prior research, which assumes full knowledge of tasks and vehicles, this paper addresses VRPs in which each vehicle’s observation is confined to its k-nearest neighbourhood. Vehicles make decisions in a decentralized manner using localized policies. We theoretically demonstrate that, for the imitation policy, the upper bound of the optimality gap diminishes as the neighbourhood range expands. We then employ a multi-agent cross-observability policy optimization (MACOPO) algorithm to solve VRPs with restricted observability. The algorithm optimizes a cross-entropy term, leveraging a fully observable expert to guide training. Empirical results support both the theoretical findings and the effectiveness of the multi-agent learning algorithm.
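To make the two ingredients named in the abstract concrete, the following is a minimal illustrative sketch, not the MACOPO algorithm itself. It assumes that tasks and vehicles are points in the plane, that a local policy scores only the k nearest tasks, and that the expert supplies a probability distribution over all tasks. The function names (`k_nearest_mask`, `masked_softmax`, `cross_entropy`) and the toy data are hypothetical and do not come from the paper.

```python
import numpy as np

def k_nearest_mask(vehicle_pos, task_pos, k):
    """Boolean mask over tasks: True for the k tasks closest to the vehicle (k >= 1)."""
    dists = np.linalg.norm(task_pos - vehicle_pos, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    mask = np.zeros(len(task_pos), dtype=bool)
    mask[nearest] = True
    return mask

def masked_softmax(logits, mask):
    """Softmax over observable tasks only; unobserved tasks get probability 0."""
    masked = np.where(mask, logits, -np.inf)
    shifted = masked - masked[mask].max()  # subtract the max for numerical stability
    weights = np.exp(shifted)              # exp(-inf) evaluates to 0
    return weights / weights.sum()

def cross_entropy(expert_probs, local_probs, eps=1e-12):
    """H(expert, local) = -sum_a expert(a) * log(local(a) + eps)."""
    return float(-np.sum(expert_probs * np.log(local_probs + eps)))

# Toy instance (assumed data): one vehicle at the origin and four tasks.
vehicle = np.array([0.0, 0.0])
tasks = np.array([[1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [0.5, 0.5]])

mask = k_nearest_mask(vehicle, tasks, k=2)          # tasks 3 and 0 are visible
local_probs = masked_softmax(np.zeros(4), mask)     # uniform over visible tasks

expert_visible = np.array([0.0, 0.0, 0.0, 1.0])     # expert picks a visible task
expert_hidden = np.array([0.0, 0.0, 1.0, 0.0])      # expert picks an unobserved task

loss_visible = cross_entropy(expert_visible, local_probs)
loss_hidden = cross_entropy(expert_hidden, local_probs)
```

In this toy setting the loss is much larger when the expert's choice lies outside the observed neighbourhood, which gives some intuition for why a wider neighbourhood lets a local policy imitate the expert more closely.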