Efficient Hierarchical Reinforcement Learning for Mapless Navigation with Predictive Neighbouring Space Scoring
Yan Gao, Jing Wu, Xintong Yang, Ze Ji
Abstract
Solving reinforcement learning (RL)-based mapless navigation tasks is challenging because of their sparse rewards and long decision horizons. Hierarchical reinforcement learning (HRL) can leverage knowledge at different levels of abstraction and is thus preferred in complex mapless navigation tasks. However, it is computationally expensive and inefficient to learn navigation end-to-end from raw high-dimensional sensor data, such as Lidar scans or RGB images. The use of subgoals based on a compact intermediate representation is therefore preferred for dimension reduction. This work proposes an efficient HRL-based framework to achieve this with a novel scoring method, named Predictive Neighbouring Space Scoring (PNSS). The PNSS model estimates the explorable space for a given position of interest based on the current robot observation. The PNSS values for a few candidate positions around the robot provide a compact and informative state representation for subgoal selection. We study the effects of different candidate position layouts and demonstrate that our layout design yields higher performance in longer-range tasks. Moreover, a penalty term is introduced in the reward function for the high-level (HL) policy, so that the subgoal selection process takes the performance of the low-level (LL) policy into consideration. Comprehensive evaluations demonstrate that using the proposed PNSS module consistently improves performance over the use of Lidar only or Lidar and encoded RGB features.
Note to Practitioners: This paper seeks to improve robot mapless navigation, where the robot is expected to navigate to a goal location without knowing the map of the environment.
This ability is in high demand in many applications that require autonomous operation in unstructured indoor and outdoor environments, such as service robots in domestic and public spaces, logistics in industrial warehouses, urban search and rescue missions, and disaster relief efforts, where detailed and accurate maps are difficult to obtain in advance. In this work, we focus on reinforcement learning-based mapless navigation. Such methods are known to struggle in complex long-range tasks, for example when the robot becomes trapped in a local region by multiple objects. Therefore, this paper proposes a novel mapless navigation method inspired by human navigation behaviours. We enable a robot to split a long-range navigation task into multiple segments by selecting and navigating to short-term goals. These subgoals are selected each time from a number of candidate positions located around the robot. The process stops when the robot reaches the final target location. When selecting a short-term goal, we use a deep neural network to predict the openness around each candidate subgoal position, named the Predictive Neighbouring Space Scoring (PNSS), from raw images and Lidar scans. In addition, we study the effects of different arrangements of candidate subgoal locations and select the best-performing one. Experiments conducted in photo-realistic simulation environments demonstrate the effectiveness of our method, showing superior performance over baselines. It is worth noting that our agent is trained only in domestic environments using the iGibson simulator. For applications in other environments, additional training in settings more representative of the corresponding scenarios will be necessary. In the future, we intend to validate our methods in complex real-world environments and narrow the simulation-to-reality gap for long-horizon navigation tasks.
1 School of Engineering, Cardiff University, Cardiff, UK, {gaoy84, jiz1, yangx66}@cardiff.ac.uk (Corresponding author: Ze Ji)
2 School of Computer Science and Informatics, Cardiff University, Cardiff, UK, wuj11@cardiff.ac.uk
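The subgoal-selection idea summarised above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the ring-shaped candidate layout, the choice of state features, and the reward constants are all assumptions made for illustration, and the names `ring_candidates`, `high_level_state`, and `high_level_reward` are hypothetical. The PNSS scores are taken as given inputs here, whereas in the paper they are predicted by a deep neural network from images and Lidar scans.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ring_candidates(robot_xy: Point, heading: float,
                    radius: float = 1.0, n: int = 8) -> List[Point]:
    """Assumed layout: n candidate subgoals evenly spaced on a circle
    of the given radius around the robot, starting at its heading."""
    x, y = robot_xy
    return [
        (x + radius * math.cos(heading + 2.0 * math.pi * k / n),
         y + radius * math.sin(heading + 2.0 * math.pi * k / n))
        for k in range(n)
    ]


def high_level_state(scores: Sequence[float],
                     candidates: Sequence[Point],
                     goal_xy: Point) -> List[float]:
    """Assumed compact HL state: for each candidate, its PNSS-style
    score followed by its Euclidean distance to the final goal."""
    state: List[float] = []
    for score, cand in zip(scores, candidates):
        state.extend([score, math.dist(cand, goal_xy)])
    return state


def high_level_reward(prev_dist: float, new_dist: float,
                      subgoal_reached: bool, goal_reached: bool,
                      fail_penalty: float = 0.5,
                      goal_bonus: float = 10.0) -> float:
    """Assumed HL reward: progress towards the goal, minus a penalty
    when the LL policy failed to reach the chosen subgoal, plus a
    bonus on reaching the final goal. Constants are placeholders."""
    reward = prev_dist - new_dist
    if not subgoal_reached:
        reward -= fail_penalty
    if goal_reached:
        reward += goal_bonus
    return reward
```

The penalty branch is the part that ties the two levels together: a subgoal that looks attractive but that the low-level policy cannot reach lowers the high-level return, so the high-level policy is discouraged from selecting it again.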