GRATE: A Graph Transformer-Based Deep Reinforcement Learning Approach for Time-Efficient Autonomous Robot Exploration
Haozhan Ni, Jingsong Liang, Chenyu He, Yuhong Cao, Guillaume Adrien Sartoretti
AI summary
Problem
Current RL-based exploration methods suffer from limited graph reasoning and neglect robot kinematics, yielding spatially optimal but temporally inefficient and physically infeasible paths.
Approach
GRATE combines a Graph Transformer to capture local and global graph dependencies for waypoint selection with a Kalman filter to smooth outputs into kinodynamically feasible trajectories.
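To make the "local and global graph dependencies" idea concrete, here is a minimal sketch of attention over a graph, with one pass restricted to each node's neighbors and one pass over all nodes. It is a generic illustration, not the GRATE architecture: the function names (`attend`, `local_global_layer`), the identity projections (no learned weights), the single head, and the concatenation of the two outputs are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, allowed):
    """Single-head scaled dot-product attention with identity projections.

    features: list of equal-length float lists, one per graph node.
    allowed:  allowed[i] is the set of node indices node i may attend to.
    """
    dim = len(features[0])
    out = []
    for i, query in enumerate(features):
        idx = sorted(allowed[i])
        scores = [
            sum(a * b for a, b in zip(query, features[j])) / math.sqrt(dim)
            for j in idx
        ]
        weights = softmax(scores)
        out.append([
            sum(w * features[j][k] for w, j in zip(weights, idx))
            for k in range(dim)
        ])
    return out


def local_global_layer(features, edges):
    """Concatenate neighbor-masked (local) and unmasked (global) attention."""
    n = len(features)
    # Local pass: each node sees itself and its direct graph neighbors.
    neighbors = [{i} for i in range(n)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Global pass: every node sees every node.
    everyone = [set(range(n)) for _ in range(n)]
    local = attend(features, neighbors)
    global_ = attend(features, everyone)
    return [loc + glo for loc, glo in zip(local, global_)]


# Hypothetical 3-node graph: nodes 0 and 1 are connected, node 2 is isolated.
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
embeddings = local_global_layer(node_features, edges=[(0, 1)])
```

In this sketch the local half of each embedding depends only on the node's neighborhood, while the global half mixes information from the whole graph. A trained model would add learned query, key, and value projections and multiple heads.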
Key results
- Up to 21.5% reduction in travel distance and 21.3% in exploration time versus SOTA baselines
- Generation of kinodynamically feasible paths via Kalman filter smoothing
- Successful validation in high-fidelity Gazebo simulations and real-world ground robot tests
- Outperforms conventional (TARE, FAEL, HPHS) and learning-based (ARiADNE) planners
Why it matters
Enables faster, physically realistic autonomous exploration for ground robots in complex environments.
Abstract
Autonomous robot exploration (ARE) is the process of a robot autonomously navigating and mapping an unknown environment. Recent Reinforcement Learning (RL)-based approaches typically formulate ARE as a sequential decision-making problem defined on a collision-free informative graph. However, these methods often demonstrate limited reasoning ability over graph-structured data. Moreover, due to the insufficient consideration of robot motion, the resulting RL policies are generally optimized to minimize travel distance, while neglecting time efficiency. To overcome these limitations, we propose GRATE, a Deep Reinforcement Learning (DRL)-based approach that leverages a Graph Transformer to effectively capture both local structure patterns and global contextual dependencies of the informative graph, thereby enhancing the model’s reasoning capability across the entire environment. In addition, we deploy a Kalman filter to smooth the waypoint outputs, ensuring that the resulting path is kinodynamically feasible for the robot to follow. Experimental results demonstrate that our method exhibits better exploration efficiency (up to 21.5% in distance and 21.3% in time to complete exploration) than state-of-the-art conventional and learning-based baselines in various simulation benchmarks. We also validate our planner in real-world scenarios.
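The waypoint-smoothing step can be illustrated with a minimal Kalman filter sketch. The abstract does not specify the state model, so this example assumes a constant-velocity model applied independently to the x and y coordinates, with a forward filtering pass only. The function names and the parameters `dt`, `process_var`, and `meas_var` are hypothetical choices for illustration, not values from the paper.

```python
def smooth_axis(measurements, dt=1.0, process_var=0.01, meas_var=1.0):
    """Filter one coordinate with a constant-velocity Kalman filter.

    State is [position, velocity]; only position is measured.
    All noise parameters are illustrative assumptions.
    """
    pos, vel = measurements[0], 0.0
    # State covariance P = [[p00, p01], [p10, p11]].
    p00, p01, p10, p11 = meas_var, 0.0, 0.0, 1.0
    # Process noise for a white-acceleration model.
    q00 = process_var * dt ** 4 / 4
    q01 = process_var * dt ** 3 / 2
    q11 = process_var * dt ** 2
    out = [pos]
    for z in measurements[1:]:
        # Predict: x = F x, P = F P F^T + Q with F = [[1, dt], [0, 1]].
        pos = pos + dt * vel
        a, b, c, d = p00, p01, p10, p11
        p00 = a + dt * (b + c) + dt * dt * d + q00
        p01 = b + dt * d + q01
        p10 = c + dt * d + q01
        p11 = d + q11
        # Update with the measured position (H = [1, 0]).
        innovation_var = p00 + meas_var
        k0, k1 = p00 / innovation_var, p10 / innovation_var
        residual = z - pos
        pos += k0 * residual
        vel += k1 * residual
        a, b, c, d = p00, p01, p10, p11
        p00, p01 = (1 - k0) * a, (1 - k0) * b
        p10, p11 = c - k1 * a, d - k1 * b
        out.append(pos)
    return out


def smooth_waypoints(waypoints, **kwargs):
    """Smooth a list of (x, y) waypoints, one axis at a time."""
    xs = smooth_axis([w[0] for w in waypoints], **kwargs)
    ys = smooth_axis([w[1] for w in waypoints], **kwargs)
    return list(zip(xs, ys))


# Hypothetical zigzag waypoint sequence, as a planner might emit it.
raw_path = [(float(i), float(i % 2)) for i in range(12)]
smooth_path = smooth_waypoints(raw_path)
```

A larger `meas_var` relative to `process_var` trusts the motion model more and yields a smoother path, at the cost of lagging behind sharp, intentional turns. A full smoother (for example, a backward Rauch-Tung-Striebel pass) would reduce that lag but needs the whole sequence in advance.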