ICRA 2026

Platform-Agnostic Reinforcement Learning Framework for Safe Exploration of Cluttered Environments with Graph Attention

Gabriele Calzolari, Vidya Sumathy, Christoforos Kanellakis, George Nikolakopoulos

AI summary

A hierarchical framework combining a graph attention policy with a safety filter enables efficient, collision-free autonomous exploration in cluttered environments without platform-specific tuning.

Safe Reinforcement Learning; Graph Neural Networks; Autonomous Exploration; Safety Filters; Platform-Agnostic Robotics

Problem

Autonomous exploration in obstacle-rich spaces requires balancing high efficiency with strict safety guarantees, yet standard reinforcement learning policies lack explicit safety constraints and real-world adaptability.

Approach

The method uses a graph neural network trained via reinforcement learning to propose the next waypoint, paired with a deterministic safety filter that overrides any infeasible action with the closest collision-free alternative.
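The override step can be illustrated with a minimal sketch. It assumes the policy picks one index from a discrete set of candidate waypoints, that a feasibility mask is already available from a collision check, and that "closest" means Euclidean distance between waypoints. The function name `safety_filter` and all of these choices are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def safety_filter(proposed, candidates, is_feasible):
    """Return (action_index, intervened) for a proposed waypoint index.

    candidates:  (N, 2) array of candidate waypoint positions (assumed layout).
    is_feasible: (N,) boolean mask from some external collision check (assumed).
    """
    if is_feasible[proposed]:
        return proposed, False  # policy action is safe, pass it through
    feasible = np.flatnonzero(is_feasible)
    if feasible.size == 0:
        raise ValueError("no feasible waypoint available")
    # Pick the feasible candidate nearest to the waypoint the policy wanted.
    dists = np.linalg.norm(candidates[feasible] - candidates[proposed], axis=1)
    return int(feasible[np.argmin(dists)]), True

# Hypothetical example: candidate 0 is blocked, candidate 3 lies nearest to it.
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.9, 0.3]])
is_feasible = np.array([False, True, True, True])
action, intervened = safety_filter(0, candidates, is_feasible)
```

Returning the `intervened` flag alongside the action lets a training loop count or penalize filter interventions.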

Key results

A novel safety filter that dynamically overrides unsafe policy actions with the closest feasible alternative
An attention-enhanced GNN policy that extracts exploration-relevant features from a custom graph-based environment representation
A potential-field-shaped reward function that balances frontier information gain with safety penalty minimization
Validated efficient and safe exploration across 100 simulated environments and physical lab experiments on a Unitree Go1 quadruped robot
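For readers unfamiliar with attention over graphs, the following is a minimal single-head layer in the style of the standard graph attention network (GAT) formulation, written in NumPy. It is a generic sketch, not the paper's architecture: the feature sizes, the weights `W` and `a`, and the example graph are all hypothetical, and the adjacency matrix is assumed to include self-loops.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(h, adj, W, a):
    """Single-head graph attention layer (generic GAT-style sketch).

    h:   (N, F) node features.
    adj: (N, N) boolean adjacency, assumed to include self-loops.
    W:   (F, F_out) shared linear map.
    a:   (2 * F_out,) attention vector.
    Returns updated features (N, F_out) and attention weights (N, N).
    """
    z = h @ W
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a term for node i plus a term for node j.
    e = leaky_relu((z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :])
    e = np.where(adj, e, -np.inf)            # attend only to neighbours
    e = e - e.max(axis=1, keepdims=True)     # stabilise the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha

# Hypothetical 4-node path graph with random features and weights.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
adj = np.eye(4, dtype=bool)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = True
out, alpha = gat_layer(h, adj, rng.normal(size=(3, 2)), rng.normal(size=4))
```

Each row of `alpha` sums to one and is zero outside the node's neighbourhood, so every node aggregates only from itself and its neighbours.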

Why it matters

Enables reliable deployment of learning-based exploration policies on real-world robotic platforms operating in safety-critical, cluttered environments.

Abstract

Autonomous exploration of obstacle-rich spaces requires strategies that ensure efficiency while guaranteeing safety against collisions with obstacles. This paper investigates a novel platform-agnostic reinforcement learning framework that integrates a graph neural network-based policy for next-waypoint selection with a safety filter ensuring safe mobility. Specifically, the neural network is trained using reinforcement learning through the Proximal Policy Optimization (PPO) algorithm to maximize exploration efficiency while minimizing safety filter interventions. When the policy proposes an infeasible action, the safety filter overrides it with the closest feasible alternative, ensuring consistent system behavior. In addition, this paper introduces a reward function shaped by a potential field that accounts for both the agent’s proximity to unexplored regions and the expected information gain from reaching them. The proposed framework combines the adaptability of reinforcement learning-based exploration policies with the reliability provided by explicit safety mechanisms. This feature plays a key role in enabling the deployment of learning-based policies on robotic platforms operating in real-world environments. Extensive evaluations in both simulations and experiments performed in a lab environment demonstrate that the approach achieves efficient and safe exploration in cluttered spaces.
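One way to picture a potential-shaped reward of this kind is sketched below. The abstract does not give the reward's functional form, so everything here is an assumption: the potential sums each frontier's expected information gain divided by its distance to the agent, the shaping term uses the classic potential-based form gamma * Phi(s') - Phi(s), and a fixed penalty is subtracted whenever the safety filter intervenes. The names `potential` and `shaped_reward` and the parameters `eps`, `gamma`, and `penalty` are hypothetical.

```python
import numpy as np

def potential(agent_xy, frontier_xy, info_gain, eps=1.0):
    """Hypothetical potential: closer, more informative frontiers score higher."""
    d = np.linalg.norm(frontier_xy - agent_xy, axis=1)
    return float(np.sum(info_gain / (d + eps)))

def shaped_reward(prev_xy, next_xy, frontier_xy, info_gain,
                  intervened, gamma=0.99, penalty=1.0):
    """Potential-based shaping term minus an assumed intervention penalty."""
    phi_prev = potential(prev_xy, frontier_xy, info_gain)
    phi_next = potential(next_xy, frontier_xy, info_gain)
    reward = gamma * phi_next - phi_prev
    if intervened:
        reward -= penalty  # discourage actions the safety filter must override
    return reward

# Hypothetical step: the agent moves toward a single frontier at (2, 0).
frontiers = np.array([[2.0, 0.0]])
gains = np.array([1.0])
r = shaped_reward(np.array([0.0, 0.0]), np.array([1.0, 0.0]),
                  frontiers, gains, intervened=False)
```

In this sketch the frontier set is held fixed across the step for simplicity; in an actual exploration loop the frontiers and their gains would change as the map is updated.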

Index terms

Reinforcement Learning; Deep Learning Methods; AI-Enabled Robotics