Reinforcement Learning for Active Perception in Autonomous Navigation
Grzegorz Malczyk, Mihir Kulkarni, Kostas Alexis
AI summary
Problem
Autonomous navigation systems typically rely on passive, fixed cameras, overlooking how active, purpose-driven sensing could enhance situational awareness and safety in complex, unknown environments.
Approach
The authors develop an end-to-end reinforcement learning policy that jointly optimizes robot motion and camera orientation using a multi-objective reward that balances goal progress, collision avoidance, and voxel-based information gain.
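One way to read "multi-objective reward" is as a weighted sum of per-step terms. The sketch below is a minimal illustration under that assumption: goal progress is the reduction in distance to the goal, a collision triggers a fixed penalty, and information gain is a count of newly observed voxels. The names `RewardWeights` and `navigation_reward` and all weight values are hypothetical and do not come from the paper, whose actual reward terms and coefficients are not given in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights chosen for illustration only, not the paper's values.
    progress: float = 1.0
    collision: float = 10.0
    info_gain: float = 0.05


def navigation_reward(prev_goal_dist: float,
                      goal_dist: float,
                      collided: bool,
                      new_voxels: int,
                      w: RewardWeights = RewardWeights()) -> float:
    """Assumed weighted sum of goal progress, collision penalty and information gain."""
    progress = prev_goal_dist - goal_dist          # positive when the robot moves closer to the goal
    collision_penalty = 1.0 if collided else 0.0   # binary collision indicator
    return (w.progress * progress
            - w.collision * collision_penalty
            + w.info_gain * new_voxels)            # rewards revealing previously unseen voxels


if __name__ == "__main__":
    # Moving 0.5 m closer while revealing 20 new voxels, no collision.
    print(navigation_reward(5.0, 4.5, False, 20))
```

The relative size of `info_gain` versus `progress` is what trades exploratory camera motion against goal-directed flight in a reward of this shape; a large collision weight keeps safety dominant.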
Key results
- Higher target-reaching success rates in simulation versus fixed-camera baselines
- Intrinsic exploratory behaviors and improved map completeness via information-driven rewards
- Successful sim-to-real deployment on a physical quadrotor in cluttered 3D environments
- Open-sourced framework for reproducible active perception navigation
Why it matters
Provides a practical pathway for safer, more adaptive aerial robot autonomy in critical applications like search-and-rescue and infrastructure inspection.
Abstract
This paper addresses the challenge of active perception within autonomous navigation in complex, unknown environments. Revisiting the foundational principles of active perception, we introduce an end-to-end reinforcement learning framework in which a robot must not only reach a goal while avoiding obstacles, but also actively control its onboard camera to enhance situational awareness. The policy receives observations comprising the robot state, the current depth frame, and a particularly local geometry representation built from a short history of depth readings. To couple collision-free motion planning with information-driven active camera control, we augment the navigation reward with a voxel-based information metric. This enables an aerial robot to learn a robust policy that balances goal-directed motion with exploratory sensing. Extensive evaluation demonstrates that our strategy achieves safer flight than fixed, non-actuated camera baselines while also inducing intrinsic exploratory behaviors.
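The abstract does not define its voxel-based information metric. A simple proxy, assumed here purely for illustration, is to quantize the world-frame 3D points back-projected from each depth frame into voxels and count how many of those voxels have never been observed before. The function `voxel_information_gain` and the `voxel_size` default are hypothetical. This sketch only counts voxels hit by surface points and ignores the free space traversed by each ray, which a fuller metric would likely also credit.

```python
import numpy as np


def voxel_information_gain(points_world: np.ndarray,
                           observed: set,
                           voxel_size: float = 0.1) -> int:
    """Hypothetical proxy: count voxels hit by this frame's points that were
    never observed before, then add them to `observed` (mutated in place).

    points_world: (N, 3) array of points already transformed into the world frame.
    """
    # Quantize each point to an integer voxel index along x, y, z.
    idx = np.floor(points_world / voxel_size).astype(np.int64)
    frame_voxels = set(map(tuple, idx.tolist()))
    new_voxels = frame_voxels - observed
    observed |= new_voxels
    return len(new_voxels)


if __name__ == "__main__":
    seen: set = set()
    pts = np.array([[0.05, 0.05, 0.05],
                    [0.07, 0.02, 0.01],   # falls in the same voxel as the first point
                    [0.25, 0.00, 0.00]])
    print(voxel_information_gain(pts, seen))  # first view reveals two voxels
    print(voxel_information_gain(pts, seen))  # same view again reveals nothing new
```

A count like this could serve as the `new_voxels` input to a per-step reward, so that turning the camera toward unobserved regions is rewarded while repeatedly viewing the same region is not.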