VDS-Nav: Volumetric Depth-Based Safe Navigation for Aerial Robots – Bridging the Sim-To-Real Gap
Van Huyen Dang, Adrian Redder, Huy Pham, Andriy Sarabakha, Erdal Kayacan
AI summary
Problem
Vision-based end-to-end navigation for aerial robots struggles with the sim-to-real gap, often relying on latent space encoders that lose information or using reward functions that poorly correlate with raw sensor data.
Approach
The authors propose VDS-Nav, an end-to-end deep reinforcement learning policy that maps sequences of raw depth images directly to velocity and yaw commands, using a novel reward function based on the dot product between velocity and depth pixel vectors to enforce safety constraints.
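As a rough illustration of the idea, the sketch below computes a safety term from the dot product between a commanded velocity and the ray through each depth pixel, then passes the resulting clearance margin through a logarithmic barrier. It is not the authors' reward: the function names, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the safety radius `d_safe`, the look-ahead `horizon`, the clipping bounds, and the use of the worst-case pixel are all assumptions made for this example. The velocity is assumed to be expressed in the camera frame, and the depth image is assumed to hold valid z-depth in metres.

```python
import numpy as np

def pixel_rays(height, width, fx, fy, cx, cy):
    """Unit ray direction per pixel in the camera frame (x right, y down, z forward).

    Assumes an ideal pinhole camera; the intrinsics are hypothetical.
    """
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    rays = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones(u.shape)], axis=-1)
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)

def safety_reward(depth, velocity, rays, d_safe=0.5, horizon=1.0, eps=1e-6):
    """Hypothetical log-barrier safety term; 0 when every pixel is clear, negative otherwise."""
    # Assumption: depth is z-depth, so the range along each unit ray is depth / ray_z.
    dist = depth / rays[..., 2]
    # Dot product of the velocity with each pixel ray: speed toward that point.
    closing = rays @ velocity
    # Clearance left after moving toward the point for `horizon` seconds.
    margin = dist - d_safe - horizon * np.maximum(closing, 0.0)
    # Log barrier on the clipped margin, evaluated at the worst-case pixel.
    return float(np.log(np.clip(margin, eps, 1.0)).min())

# Usage: a flat wall 1.2 m ahead, seen by a tiny 4x6 depth image.
rays = pixel_rays(4, 6, fx=4.0, fy=4.0, cx=2.5, cy=1.5)
depth = np.full((4, 6), 1.2)
r_still = safety_reward(depth, np.zeros(3), rays)
r_forward = safety_reward(depth, np.array([0.0, 0.0, 1.0]), rays)
r_backward = safety_reward(depth, np.array([0.0, 0.0, -1.0]), rays)
```

In this toy setup, flying toward the wall is penalised more heavily than hovering, while retreating is treated the same as hovering because only positive closing speed reduces the margin. Because the term is computed from the same depth pixels the policy observes, the reward stays tied to the raw sensor data rather than to a privileged simulator state.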
Key results
- Outperforms a variational-autoencoder latent-space baseline in simulation success rate
- Achieves real-world deployment with performance closely matching simulation
- Enables zero-shot sim-to-real transfer while avoiding the information loss of a latent-space encoder
- Validates improved learning through volumetric depth sequence inputs
Why it matters
Provides a robust, deployable navigation framework for resource-constrained aerial robots operating in cluttered environments, advancing practical vision-based autonomy.
Abstract
End-to-end navigation via deep reinforcement learning has become a key approach for vision-based tasks. However, the sim-to-real gap remains a challenge, especially for aerial robots, where policies trained in simulation often fail in real-world environments. In this work, we propose a novel navigation paradigm – volumetric depth-based safe navigation (VDS-Nav), which trains a policy to infer linear velocities and yaw rate directly from a sequence of depth images, bypassing the need for a pre-trained latent space encoder. We enhance safety with a depth-based reward design, enabling the seamless incorporation of system constraints via logarithmic barrier function methods. Most importantly, using explicit sensor information in our reward design leads to seamless sim-to-real transfer by strengthening the correlation between state-action pairs and received rewards. To evaluate the effectiveness of VDS-Nav, we compare it to a baseline that first trains a variational autoencoder to encode depth images into a latent space for policy training. The simulation results show that VDS-Nav outperforms the baseline in terms of success rate. Furthermore, real-world experiments validate the policy, with real-time performance closely matching simulation results, suggesting an effective sim-to-real transfer.
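To make the "sequence of depth images" input concrete, here is a minimal sketch of a rolling buffer that stacks the most recent depth frames into a single volume of shape (frames, height, width), which a policy network could consume directly. The class name `DepthStack`, the number of frames, the maximum depth used for normalisation, and the treatment of invalid pixels are assumptions for illustration; the abstract does not specify them.

```python
from collections import deque

import numpy as np

class DepthStack:
    """Hypothetical rolling buffer turning depth frames into a (T, H, W) volume."""

    def __init__(self, num_frames=4, max_depth=10.0):
        self.max_depth = max_depth
        self.frames = deque(maxlen=num_frames)

    def push(self, depth):
        # Assumption: NaN or +inf pixels mean "no return" and are treated as max range.
        frame = np.nan_to_num(depth, nan=self.max_depth, posinf=self.max_depth)
        # Normalise to [0, 1] so the policy sees a bounded input.
        frame = np.clip(frame, 0.0, self.max_depth) / self.max_depth
        if not self.frames:
            # On the first frame, fill the whole window with copies of it.
            self.frames.extend([frame] * self.frames.maxlen)
        else:
            self.frames.append(frame)  # the oldest frame is dropped automatically
        return np.stack(self.frames, axis=0).astype(np.float32)

# Usage: push frames as they arrive; the newest frame is always last.
stack = DepthStack(num_frames=3, max_depth=10.0)
volume = stack.push(np.full((2, 2), 5.0))
volume = stack.push(np.full((2, 2), np.inf))
```

Feeding the stacked volume rather than a single frame gives the policy access to how the scene changes over time, without a separate encoder compressing each frame first.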