A Self-Supervised Learning Approach with Differentiable Optimization for UAV Trajectory Planning
Yufei Jiang, Yuanzhu Zhan, Harsh vardhan Gupta, Chinmay Mahendra Borde, Junyi Geng
AI summary
Problem
Traditional modular UAV path planning suffers from latency and suboptimal performance, while end-to-end learning methods lack dynamical feasibility, require large datasets, and struggle with sim-to-real transfer.
Approach
A self-supervised pipeline that jointly trains a depth perception network and a differentiable minimum-snap trajectory optimizer using a 3D cost map for collision guidance and a neural time-allocation network.
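To make the minimum-snap objective concrete, here is a minimal sketch for a single axis and a single segment. It is an illustration only, not the authors' optimizer: the function names (`deriv_row`, `min_snap_segment`, `snap_cost`), the use of NumPy, and the rest-to-rest boundary conditions (zero velocity, acceleration, and jerk at both ends) are all assumptions. Under those conditions the minimum-snap solution is the unique 7th-order polynomial that satisfies the eight boundary constraints, so it can be found with one linear solve.

```python
import numpy as np
from math import factorial


def deriv_row(t, order, n_coeffs=8):
    """Row r such that r @ c is the `order`-th derivative of sum_i c[i] * t**i at time t."""
    row = np.zeros(n_coeffs)
    for i in range(order, n_coeffs):
        row[i] = factorial(i) / factorial(i - order) * t ** (i - order)
    return row


def min_snap_segment(p0, p1, T):
    """Coefficients of the rest-to-rest 7th-order polynomial from p0 to p1 over [0, T].

    Assumed boundary conditions: velocity, acceleration, and jerk are zero at both ends.
    """
    A = np.array(
        [deriv_row(0.0, k) for k in range(4)] + [deriv_row(T, k) for k in range(4)]
    )
    b = np.zeros(8)
    b[0], b[4] = p0, p1  # only the positions are nonzero
    return np.linalg.solve(A, b)


def snap_cost(c, T):
    """Closed-form integral over [0, T] of the squared 4th derivative (snap)."""
    n = len(c)
    Q = np.zeros((n, n))
    for i in range(4, n):
        for j in range(4, n):
            ci = factorial(i) / factorial(i - 4)
            cj = factorial(j) / factorial(j - 4)
            Q[i, j] = ci * cj * T ** (i + j - 7) / (i + j - 7)
    return float(c @ Q @ c)


# Same 1 m displacement, two candidate segment durations (hypothetical values).
c_fast = min_snap_segment(0.0, 1.0, 1.0)
c_slow = min_snap_segment(0.0, 1.0, 2.0)
cost_fast = snap_cost(c_fast, 1.0)
cost_slow = snap_cost(c_slow, 2.0)
```

For a fixed displacement, this snap cost scales as 1/T^7, so doubling the segment duration lowers it by a factor of 128. That sensitivity to segment duration is one reason the time allocation matters for control effort.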
Key results
- Self-supervised 3D path planning pipeline that eliminates the need for expert labels
- 30.90% reduction in control effort with competitive tracking performance
- Differentiable minimum-snap optimizer guaranteeing dynamic feasibility
- Neural time-allocation network enhancing planning efficiency and optimality
Why it matters
Provides a robust, interpretable, and data-efficient navigation framework for UAVs operating in complex 3D environments under strict SWAP constraints.
Abstract
While Unmanned Aerial Vehicles (UAVs) have gained significant traction across various fields, path planning in 3D environments remains a critical challenge, particularly under size, weight, and power (SWAP) constraints. Traditional modular planning systems often introduce latency and suboptimal performance due to limited information sharing and local minima issues. End-to-end learning approaches streamline the pipeline by mapping sensory observations directly to actions but require large-scale datasets, face significant sim-to-real gaps, or lack dynamical feasibility. In this paper, we propose a self-supervised UAV trajectory planning pipeline that integrates learning-based depth perception with differentiable trajectory optimization. A 3D cost map guides UAV behavior without expert demonstrations or human labels. Additionally, we incorporate a neural network-based time allocation strategy to improve efficiency and optimality. The system thus combines robust learning-based perception with reliable physics-based optimization for improved generalizability and interpretability. Both simulation and real-world experiments validate our approach across various environments, demonstrating its effectiveness and robustness. Our method achieves a 30.90% reduction in control effort while maintaining competitive tracking performance compared with state-of-the-art methods.