Self-Supervised Domain Adaptation for Visual 3D Pose Estimation of Nano-Drone Racing Gates by Enforcing Geometric Consistency
Nicholas Carlotti, Michele Antonazzi, Elia Cereda, Mirko Nava, Nicola Basilico, Daniele Palossi, Alessandro Giusti
AI summary
Problem
Models pre-trained on simulated data estimate drone racing gate poses poorly in the real world because of the sim-to-real gap, while collecting labeled real-world data is costly and impractical.
Approach
The method fine-tunes a CNN pre-trained on simulated data using unlabeled real flight images. A state consistency loss enforces that the gate poses predicted from two images acquired at different times agree with the drone's motion between them, as measured by its onboard odometry.
Key results
- Outperforms state-of-the-art unsupervised domain adaptation baselines
- Achieves low mean absolute error: position (x=26, y=28, z=10 cm) and orientation (ψ=13°)
- Improves position accuracy by 40% and orientation accuracy by 37% over a baseline
- Runs in real time aboard the Crazyflie 2.1 Brushless nano-drone, with an inference time of 30.4 ms (33 fps); the adaptation is effective with as few as 10 minutes of real-world flight data
Why it matters
Enables reliable, low-cost, real-time visual perception for autonomous nano-drones in competitive racing without requiring motion capture systems or manual labeling.
Abstract
We consider the task of visually estimating the relative pose of a drone racing gate in front of a nano-quadrotor, using a convolutional neural network pre-trained on simulated data to regress the gate’s pose. Due to the sim-to-real gap, the pre-trained model underperforms in the real world and must be adapted to the target domain. We propose an unsupervised domain adaptation (UDA) approach using only real image sequences collected by the drone flying an arbitrary trajectory in front of a gate; sequences are annotated in a self-supervised fashion with the drone’s odometry as measured by its onboard sensors. On this dataset, a state consistency loss enforces that two images acquired at different times yield pose predictions that are consistent with the drone’s odometry. Results indicate that our approach outperforms other SoA UDA approaches, has a low mean absolute error in position (x=26, y=28, z=10 cm) and orientation (ψ=13°), an improvement of 40% in position and 37% in orientation over a baseline. The approach’s effectiveness is appreciable with as few as 10 minutes of real-world flight data and yields models with an inference time of 30.4 ms (33 fps) when deployed aboard the Crazyflie 2.1 Brushless nano-drone.
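The state consistency idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes 4-DoF poses (x, y, z, yaw ψ) with roll and pitch ignored, a static gate, and an unweighted L1 penalty that sums meters and radians; the abstract does not specify the loss form, and all function names are hypothetical. The gate pose predicted at time i is transported into the drone frame at time j using the odometry, then compared with the pose predicted at time j.

```python
import numpy as np

def pose_to_matrix(pose):
    """(x, y, z, yaw) -> 4x4 homogeneous transform (rotation about z only)."""
    x, y, z, yaw = pose
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0, x],
                     [s,  c, 0.0, y],
                     [0.0, 0.0, 1.0, z],
                     [0.0, 0.0, 0.0, 1.0]])

def matrix_to_pose(T):
    """4x4 homogeneous transform -> (x, y, z, yaw)."""
    yaw = np.arctan2(T[1, 0], T[0, 0])
    return np.array([T[0, 3], T[1, 3], T[2, 3], yaw])

def wrap_angle(a):
    """Wrap an angle to [-pi, pi)."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi

def state_consistency_loss(gate_i, gate_j, odom_i, odom_j):
    """Illustrative loss (assumed form, not taken from the paper).

    gate_i, gate_j: gate poses predicted in the drone frame at times i and j.
    odom_i, odom_j: drone poses in the odometry frame at times i and j.
    """
    # Transform that maps coordinates in drone frame i to drone frame j.
    T_i_to_j = np.linalg.inv(pose_to_matrix(odom_j)) @ pose_to_matrix(odom_i)
    # Where the gate should appear at time j if the prediction at i is right.
    expected_j = matrix_to_pose(T_i_to_j @ pose_to_matrix(gate_i))
    gate_j = np.asarray(gate_j, dtype=float)
    pos_err = np.abs(expected_j[:3] - gate_j[:3]).sum()
    yaw_err = abs(wrap_angle(expected_j[3] - gate_j[3]))
    # Assumption: equal weighting of position (m) and yaw (rad) terms.
    return pos_err + yaw_err
```

If both predictions match the true relative gate pose, the loss is zero; any disagreement with the odometry raises it. For training, the same computation would need to be written in an automatic differentiation framework so that gradients reach the network producing `gate_i` and `gate_j`.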