ICRA 2026

Self-Supervised Domain Adaptation for Visual 3D Pose Estimation of Nano-Drone Racing Gates by Enforcing Geometric Consistency

Nicholas Carlotti, Michele Antonazzi, Elia Cereda, Mirko Nava, Nicola Basilico, Daniele Palossi, Alessandro Giusti

AI summary

A self-supervised domain adaptation method leveraging drone odometry and geometric consistency loss accurately transfers simulation-trained pose estimation models to real-world nano-drone racing gates, outperforming existing unsupervised baselines.

Self-supervised learning, Domain adaptation, 3D pose estimation, Nano-drone, Geometric consistency, Sim-to-real transfer

Problem

Models pre-trained in simulation estimate drone racing gate poses poorly in the real world because of the sim-to-real gap, and collecting labeled real-world data to close that gap is costly and impractical.

Approach

The method fine-tunes a CNN pre-trained in simulation on unlabeled real flight data by enforcing a state consistency loss: the gate poses predicted from two images acquired at different times must agree with the drone motion measured by its onboard odometry between those two times.
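The consistency constraint can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes 4-DOF poses `(x, y, z, yaw)`, a static gate, and an L1 penalty, and the helper names `compose`, `inverse`, and `consistency_loss` are hypothetical. Because the gate does not move, the gate pose predicted at time i, carried through the odometry-measured drone motion, should match the gate pose predicted at time j.

```python
import math

def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi

def compose(a, b):
    """Express pose b (given in the frame of pose a) in a's parent frame."""
    ax, ay, az, ayaw = a
    bx, by, bz, byaw = b
    c, s = math.cos(ayaw), math.sin(ayaw)
    return (ax + c * bx - s * by,
            ay + s * bx + c * by,
            az + bz,
            wrap_angle(ayaw + byaw))

def inverse(a):
    """Invert a 4-DOF pose (rotation about the z axis only)."""
    ax, ay, az, ayaw = a
    c, s = math.cos(ayaw), math.sin(ayaw)
    return (-(c * ax + s * ay), s * ax - c * ay, -az, wrap_angle(-ayaw))

def consistency_loss(gate_pred_i, gate_pred_j, drone_odom_i, drone_odom_j):
    """L1 mismatch between the prediction at time j and the prediction at
    time i moved into the drone frame at time j using odometry.
    The L1 form and equal weighting of terms are assumptions of this sketch."""
    # Drone frame i expressed in drone frame j, from odometry alone.
    rel_motion = compose(inverse(drone_odom_j), drone_odom_i)
    expected_j = compose(rel_motion, gate_pred_i)
    pos_err = sum(abs(p - e) for p, e in zip(gate_pred_j[:3], expected_j[:3]))
    yaw_err = abs(wrap_angle(gate_pred_j[3] - expected_j[3]))
    return pos_err + yaw_err
```

In an actual fine-tuning loop the same computation would be written in a differentiable framework so that gradients reach the network producing both predictions; the plain-Python version above only shows the geometry.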

Key results

Outperforms state-of-the-art unsupervised domain adaptation baselines
Achieves low mean absolute error: position (x=26, y=28, z=10 cm) and orientation (ψ=13°)
Improves position accuracy by 40% and orientation accuracy by 37% over a baseline
Runs at 33 fps (30.4 ms per inference) aboard the Crazyflie 2.1 Brushless nano-drone, and adapts effectively with as few as 10 minutes of real-world flight data
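The frame rate follows directly from the inference time reported in the abstract (30.4 ms). A quick check, with variable names chosen here for illustration:

```python
inference_time_ms = 30.4  # per-frame inference time reported in the abstract

# Frames per second is the reciprocal of the per-frame latency.
fps = 1000.0 / inference_time_ms
print(f"{fps:.1f} fps")  # prints "32.9 fps", reported as 33 fps
```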

Why it matters

Enables reliable, low-cost, real-time visual perception for autonomous nano-drones in competitive racing without requiring motion capture systems or manual labeling.

Abstract

We consider the task of visually estimating the relative pose of a drone racing gate in front of a nano-quadrotor, using a convolutional neural network pre-trained on simulated data to regress the gate’s pose. Due to the sim-to-real gap, the pre-trained model underperforms in the real world and must be adapted to the target domain. We propose an unsupervised domain adaptation (UDA) approach using only real image sequences collected by the drone flying an arbitrary trajectory in front of a gate; sequences are annotated in a self-supervised fashion with the drone’s odometry as measured by its onboard sensors. On this dataset, a state consistency loss enforces that two images acquired at different times yield pose predictions that are consistent with the drone’s odometry. Results indicate that our approach outperforms other SoA UDA approaches, has a low mean absolute error in position (x=26, y=28, z=10 cm) and orientation (ψ=13°), an improvement of 40% in position and 37% in orientation over a baseline. The approach’s effectiveness is appreciable with as few as 10 minutes of real-world flight data and yields models with an inference time of 30.4 ms (33 fps) when deployed aboard the Crazyflie 2.1 Brushless nano-drone.

Index terms

Deep Learning for Visual Perception, Transfer Learning, Deep Learning Methods