NOVA: Navigation Via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments
Alessandro Saviolo, Giuseppe Loianno
AI summary
Problem
Autonomous aerial target tracking in unstructured, GPS-denied environments fails when relying on global localization or premapped scenes, as feature-based methods break down under occlusion, rapid motion, and degraded lighting.
Approach
NOVA formulates perception and control entirely in the target’s reference frame, fusing lightweight object detection, stereo depth completion, and visual-inertial estimation to feed a nonlinear model predictive controller with real-time, map-free obstacle avoidance.
Key results
- Agile target tracking at speeds exceeding 50 km/h
- Stable navigation across urban mazes, forest trails, and indoor-outdoor transitions
- Fully onboard operation using only a stereo camera and IMU
- Consistent, repeatable performance under severe lighting changes and GPS loss
Why it matters
Provides a deployable, infrastructure-free solution for high-speed aerial tracking in search-and-rescue, inspection, and swarm applications where GPS and maps are unavailable.
Abstract
Autonomous aerial target tracking in unstructured and Global Positioning System (GPS)-denied environments remains a fundamental challenge in robotics. Many existing methods rely on motion capture systems, premapped scenes, or feature-based localization to ensure safety and control, limiting their deployment in real-world conditions. We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation using only a stereo camera and an inertial measurement unit (IMU). Rather than constructing a global map or relying on absolute localization, NOVA formulates perception, estimation, and control entirely in the target’s reference frame. A tightly integrated stack combines a lightweight object detector with stereo depth completion, followed by histogram-based filtering to infer robust target distances under occlusion and noise. These measurements feed a visual-inertial state estimator that recovers the full 6-DoF pose of the robot relative to the target. A nonlinear model predictive controller (NMPC) plans dynamically feasible trajectories in the target frame. To ensure safety, high-order control barrier functions (CBFs) are constructed online from a compact set of high-risk collision points extracted from depth, enabling real-time obstacle avoidance without maps or dense representations. We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss and severe lighting changes that disrupt feature-based localization. Each experiment is repeated multiple times under similar conditions to assess resilience, showing consistent and reliable performance. NOVA achieves agile target following at speeds exceeding 50 km/h. These results show that high-speed, vision-based tracking is possible in the wild using only onboard sensing, with no reliance on external localization or assumptions on the environment structure.
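The histogram-based distance filtering mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the filter, so everything here is an assumption made for illustration: the function name `histogram_target_distance`, the bin width, the depth limits, and the choice to take the most populated depth bin inside the detection box and return the median of its samples. The idea is that pixels belonging to the target tend to cluster at one depth, while occluders, background, and invalid stereo returns fall into other bins or are discarded.

```python
import numpy as np


def histogram_target_distance(depth_roi, bin_width=0.25, min_depth=0.3, max_depth=30.0):
    """Estimate target distance (meters) from depth pixels inside a detection box.

    Hypothetical helper: takes the mode of a fixed-width depth histogram and
    refines it with the median of the samples in that bin. Returns None when
    the box contains no valid depth.
    """
    d = np.asarray(depth_roi, dtype=float).ravel()
    # Discard invalid stereo returns (NaN/inf) and out-of-range depths.
    d = d[np.isfinite(d) & (d >= min_depth) & (d <= max_depth)]
    if d.size == 0:
        return None

    # Assign each sample to a fixed-width bin starting at min_depth.
    n_bins = int(np.ceil((max_depth - min_depth) / bin_width))
    idx = np.minimum(((d - min_depth) / bin_width).astype(int), n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)

    # The most populated bin is assumed to contain the target surface.
    k = int(np.argmax(counts))
    return float(np.median(d[idx == k]))


if __name__ == "__main__":
    # Synthetic detection box: target near 5 m, a closer occluder, far
    # background, and some invalid pixels.
    roi = np.concatenate([
        np.full(31, 5.0), np.full(29, 4.9),  # target
        np.full(15, 2.0),                    # partial occluder
        np.full(30, 12.0),                   # background
        np.full(10, np.nan),                 # failed stereo matches
    ])
    print(histogram_target_distance(roi))
```

Taking the mode rather than the mean keeps the estimate on the dominant surface when an occluder or the background covers part of the box. This assumes the target occupies the largest share of valid pixels, which may not hold under heavy occlusion.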