ICRA 2026

NOVA: Navigation Via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments

Alessandro Saviolo, Giuseppe Loianno

AI summary

NOVA enables high-speed, fully onboard aerial target tracking and collision avoidance in GPS-denied environments using only a stereo camera and IMU, without maps or external infrastructure.
Keywords: object-centric navigation, high-speed target tracking, GPS-denied environments, visual-inertial odometry, control barrier functions, onboard perception

Problem

Autonomous aerial target tracking in unstructured, GPS-denied environments fails when relying on global localization or premapped scenes, as feature-based methods break down under occlusion, rapid motion, and degraded lighting.

Approach

NOVA formulates perception and control entirely in the target’s reference frame, fusing lightweight object detection, stereo depth completion, and visual-inertial estimation to feed a nonlinear model predictive controller with real-time, map-free obstacle avoidance.
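
To make the detection-plus-depth step concrete, here is a minimal sketch of how a single target distance might be pulled out of a depth map inside a detection box using a histogram. It is an illustration under stated assumptions, not the authors' implementation: the function name `estimate_target_distance`, the bin width, the depth limits, and the rule "take the most populated bin and its neighbours" are all hypothetical choices. It also assumes the target is the dominant depth mode inside the box, which requires a reasonably tight detection.

```python
import numpy as np


def estimate_target_distance(depth, bbox, bin_width=0.25,
                             min_depth=0.3, max_depth=40.0):
    """Return a robust distance (metres) to the object inside bbox, or None.

    depth : 2-D array of metric depth, NaN/inf where stereo has no estimate.
    bbox  : (x0, y0, x1, y1) pixel box from the detector (hypothetical layout).
    """
    x0, y0, x1, y1 = bbox
    patch = depth[y0:y1, x0:x1].ravel()

    # Discard missing or implausible depth values.
    valid = patch[np.isfinite(patch) & (patch >= min_depth) & (patch <= max_depth)]
    if valid.size == 0:
        return None

    # Histogram the remaining depths and pick the most populated bin.
    edges = np.arange(min_depth, max_depth + bin_width, bin_width)
    counts, edges = np.histogram(valid, bins=edges)
    k = int(np.argmax(counts))

    # Keep samples in the winning bin and its two neighbours, then take the
    # median so occluders and background pixels do not bias the estimate.
    lo = edges[max(k - 1, 0)]
    hi = edges[min(k + 2, len(edges) - 1)]
    inliers = valid[(valid >= lo) & (valid <= hi)]
    return float(np.median(inliers))


# Synthetic scene: background at 20 m, target near 5 m, a closer occluder
# at 2 m covering part of the target, and a hole with no depth.
rng = np.random.default_rng(0)
depth = np.full((120, 160), 20.0)
depth[40:80, 60:100] = 5.0 + 0.05 * rng.standard_normal((40, 40))
depth[40:80, 60:70] = 2.0
depth[50:55, 80:90] = np.nan

print(estimate_target_distance(depth, (58, 38, 102, 82)))
```

A plain mean over the box would be pulled toward the occluder and the background, whereas the mode-then-median rule stays with the largest cluster of consistent depths. If the box is loose enough that background pixels outnumber target pixels, this simple rule will lock onto the background instead.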

Key results

  • Agile target tracking at speeds exceeding 50 km/h
  • Stable navigation across urban mazes, forest trails, and indoor-outdoor transitions
  • Fully onboard operation using only a stereo camera and IMU
  • Consistent, repeatable performance under severe lighting changes and GPS loss

Why it matters

Provides a deployable, infrastructure-free solution for high-speed aerial tracking in search-and-rescue, inspection, and swarm applications where GPS and maps are unavailable.

Abstract

Autonomous aerial target tracking in unstructured and Global Positioning System (GPS)-denied environments remains a fundamental challenge in robotics. Many existing methods rely on motion capture systems, premapped scenes, or feature-based localization to ensure safety and control, limiting their deployment in real-world conditions. We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation using only a stereo camera and an inertial measurement unit (IMU). Rather than constructing a global map or relying on absolute localization, NOVA formulates perception, estimation, and control entirely in the target’s reference frame. A tightly integrated stack combines a lightweight object detector with stereo depth completion, followed by histogram-based filtering to infer robust target distances under occlusion and noise. These measurements feed a visual-inertial state estimator that recovers the full 6-DoF pose of the robot relative to the target. A nonlinear model predictive controller (NMPC) plans dynamically feasible trajectories in the target frame. To ensure safety, high-order control barrier functions (CBFs) are constructed online from a compact set of high-risk collision points extracted from depth, enabling real-time obstacle avoidance without maps or dense representations. We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss and severe lighting changes that disrupt feature-based localization. Each experiment is repeated multiple times under similar conditions to assess resilience, showing consistent and reliable performance. NOVA achieves agile target following at speeds exceeding 50 km/h. These results show that high-speed, vision-based tracking is possible in the wild using only onboard sensing, with no reliance on external localization or assumptions on the environment structure.
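
The abstract does not spell out the barrier constraint, so the following is only a generic sketch of what a high-order CBF for a single collision point can look like. It assumes a point-mass model with acceleration as the control input (relative degree two), linear class-K functions with gains $k_1, k_2 > 0$, and a safety radius $d_s$; the symbols $r_j$, $d_s$, $k_1$, and $k_2$ are illustrative and not taken from the paper.

```latex
% r_j: robot position relative to collision point j;  v_j = \dot r_j,  a_j = \ddot r_j
\begin{aligned}
h_j &= \lVert r_j \rVert^2 - d_s^2, \\
\dot h_j &= 2\, r_j^\top v_j, \qquad
\ddot h_j = 2\, \lVert v_j \rVert^2 + 2\, r_j^\top a_j, \\
\psi_{1,j} &= \dot h_j + k_1 h_j, \\
\psi_{2,j} &= \dot \psi_{1,j} + k_2\, \psi_{1,j}
           = \ddot h_j + (k_1 + k_2)\, \dot h_j + k_1 k_2\, h_j \;\ge\; 0 .
\end{aligned}
```

Substituting the derivatives gives $2\lVert v_j \rVert^2 + 2\, r_j^\top a_j + 2 (k_1 + k_2)\, r_j^\top v_j + k_1 k_2 \left( \lVert r_j \rVert^2 - d_s^2 \right) \ge 0$, which is affine in the acceleration $a_j$. Under these assumptions, each high-risk point contributes one such inequality to the controller, so a compact set of points keeps the number of constraints small.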

Index terms

Aerial Systems: Perception and Autonomy; Visual Tracking; Vision-Based Navigation
