ICRA 2026

Robust 3D Multi-Object Tracking for Autonomous Driving with Adaptive LiDAR-Visual Fusion and Multilevel Data Association

Chao Jiang, Chao Wang, Liang Nie, Mingyue Zhang, Zhou Yuting

AI summary

Integrating adaptive LiDAR-camera fusion, IMU/GPS motion compensation, and a novel RGDIoU cost function, the method achieves state-of-the-art tracking accuracy and real-time performance in complex autonomous driving scenarios.

3D multi-object tracking, LiDAR-visual fusion, motion compensation, data association, autonomous driving, RGDIoU

Problem

Current 3D multi-object tracking systems suffer from alignment errors due to target pose variations, association failures caused by dynamic vehicle motion, and inefficient matching in dense traffic, compromising tracking accuracy and robustness.

Approach

The framework aligns LiDAR and visual detections using a center-plane adaptive projection to correct width distortion, compensates for ego-vehicle motion via IMU and GPS data, and optimizes trajectory matching with a rotational geometric distance IoU cost function and multilevel spatial indexing.
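The ego-motion compensation step can be illustrated with a minimal bird's-eye-view sketch. It assumes the IMU/GPS pipeline already yields an ego pose `(x, y, yaw)` in a common world frame at each timestamp, with x pointing forward and y to the left in the ego frame. The function names and the 2D simplification are illustrative assumptions, not the paper's implementation.

```python
import math

def ego_to_world(point, pose):
    """Map a BEV point from the ego frame to the world frame.

    pose is (x, y, yaw) of the ego vehicle in the world frame (assumed input).
    """
    px, py = point
    x, y, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (x + c * px - s * py, y + s * px + c * py)

def world_to_ego(point, pose):
    """Inverse of ego_to_world: map a world-frame point into the ego frame."""
    px, py = point
    x, y, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy = px - x, py - y
    return (c * dx + s * dy, -s * dx + c * dy)

def compensate(track_xy_prev, pose_prev, pose_curr):
    """Re-express a track position, stored in the previous ego frame,
    in the current ego frame so it is comparable with new detections."""
    return world_to_ego(ego_to_world(track_xy_prev, pose_prev), pose_curr)

# A static object 10 m ahead; the ego vehicle then drives 2 m forward.
print(compensate((10.0, 0.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)))
```

Without this step, a static object would appear to have moved by the ego displacement between frames, which is one way association errors of the kind described above can arise.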

Key results

13% improvement in sAMOTA tracking accuracy over the best existing methods
50.24% HOTA score surpassing all compared methods
Real-time processing at 90 FPS on KITTI and nuScenes
Mitigation of pose-induced target width errors via center-plane adaptive fusion

Why it matters

Provides autonomous vehicles with a highly accurate, robust, and real-time perception tool essential for safe navigation in complex, dynamic traffic environments.

Abstract

To increase the safety and reliability of autonomous driving systems in complex traffic environments, this paper proposes a novel 3D multiobject tracking (MOT) method that integrates center-plane adaptive multisensor fusion, motion compensation, and multilevel data association. Unlike traditional methods, our approach employs a center-plane adaptive fusion strategy to align LiDAR and visual data precisely, mitigating errors in the target width caused by pose variations, and improving tracking accuracy. To address vehicle motion-induced association errors in dynamic scenarios, we incorporate IMU and GPS data for high-frequency vehicle pose estimation and compensation, ensuring stable and robust target association. Additionally, a rotational geometric distance intersection-over-union (RGDIoU) cost function is introduced, combined with multilevel spatial indexing, to optimize the data association efficiency and accuracy. The experimental results on benchmark datasets, including KITTI and nuScenes, demonstrate that our method achieves state-of-the-art (SOTA) performance across multiple tracking metrics, including HOTA and sAMOTA, while maintaining real-time performance at 90 FPS. Specifically, our method improves sAMOTA tracking accuracy by 13% over the best existing methods and achieves a HOTA score of 50.24%, surpassing all compared methods.
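The abstract does not give the RGDIoU formula or the structure of the multilevel index, so the sketch below substitutes two well-known stand-ins to illustrate the general idea: a standard Distance-IoU (DIoU) cost on axis-aligned bird's-eye-view boxes, which ignores rotation, and a single-level uniform grid that limits cost evaluation to nearby candidates. Every name here (`diou_cost`, `build_grid`, `associate`, `cell`, `max_cost`) is a hypothetical choice for illustration, and greedy matching is used only to keep the example dependency-free.

```python
from collections import defaultdict

def diou_cost(a, b):
    """1 - DIoU for axis-aligned BEV boxes (cx, cy, w, l) with positive sizes.

    Stand-in for RGDIoU: standard DIoU, no rotation handling.
    """
    ax0, ax1 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay0, ay1 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx0, bx1 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by0, by1 = b[1] - b[3] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    iou = inter / (a[2] * a[3] + b[2] * b[3] - inter)
    # Squared center distance, normalized by the enclosing box diagonal.
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    cw = max(ax1, bx1) - min(ax0, bx0)
    ch = max(ay1, by1) - min(ay0, by0)
    return 1.0 - (iou - d2 / (cw * cw + ch * ch))

def build_grid(boxes, cell):
    """Hash box indices by the grid cell containing each box center."""
    grid = defaultdict(list)
    for i, b in enumerate(boxes):
        grid[(int(b[0] // cell), int(b[1] // cell))].append(i)
    return grid

def candidates(grid, box, cell):
    """Indices of boxes whose centers lie in the 3x3 cell neighborhood."""
    gx, gy = int(box[0] // cell), int(box[1] // cell)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            out.extend(grid.get((gx + dx, gy + dy), []))
    return out

def associate(tracks, dets, cell=5.0, max_cost=1.0):
    """Greedy lowest-cost matching restricted to grid neighbors.

    Assumes box sizes do not exceed `cell`, so overlapping boxes always
    fall in adjacent cells. Costs >= max_cost are rejected.
    """
    grid = build_grid(dets, cell)
    pairs = []
    for ti, t in enumerate(tracks):
        for di in candidates(grid, t, cell):
            c = diou_cost(t, dets[di])
            if c < max_cost:
                pairs.append((c, ti, di))
    pairs.sort()
    used_t, used_d, matches = set(), set(), []
    for _, ti, di in pairs:
        if ti not in used_t and di not in used_d:
            used_t.add(ti)
            used_d.add(di)
            matches.append((ti, di))
    return matches

tracks = [(0.0, 0.0, 2.0, 4.0), (20.0, 0.0, 2.0, 4.0)]
dets = [(20.5, 0.0, 2.0, 4.0), (0.3, 0.2, 2.0, 4.0)]
print(associate(tracks, dets))
```

The grid keeps the number of cost evaluations roughly proportional to local density rather than to the product of track and detection counts, which is the kind of saving spatial indexing is meant to provide in dense traffic.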

Index terms

Visual Tracking; Object Detection, Segmentation and Categorization; Human Detection and Tracking