ICRA 2026

Beyond Frame-Wise Tracking: A Trajectory-Based Paradigm for Efficient Point Cloud Tracking

BaiChen Fan, Yuanxi Cui, Jian Li, Qin Wang, Shibo Zhao, Muqing Cao, Sifan Zhou


AI summary


TrajTrack achieves state-of-the-art 3D tracking accuracy at 55 FPS by using historical bounding box trajectories to correct short-term motion predictions, bypassing costly multi-frame point cloud processing.

3D single object tracking, trajectory prediction, point cloud tracking, real-time perception, motion continuity, bounding box history

Problem

Two-frame trackers lack long-term temporal context and fail in sparse or occluded scenes, while sequence-based trackers improve robustness but incur prohibitive computational costs for real-time use.

Approach

The method generates a fast two-frame motion proposal and refines it using an implicit motion modeling module that learns long-term motion continuity exclusively from historical bounding box coordinates.
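The propose-then-refine flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the TrajTrack implementation. The object state is reduced to (x, y, z, yaw). The two-frame motion estimate is passed in directly instead of being regressed from point clouds. The learned implicit motion module is replaced by constant-velocity extrapolation over past boxes. The fusion rule, a fixed blend weight `alpha`, is hypothetical, and all function names are illustrative.

```python
import numpy as np

# State convention assumed throughout: (x, y, z, yaw), yaw in radians.


def wrap_angle(a):
    """Wrap an angle (scalar or array) into [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi


def motion_proposal(prev_state, motion):
    """Explicit two-frame proposal: apply an estimated (dx, dy, dz, dyaw).

    In a real base tracker `motion` is regressed from two point clouds;
    here it is simply passed in (hypothetical input).
    """
    state = np.asarray(prev_state, dtype=float) + np.asarray(motion, dtype=float)
    state[3] = wrap_angle(state[3])
    return state


def predict_from_trajectory(history):
    """Stand-in for the learned implicit motion module.

    Constant-velocity extrapolation from K past states (shape K x 4, K >= 2).
    TrajTrack learns this mapping rather than assuming constant velocity.
    """
    history = np.asarray(history, dtype=float)
    deltas = np.diff(history, axis=0)
    deltas[:, 3] = wrap_angle(deltas[:, 3])  # avoid 2*pi jumps in yaw steps
    next_state = history[-1] + deltas.mean(axis=0)
    next_state[3] = wrap_angle(next_state[3])
    return next_state


def refine(proposal, prediction, alpha=0.5):
    """Hypothetical fusion rule: move the proposal toward the prediction.

    alpha = 0 keeps the two-frame proposal; alpha = 1 trusts the trajectory.
    """
    proposal = np.asarray(proposal, dtype=float)
    diff = np.asarray(prediction, dtype=float) - proposal
    diff[3] = wrap_angle(diff[3])  # blend yaw along the shorter arc
    refined = proposal + alpha * diff
    refined[3] = wrap_angle(refined[3])
    return refined


if __name__ == "__main__":
    # Object moving 1 m per frame along x; the two-frame estimate under-shoots.
    history = np.array([[0.0, 0.0, 0.0, 0.0],
                        [1.0, 0.0, 0.0, 0.0],
                        [2.0, 0.0, 0.0, 0.0]])
    proposal = motion_proposal(history[-1], [0.2, 0.0, 0.0, 0.0])
    prediction = predict_from_trajectory(history)
    print("proposal:  ", proposal)
    print("prediction:", prediction)
    print("refined:   ", refine(proposal, prediction, alpha=0.5))
```

In this toy case the proposal undershoots (x = 2.2), the trajectory extrapolation lands at x = 3.0, and the refined state sits between them. The yaw difference is wrapped before blending, so a proposal near +π and a prediction near -π are blended across the ±π boundary rather than averaged to 0.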

Key results

State-of-the-art precision on NuScenes (+4.48%)
Real-time inference at 55 FPS
Strong generalizability across base trackers
Lightweight design avoiding multi-frame point cloud processing
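The 55 FPS figure corresponds to a per-frame latency budget. The 20 Hz comparison, which is the sweep rate of the LiDAR used in NuScenes, is included only for context:

```latex
t_{\text{frame}} = \frac{1}{55\ \text{FPS}} = \frac{1000\ \text{ms}}{55} \approx 18.2\ \text{ms}
\qquad\text{vs.}\qquad
t_{\text{sweep}} = \frac{1}{20\ \text{Hz}} = 50\ \text{ms}
```

Each tracking update therefore completes in well under the interval between consecutive LiDAR sweeps.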

Why it matters

Provides a computationally efficient yet robust tracking solution critical for real-time autonomous driving and robotic perception systems.

Abstract

LiDAR-based 3D single object tracking (3D SOT) is a critical task in robotics and autonomous systems. Existing methods typically follow either a frame-wise (two-frame) motion estimation paradigm or a sequence-based paradigm. However, two-frame methods are efficient but lack long-term temporal context, which makes them vulnerable in sparse or occluded scenes. Sequence-based methods that process multiple point clouds gain robustness, but at a significant computational cost. To resolve this dilemma, we propose a novel trajectory-based paradigm and its instantiation, TrajTrack. TrajTrack is a lightweight framework that enhances a base two-frame tracker by implicitly learning motion continuity from historical bounding box trajectories alone, without requiring additional, costly point cloud inputs. It first generates a fast, explicit motion proposal and then uses an implicit motion modeling module to predict the future trajectory, which in turn refines and corrects the initial proposal. Extensive experiments on the large-scale NuScenes benchmark show that TrajTrack achieves new state-of-the-art performance, improving tracking precision by 3.02% over a strong baseline while running at 55 FPS. We also demonstrate the strong generalizability of TrajTrack across different base trackers. Code is available at https://github.com/FiBonaCci225/TrajTrack.
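The implicit motion module sees only past bounding boxes, so the way that history is encoded matters. One common choice, assumed here rather than taken from the paper, is to express the last K states relative to the most recent box. That way the input does not depend on the object's global position or heading. The hypothetical helpers `canonicalize_trajectory` and `decanonicalize_state` below use a state reduced to (x, y, z, yaw). They convert a history into that local frame and map a locally predicted state back to world coordinates.

```python
import numpy as np


def _wrap(a):
    """Wrap an angle (scalar or array) into [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi


def canonicalize_trajectory(history):
    """Express K past states (x, y, z, yaw) relative to the most recent one.

    Hypothetical preprocessing; the paper's actual encoding may differ.
    The last row of the result is all zeros by construction.
    """
    history = np.asarray(history, dtype=float)
    ref = history[-1]
    c, s = np.cos(-ref[3]), np.sin(-ref[3])
    rot = np.array([[c, -s], [s, c]])  # rotation by -yaw_ref (world -> local)
    out = np.empty_like(history)
    out[:, :2] = (history[:, :2] - ref[:2]) @ rot.T  # row vectors: v @ R^T
    out[:, 2] = history[:, 2] - ref[2]
    out[:, 3] = _wrap(history[:, 3] - ref[3])
    return out


def decanonicalize_state(local, ref):
    """Map a state predicted in the reference box frame back to world coordinates."""
    local = np.asarray(local, dtype=float)
    ref = np.asarray(ref, dtype=float)
    c, s = np.cos(ref[3]), np.sin(ref[3])
    rot = np.array([[c, -s], [s, c]])  # rotation by +yaw_ref (local -> world)
    world = np.empty(4)
    world[:2] = rot @ local[:2] + ref[:2]
    world[2] = local[2] + ref[2]
    world[3] = _wrap(local[3] + ref[3])
    return world


if __name__ == "__main__":
    # Object heading along +y (yaw = pi/2), moving 1 m per frame.
    hist = np.array([[0.0, 0.0, 0.0, np.pi / 2],
                     [0.0, 1.0, 0.0, np.pi / 2],
                     [0.0, 2.0, 0.0, np.pi / 2]])
    local = canonicalize_trajectory(hist)
    print(np.round(local, 6))  # past positions lie behind the box, along -x
    # A model predicting "1 m straight ahead" in the local frame:
    print(np.round(decanonicalize_state([1.0, 0.0, 0.0, 0.0], hist[-1]), 6))
```

In the local frame, "keep moving forward" has the same numeric form whatever the object's absolute heading. This is one plausible reason a lightweight trajectory model can generalize from box histories alone.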

Index terms

Visual Tracking, Deep Learning for Visual Perception, Computer Vision for Transportation