ICRA 2024

ShaSTA: Modeling Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking

Tara Sadjadpour, Jie Li, Rares Ambrus, Jeannette Bohg

Abstract

Multi-object tracking (MOT) is a cornerstone capability of any robotic system. Tracking quality is largely dependent on the quality of input detections. In many applications, such as autonomous driving, it is preferable to over-detect objects to avoid catastrophic outcomes due to missed detections. As a result, current state-of-the-art 3D detectors produce high rates of false positives to ensure a low number of false negatives. This can negatively affect tracking by making data association and track lifecycle management more challenging. Additionally, occasional false-negative detections due to difficult scenarios like occlusions can harm tracking performance. To address these issues in a unified framework, we propose ShaSTA, which learns shape and spatio-temporal affinities between tracks and detections in consecutive frames. The affinity is a probabilistic matching that leads to robust data association, track lifecycle management, false-positive elimination, false-negative propagation, and sequential track confidence refinement. We offer the first self-contained framework that addresses all aspects of the 3D MOT problem. We quantitatively evaluate ShaSTA on the nuScenes tracking benchmark with 5 metrics, including the most common tracking accuracy metric called AMOTA, to demonstrate how ShaSTA may impact the ultimate goal of an autonomous mobile agent. ShaSTA achieves 1st place amongst LiDAR-only trackers that use CenterPoint detections. The open-source code for reproducing and extending our work is publicly available.

Manuscript received: May 9, 2023; Revised: August 14, 2023; Accepted: September 12, 2023. This paper was recommended for publication by Editor Cesar Cadena Lerma upon evaluation of the Associate Editor and Reviewers' comments. Toyota Research Institute provided funds to support this work.

1 Tara Sadjadpour and Jeannette Bohg are with the School of Engineering, Computer Science Department, Stanford University, United States. {tsadja, bohg}@stanford.edu

2 Jie Li is with NVIDIA, Santa Clara, California, United States. jieli@nvidia.com

3 Rares Ambrus is with Toyota Research Institute, Los Altos, California, United States. rares.ambrus@tri.global
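To make the abstract's notion of an affinity between tracks and detections concrete, the sketch below shows one generic way a matrix of match probabilities could be turned into association decisions. It is not ShaSTA's learned method: the function name `associate`, the greedy strategy, and the `min_affinity` threshold are all assumptions made for illustration.

```python
def associate(affinity, min_affinity=0.5):
    """Greedily match tracks (rows) to detections (columns).

    `affinity[t][d]` is assumed to be a match probability in [0, 1].
    Hypothetical helper for illustration; not the ShaSTA implementation.
    """
    num_tracks = len(affinity)
    # With no tracks, the column count cannot be inferred from the matrix.
    num_dets = len(affinity[0]) if num_tracks else 0

    # Visit candidate pairs from the highest affinity to the lowest.
    pairs = sorted(
        ((affinity[t][d], t, d)
         for t in range(num_tracks)
         for d in range(num_dets)),
        reverse=True,
    )

    matches = {}       # track index -> detection index
    used_dets = set()
    for score, t, d in pairs:
        if score < min_affinity:
            break      # all remaining pairs are too weak to match
        if t in matches or d in used_dets:
            continue   # each track and detection is matched at most once
        matches[t] = d
        used_dets.add(d)

    # Unmatched tracks are candidates for propagation or termination;
    # unmatched detections are candidates for new tracks or false positives.
    unmatched_tracks = [t for t in range(num_tracks) if t not in matches]
    unmatched_dets = [d for d in range(num_dets) if d not in used_dets]
    return matches, unmatched_tracks, unmatched_dets
```

For example, with two tracks and three detections, `associate([[0.9, 0.2, 0.1], [0.3, 0.8, 0.4]])` pairs track 0 with detection 0 and track 1 with detection 1, leaving detection 2 unmatched. A tracker would then have to decide whether that leftover detection starts a new track or is a false positive.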

Index terms

Computer Vision for Transportation, Deep Learning for Visual Perception, Visual Tracking