MASAR: Motion-Appearance Synergy Refinement for Joint Detection and Trajectory Forecasting
Mohammed Amine Bencheikh Lehocine, Julian Schmidt, Frank Moosmann, Dikshant Gupta, Fabian Flohr
AI summary
Problem
Existing end-to-end autonomous driving models fail to fully exploit long-term motion cues and rely on noisy tracking or map information, limiting detection and forecasting accuracy.
Approach
MASAR introduces a tracking-free, map-free framework that jointly predicts multiple past trajectory hypotheses per object and refines them using appearance-guided scoring, then conditions future trajectory forecasting on these refined past trajectories.
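To make the idea of appearance-guided scoring concrete, the following is a minimal sketch and not the authors' implementation. The summary does not specify the scoring function or how hypotheses are refined, so the dot-product score, the softmax weighting, and the weighted blending used here are all assumptions, as are the names refine_past, hypothesis_embeddings, and appearance_feature.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_past(hypotheses, hypothesis_embeddings, appearance_feature):
    """Blend K past-trajectory hypotheses for one object (hypothetical helper).

    hypotheses: K trajectories, each a list of (x, y) waypoints.
    hypothesis_embeddings: K feature vectors, one per hypothesis (assumed).
    appearance_feature: one feature vector for the object (assumed).
    """
    # Assumption: a hypothesis is scored by the dot product between its
    # embedding and the object's appearance feature.
    scores = [
        sum(a * h for a, h in zip(appearance_feature, emb))
        for emb in hypothesis_embeddings
    ]
    weights = softmax(scores)

    # Assumption: the refined past trajectory is the score-weighted mean
    # of the hypotheses at every timestep.
    refined = []
    for t in range(len(hypotheses[0])):
        x = sum(w * hyp[t][0] for w, hyp in zip(weights, hypotheses))
        y = sum(w * hyp[t][1] for w, hyp in zip(weights, hypotheses))
        refined.append((x, y))
    return refined, weights


# Two hypotheses with identical embeddings receive equal weight.
hyps = [[(0.0, 0.0), (0.0, 0.0)], [(2.0, 0.0), (4.0, 0.0)]]
embs = [[1.0, 0.0], [1.0, 0.0]]
refined, weights = refine_past(hyps, embs, [0.5, 0.5])
```

In this sketch the refined trajectory would then serve as the motion input to a future-trajectory decoder; a learned model would replace the fixed dot product with trainable layers.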
Key results
- New state-of-the-art on nuScenes without map data
- Over 20% reduction in minADE and minFDE
- Consistent gains across BEVFormer and SparseBEV backbones
- Up to 6% minFDE improvement and 7% miss rate reduction via past conditioning
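For readers unfamiliar with the metrics above, the sketch below shows how minADE, minFDE, and a miss flag are commonly computed for one object with K predicted modes. It is illustrative only: the function names are hypothetical, and the miss criterion (best final displacement above 2.0 m) is an assumption, since benchmarks differ in the exact definition. The miss rate is the fraction of objects flagged as missed.

```python
import math


def displacement_errors(pred, gt):
    """Per-timestep Euclidean distance between one predicted mode and ground truth."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]


def forecasting_metrics(modes, gt, miss_threshold=2.0):
    """Compute (minADE, minFDE, missed) for one object.

    modes: K predicted trajectories, each a list of (x, y) waypoints.
    gt: ground-truth trajectory with the same number of waypoints.
    miss_threshold: assumed miss criterion in metres.
    """
    ades, fdes = [], []
    for mode in modes:
        errs = displacement_errors(mode, gt)
        ades.append(sum(errs) / len(errs))  # average displacement error
        fdes.append(errs[-1])               # final displacement error
    min_ade = min(ades)
    min_fde = min(fdes)
    # Assumption: a miss means even the best mode ends too far from the target.
    missed = min_fde > miss_threshold
    return min_ade, min_fde, missed


gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
modes = [
    [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)],  # constant 1 m lateral offset
    [(1.0, 3.0), (2.0, 0.0), (3.0, 0.0)],  # bad start, exact endpoint
]
min_ade, min_fde, missed = forecasting_metrics(modes, gt)
```

Note that the minimum is taken independently for each metric, so minADE and minFDE may come from different modes, as in the second mode of this example.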
Why it matters
Enables more robust and accurate perception-prediction pipelines for camera-based autonomous driving by eliminating reliance on error-prone tracking and high-definition maps.
Abstract
Classical autonomous driving systems connect perception and prediction modules via hand-crafted bounding-box interfaces, limiting information flow and propagating errors to downstream tasks. Recent research aims to develop end-to-end models that jointly address perception and prediction; however, they often fail to fully exploit the synergy between appearance and motion cues, relying mainly on short-term visual features. We follow the idea of “looking backward to look forward”, and propose MASAR, a novel fully differentiable framework for joint 3D detection and trajectory forecasting compatible with any transformer-based 3D detector. MASAR employs an object-centric spatio-temporal mechanism that jointly encodes appearance and motion features. By predicting past trajectories and refining them using guidance from appearance cues, MASAR captures long-term temporal dependencies that enhance future trajectory forecasting. Experiments conducted on the nuScenes dataset demonstrate MASAR’s effectiveness, showing improvements of over 20% in minADE and minFDE while maintaining robust detection performance. Code and models are available at https://github.com/aminmed/MASAR.