ICRA 2026

SEPT: Standard-Definition Map Enhanced Scene Perception and Topology Reasoning for Autonomous Driving

Muleilan Pei, Jiayao Shan, Peiliang Li, Jieqi Shi, Jing Huo, Yang Gao, Shaojie Shen

AI summary

Integrating standard-definition maps through a novel hybrid fusion strategy significantly boosts autonomous driving scene perception and topology reasoning, particularly in occluded and long-range scenarios.

Standard-Definition Maps, Autonomous Driving, Scene Perception, Topology Reasoning, BEV Fusion, Mapless Driving

Problem

Mapless driving systems struggle with long-range visibility and occlusions due to onboard sensor limits, while existing methods fail to effectively align and fuse standard-definition map priors with visual features.

Approach

SEPT combines rasterized and vectorized standard-definition map features with Bird’s-Eye-View representations using a dual-gated fusion module and adaptive alignment, plus an auxiliary intersection detection task.
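To make the idea of gated fusion concrete, here is a minimal sketch for a single BEV cell. It is an illustration under stated assumptions, not the paper's module: the summary does not specify how the two gates are built, so this sketch assumes one sigmoid gate per SD map branch (rasterized, then vectorized), each added residually to the BEV feature. The function names, the parameter layout, and the order of the two branches are all hypothetical.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function expressed through tanh.
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def gated_fuse_cell(bev, sd, weights, bias):
    """Fuse one SD map feature vector into one BEV feature vector.

    bev, sd : lists of C floats for the same BEV cell.
    weights : C rows of 2C floats (hypothetical gate parameters).
    bias    : list of C floats.
    """
    joint = bev + sd  # list concatenation, length 2C
    fused = []
    for c in range(len(bev)):
        logit = bias[c] + sum(w * x for w, x in zip(weights[c], joint))
        gate = sigmoid(logit)  # in [0, 1]; 0 ignores the SD prior
        fused.append(bev[c] + gate * sd[c])  # residual update of the BEV feature
    return fused


def dual_gated_fuse_cell(bev, sd_raster, sd_vector, raster_gate, vector_gate):
    # Assumption: "dual" is read here as one gate per SD map branch.
    out = gated_fuse_cell(bev, sd_raster, *raster_gate)
    return gated_fuse_cell(out, sd_vector, *vector_gate)
```

With the gate closed the BEV feature passes through unchanged, which is the property that lets a model fall back on sensor features where the SD map is misaligned. A real implementation would apply the same computation to every cell at once with a tensor library.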

Key results

Hybrid rasterized and vectorized SD map encoding with adaptive feature alignment
Dual-gated feature fusion module for seamless BEV feature augmentation
Auxiliary intersection-aware keypoint detection task for enhanced topology reasoning
State-of-the-art performance on OpenLane-V2 with significant gains in perception and topology metrics
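As a rough illustration of what a rasterized SD map input and intersection keypoints might look like, the sketch below draws road polylines onto a BEV grid and flags endpoints shared by several roads. This is not taken from the paper: the ego-centred coordinate convention, the grid layout, the degree-based definition of an intersection, and both function names are assumptions made for the example.

```python
import math
from collections import Counter


def rasterize_polyline(points, grid_size, cell_m):
    """Mark the BEV cells crossed by one SD map polyline.

    points    : list of (x, y) tuples in metres, ego at the grid centre (assumed).
    grid_size : (rows, cols) of the BEV grid.
    cell_m    : edge length of one cell in metres.
    """
    rows, cols = grid_size
    grid = [[0] * cols for _ in range(rows)]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Sample at half-cell spacing so no crossed cell is skipped along an axis.
        steps = max(1, math.ceil(length / (0.5 * cell_m)))
        for i in range(steps + 1):
            t = i / steps
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            r = math.floor(x / cell_m + rows / 2)
            c = math.floor(y / cell_m + cols / 2)
            if 0 <= r < rows and 0 <= c < cols:
                grid[r][c] = 1
    return grid


def intersection_candidates(polylines, min_degree=3):
    """Return endpoints touched by at least min_degree polylines.

    Assumption: an intersection is a node where three or more road
    segments meet, and shared endpoints have identical coordinates.
    """
    degree = Counter()
    for line in polylines:
        for end in {line[0], line[-1]}:
            degree[end] += 1
    return sorted(p for p, d in degree.items() if d >= min_degree)
```

For example, three roads that all end at `(0.0, 0.0)` yield that point as the only candidate. Such points could then serve as targets for a keypoint heatmap, though how SEPT actually defines and supervises its keypoints is not stated in this summary.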

Why it matters

Enables scalable, low-cost mapless driving by leveraging widely available navigation maps to overcome sensor limitations in complex urban environments.

Abstract

Online scene perception and topology reasoning are critical for autonomous vehicles to understand their driving environments, particularly for mapless driving systems that endeavor to reduce reliance on costly High-Definition (HD) maps. However, recent advances in online scene understanding still face limitations, especially in long-range or occluded scenarios, due to the inherent constraints of onboard sensors. To address this challenge, we propose a Standard-Definition (SD) map Enhanced scene Perception and Topology reasoning (SEPT) framework, which explores how to effectively incorporate the SD map as prior knowledge into existing perception and reasoning pipelines. Specifically, we introduce a novel hybrid feature fusion strategy that combines SD maps with Bird’s-Eye-View (BEV) features, considering both rasterized and vectorized representations, while mitigating potential misalignment between SD maps and BEV feature spaces. Additionally, we leverage the SD map characteristics to design an auxiliary intersection-aware keypoint detection task, which further enhances the overall scene understanding performance. Experimental results on the large-scale OpenLane-V2 dataset demonstrate that by effectively integrating SD map priors, our framework significantly improves both scene perception and topology reasoning, outperforming existing methods by a substantial margin.

Index terms

Computer Vision for Transportation; Deep Learning for Visual Perception; Intelligent Transportation Systems