ICRA 2024

DualAT: Dual Attention Transformer for End-To-End Autonomous Driving

Zesong Chen, Ze Yu, Jun Li, Linlin You, Xiaojun Tan


Abstract

Effective reasoning over integrated multimodal perception information is crucial for improving end-to-end autonomous driving performance. In this paper, we introduce a novel multitask imitation learning framework for end-to-end autonomous driving that leverages a dual attention transformer (DualAT) to enhance multimodal fusion and waypoint prediction. A self-attention mechanism captures global context information and models the long-term temporal dependencies among waypoints across multiple time steps. A cross-attention mechanism, in turn, implicitly associates the latent feature representations derived from different modalities through a learnable, geometrically linked positional embedding. Specifically, DualAT processes and fuses information from multiple camera views and LiDAR sensors, enabling comprehensive scene understanding for multitask learning. Furthermore, DualAT introduces a novel waypoint prediction architecture that combines the temporal relationships between waypoints with the spatial features extracted from sensor inputs. We evaluate our approach on both the Town05 and Longest6 benchmarks in the closed-loop CARLA urban driving simulator and provide extensive ablation studies. The experimental results demonstrate that our approach significantly outperforms state-of-the-art methods.
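The two attention roles described above can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention with no learned W_Q/W_K/W_V projections, no layer normalization, camera tokens acting as queries over LiDAR tokens, and random matrices standing in for learnable parameters. The names `fuse` and `predict_waypoints`, the token counts, and the linear (x, y) head are all hypothetical, and the "geometrically linked" structure of the positional embedding is not reproduced.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Row-by-column product; zip(*b) yields the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Single-head scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Stand-in for learnable parameters or encoder outputs.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def fuse(camera: Matrix, lidar: Matrix, cam_pos: Matrix, lidar_pos: Matrix) -> Matrix:
    # Cross-attention (hypothetical direction): camera tokens query LiDAR tokens.
    # Positional embeddings are added to queries and keys only.
    fused = add(camera, attention(add(camera, cam_pos), add(lidar, lidar_pos), lidar))
    # Self-attention over the fused tokens for global context (residual added).
    return add(fused, attention(fused, fused, fused))


def predict_waypoints(fused: Matrix, queries: Matrix, head: Matrix) -> Matrix:
    # Self-attention among one query per future time step (temporal dependencies).
    h = add(queries, attention(queries, queries, queries))
    # Cross-attention from waypoint queries to the fused spatial features.
    h = add(h, attention(h, fused, fused))
    # Hypothetical linear head mapping each query to an (x, y) waypoint.
    return matmul(h, head)


rng = random.Random(0)
d = 8
camera = rand_matrix(6, d, rng)     # 6 image tokens (hypothetical size)
lidar = rand_matrix(4, d, rng)      # 4 LiDAR BEV tokens (hypothetical size)
cam_pos = rand_matrix(6, d, rng)    # stand-in for learned positional embedding
lidar_pos = rand_matrix(4, d, rng)
queries = rand_matrix(4, d, rng)    # 4 future time steps
head = rand_matrix(d, 2, rng)

fused = fuse(camera, lidar, cam_pos, lidar_pos)
waypoints = predict_waypoints(fused, queries, head)
print(len(waypoints), len(waypoints[0]))  # 4 2
```

In a trained model the random stand-ins would be learned jointly with the rest of the network; the sketch only shows how the data flows through self-attention and cross-attention and what shapes result.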

Index terms

Autonomous Vehicle Navigation, Sensor Fusion, Intelligent Transportation Systems