OTVIC: A Dataset with Online Transmission for Vehicle-To-Infrastructure Cooperative 3D Object Detection
He Zhu, Yunkai Wang, Quyu Kong, Yufei Wei, Xunlong Xia, Bing Deng, Rong Xiong, Yue Wang
Abstract
Vehicle-to-infrastructure cooperative 3D object detection (VIC3D) is a task that leverages both vehicle and roadside sensors to jointly perceive the surrounding environment. However, given the high speed of vehicles, the real-time requirements, and the limited communication bandwidth, roadside devices in our real-world scenarios transmit perception results rather than raw sensor data or feature maps. Moreover, the transmission delay is dynamic because it is affected by various environmental factors. To meet the needs of practical applications, we present OTVIC, the first multi-modality, multi-view dataset with online transmission from real scenes for vehicle-to-infrastructure cooperative 3D object detection. The data were collected on a section of highway in Chengdu, China, where the ego vehicle receives the infrastructure perception results in real time. In addition, we propose LfFormer, a novel end-to-end multi-modality late fusion framework based on a transformer, as a baseline for the VIC3D task on OTVIC. Experiments demonstrate the effectiveness and robustness of our fusion framework. Our project is available at https://sites.google.com/view/otvic.