ICRA 2026

GaussianCaR: Gaussian Splatting for Efficient Camera-Radar Fusion

Santiago Montiel-Marín, Miguel Antunes-García, Fabio Sánchez-García, Angel Llamazares, Holger Caesar, Luis M. Bergasa

AI summary

GaussianCaR achieves state-of-the-art or competitive BEV segmentation accuracy while running 3.2× faster than existing methods by using Gaussian Splatting to efficiently fuse camera and radar data.

Gaussian Splatting, Camera-Radar Fusion, BEV Segmentation, Autonomous Driving, Sensor Fusion, 3D Perception

Problem

Existing camera-radar fusion methods for Bird's-Eye View perception struggle with view disparity, sparse radar representations, and high computational costs, hindering robust and cost-effective deployment in autonomous vehicles.

Approach

The method repurposes 3D Gaussian Splatting as a universal view transformer, mapping camera pixels and radar points into a unified BEV latent space through differentiable rasterization and multi-scale feature fusion.
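To make the view-transformation idea concrete, the following is a minimal sketch of splatting Gaussians onto a BEV grid. It is not the GaussianCaR implementation: it assumes isotropic 2D Gaussians, scalar features, and a dense loop over cells, whereas a real splatting rasterizer works with full 3D covariances, feature vectors, and tile-based GPU kernels. The function name `splat_to_bev`, the tuple layout, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def splat_to_bev(gaussians, grid_size=8, extent=4.0):
    """Accumulate isotropic 2D Gaussians onto a square BEV grid.

    gaussians: list of (x, y, sigma, opacity, feature) tuples, in metres.
    This tuple layout is an assumption made for this sketch.
    Returns (feat, weight): the weighted feature sum and the total weight
    per cell, so feat / weight gives a normalised feature map.
    """
    cell = 2.0 * extent / grid_size
    feat = [[0.0] * grid_size for _ in range(grid_size)]
    weight = [[0.0] * grid_size for _ in range(grid_size)]
    for x, y, sigma, opacity, f in gaussians:
        for i in range(grid_size):
            cy = -extent + (i + 0.5) * cell  # cell centre, y axis
            for j in range(grid_size):
                cx = -extent + (j + 0.5) * cell  # cell centre, x axis
                d2 = (cx - x) ** 2 + (cy - y) ** 2
                w = opacity * math.exp(-0.5 * d2 / sigma ** 2)
                feat[i][j] += w * f
                weight[i][j] += w
    return feat, weight


# Hypothetical inputs: one wide camera-derived Gaussian (uncertain depth)
# and one tight radar-derived Gaussian (precise range).
camera = [(1.0, 0.5, 0.8, 0.9, 1.0)]
radar = [(1.2, 0.4, 0.3, 1.0, 2.0)]
feat, weight = splat_to_bev(camera + radar)
```

Because both sensors are expressed as Gaussians in the same ground-plane frame, one accumulation pass places them in a shared grid. Every operation here is smooth in the Gaussian parameters, which is the property that lets a rasterizer of this kind be trained end to end.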

Key results

57.3% IoU for vehicle segmentation on nuScenes, matching or surpassing SOTA camera-radar baselines
82.9% and 50.1% IoU for drivable area and lane divider segmentation
3.2× faster inference than prior fusion methods
Novel Pixels-to-Gaussians and Points-to-Gaussians encoders that preserve measurement uncertainty and enable dense feature propagation
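The segmentation figures above are reported as Intersection over Union (IoU). As a reminder of what that metric measures, here is a minimal sketch for a single binary BEV mask. The helper name `binary_iou` and the choice to return 1.0 when both masks are empty are assumptions of this sketch, not details taken from the paper or from the nuScenes evaluation code.

```python
def binary_iou(pred, target):
    """IoU between two binary BEV masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


# Toy 2x2 masks: one cell overlaps, three cells are covered in total.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [1, 0]]
iou = binary_iou(pred, target)
```

In this toy case the intersection is one cell and the union is three, so the IoU is 1/3.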

Why it matters

It enables robust, cost-effective, and real-time perception for autonomous vehicles by efficiently bridging camera and radar data without sacrificing accuracy.

Abstract

Robust and accurate perception of dynamic objects and map elements is crucial for autonomous vehicles performing safe navigation in complex traffic scenarios. While vision-only methods have become the de facto standard due to their technical advances, they can benefit from effective and cost-efficient fusion with radar measurements. In this work, we advance fusion methods by repurposing Gaussian Splatting as an efficient universal view transformer that bridges the view disparity gap, mapping both image pixels and radar points into a common Bird’s-Eye View (BEV) representation. Our main contribution is GaussianCaR, an end-to-end network for BEV segmentation that, unlike prior BEV fusion methods, leverages Gaussian Splatting to map raw sensor information into latent features for efficient camera-radar fusion. Our architecture combines multi-scale fusion with a transformer decoder to efficiently extract BEV features. Experimental results demonstrate that our approach achieves performance on par with, or even surpassing, the state-of-the-art on BEV segmentation tasks (57.3%, 82.9%, 50.1% IoU for vehicles, roads, and lane dividers) on the nuScenes dataset, while maintaining a 3.2× faster inference runtime. Code and project page are available online.

Index terms

Deep Learning for Visual Perception, Sensor Fusion, Intelligent Transportation Systems