ICRA 2026

Distortion-Aware PETR for BEV Object Detection with Mixed Pinhole–Fisheye Cameras

Xiangzhong Liu

AI summary

DAPETR outperforms explicit geometric methods for fisheye BEV detection by using learned distortion-aware embeddings and bidirectional feature-geometry co-modulation, while revealing that learned adaptation and explicit geometric re-parameterization can negatively interact.

BEV detection, fisheye cameras, distortion-aware, projection-free, 3D object detection, feature modulation

Problem

Severe radial distortion in fisheye cameras violates the uniform-sampling assumption of most BEV detectors, so fisheye imagery remains under-used for 3D object detection. Existing projection-free methods struggle with geometric breakdown and lack effective adaptation for mixed pinhole–fisheye sensor configurations.
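
To make the sampling issue concrete, the sketch below compares how many pixels one degree of incidence angle spans under a pinhole model and under an ideal equidistant fisheye model. The equidistant model, the focal length value, and all function names are illustrative assumptions; the summary does not state which fisheye model the paper uses.

```python
import math

def pinhole_radius(theta, f):
    """Image radius (pixels) of a ray at angle theta (rad) from the optical axis: r = f * tan(theta)."""
    return f * math.tan(theta)

def equidistant_radius(theta, f):
    """Image radius under the ideal equidistant fisheye model (an assumption here): r = f * theta."""
    return f * theta

def pixels_per_degree(radius_fn, theta_deg, f, step_deg=0.01):
    """Forward-difference estimate of the pixel span of one degree of incidence angle at theta_deg."""
    t0 = math.radians(theta_deg)
    t1 = math.radians(theta_deg + step_deg)
    return (radius_fn(t1, f) - radius_fn(t0, f)) / step_deg

if __name__ == "__main__":
    f = 300.0  # hypothetical focal length in pixels
    for theta_deg in (0, 30, 60, 80):
        p = pixels_per_degree(pinhole_radius, theta_deg, f)
        e = pixels_per_degree(equidistant_radius, theta_deg, f)
        print(f"{theta_deg:2d} deg  pinhole {p:8.2f} px/deg  equidistant {e:6.2f} px/deg")
```

Under these assumptions the equidistant model spends the same number of pixels on every degree, while the pinhole model spends progressively more toward the image edge, so a detector tuned to one sampling pattern sees a different one on the other camera type.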

Approach

DAPETR adapts the projection-free PETR detector by integrating unified distortion-aware positional embeddings and a bidirectional feature-geometry co-modulation module that mutually refines image features and 3D spatial queries using learned distortion maps.
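
The summary does not give the equations of the co-modulation module. Purely as an illustration of what "bidirectional" modulation can mean, the sketch below gates an image feature with a sigmoid projection of a positional embedding and gates the positional embedding with a sigmoid projection of the feature. The gating form, the weight matrices, and the names `co_modulate`, `w_pos_to_feat`, and `w_feat_to_pos` are hypothetical and are not taken from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(weights, vec):
    """Multiply a row-major weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def co_modulate(feat, pos, w_pos_to_feat, w_feat_to_pos):
    """Gate each vector with a sigmoid projection of the other.

    Hypothetical illustration only. Both gates are computed from the
    unmodified inputs, so neither direction depends on the other's output.
    """
    gate_for_feat = [sigmoid(g) for g in linear(w_pos_to_feat, pos)]
    gate_for_pos = [sigmoid(g) for g in linear(w_feat_to_pos, feat)]
    feat_out = [f * g for f, g in zip(feat, gate_for_feat)]
    pos_out = [p * g for p, g in zip(pos, gate_for_pos)]
    return feat_out, pos_out

if __name__ == "__main__":
    feat = [1.0, -2.0]                 # toy image feature at one pixel
    pos = [0.5, 0.25]                  # toy 3D positional embedding at the same pixel
    zeros = [[0.0, 0.0], [0.0, 0.0]]   # stand-in for learned weights; zero weights give gates of exactly 0.5
    print(co_modulate(feat, pos, zeros, zeros))
```

In a trained model the two weight matrices would be learned, which is what would let each stream adapt to the other; the zero weights here only keep the toy example easy to check by hand.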

Key results

Unified distortion-aware positional embeddings harmonize 2D and 3D features with fisheye geometry
Bidirectional feature-geometry co-modulation enhances appearance-geometry alignment
Achieves superior mAP and NDS over baseline PETR and PolarPETR on a converted KITTI-360 benchmark
Reveals a negative interaction between learned adaptation and explicit geometric re-parameterization

Why it matters

It provides a computationally efficient, projection-free alternative to image rectification for fisheye 3D perception, offering critical design insights for robust autonomous driving perception systems.

Abstract

Fisheye cameras are widely deployed in autonomous driving perception suites for their low cost and full-coverage field of view (FOV), yet their potential remains under-leveraged in 3D object detection. Severe radial distortion challenges most BEV detectors by violating the fundamental assumption of uniform sampling. To bridge this gap, we propose Distortion-Aware PETR (DAPETR), a projection-free detector tailored for mixed pinhole–fisheye camera setups. DAPETR incorporates two key learned-adaptive modules: a unified distortion-aware positional embedding that harmonizes positional encodings for image representations with fisheye geometry, and a bidirectional feature-geometry co-modulation module that mutually adapts image features and 3D positional embeddings. In our experiments on a converted KITTI-360 benchmark, we systematically compare our learned-adaptive approach against PETR in polar coordinates (PolarPETR). We find that while both methods improve over the baseline, our learned modules achieve superior performance. Crucially, we uncover a negative interaction when combining both strategies, revealing that learned adaptation and explicit geometric re-parameterization can conflict. Our final DAPETR model significantly advances the research and benchmark for fisheye BEV detection, providing critical insights into effective distortion-aware 3D perception design other than image rectification.

Index terms

Visual Learning; Omnidirectional Vision; Computer Vision for Transportation