Distortion-Aware PETR for BEV Object Detection with Mixed Pinhole–Fisheye Cameras
Xiangzhong Liu
AI summary
Problem
Severe radial distortion in fisheye cameras violates the uniform sampling assumption of most BEV detectors, leaving their 3D detection potential under-exploited. Existing projection-free methods struggle with geometric breakdown and lack effective adaptation for mixed pinhole–fisheye sensor configurations.
Approach
DAPETR adapts the projection-free PETR detector by integrating unified distortion-aware positional embeddings and a bidirectional feature-geometry co-modulation module that mutually refines image features and 3D spatial queries using learned distortion maps.
Key results
- Unified distortion-aware positional embeddings harmonize 2D and 3D features with fisheye geometry
- Bidirectional feature-geometry co-modulation enhances appearance-geometry alignment
- Achieves higher mAP and NDS than baseline PETR and PolarPETR on a converted KITTI-360 benchmark
- Reveals a negative interaction between learned adaptation and explicit geometric re-parameterization
Why it matters
It provides a computationally efficient, projection-free alternative to image rectification for fisheye 3D perception, offering critical design insights for robust autonomous driving perception systems.
Abstract
Fisheye cameras are widely deployed in autonomous driving perception suites for their low cost and full-coverage field of view (FOV), yet their potential remains under-leveraged in 3D object detection. Severe radial distortion challenges most BEV detectors by violating the fundamental assumption of uniform sampling. To bridge this gap, we propose Distortion-Aware PETR (DAPETR), a projection-free detector tailored for mixed pinhole–fisheye camera setups. DAPETR incorporates two key learned-adaptive modules: a unified distortion-aware positional embedding that harmonizes positional encodings for image representations with fisheye geometry, and a bidirectional feature-geometry co-modulation module that mutually adapts image features and 3D positional embeddings. In our experiments on a converted KITTI-360 benchmark, we systematically compare our learned-adaptive approach against PETR in polar coordinates (PolarPETR). We find that while both methods improve over the baseline, our learned modules achieve superior performance. Crucially, we uncover a negative interaction when combining both strategies, revealing that learned adaptation and explicit geometric re-parameterization can conflict. Our final DAPETR model significantly advances research and benchmarking for fisheye BEV detection, providing critical insights into distortion-aware 3D perception design beyond image rectification.
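To make the uniform-sampling issue concrete, the sketch below compares how many radial pixels a camera spends per degree of incidence angle under two textbook projection models: the pinhole model, r = f tan(θ), and the equidistant fisheye model, r = f θ. This is an illustration only. The abstract does not state which fisheye model is used, so the equidistant model, the focal length `f = 500.0`, and all function names are assumptions. The code is not DAPETR's learned distortion map or any part of its implementation.

```python
import math


def pinhole_radius(theta, f):
    """Image radius (pixels) of a ray at incidence angle theta (radians), pinhole model."""
    return f * math.tan(theta)


def equidistant_radius(theta, f):
    """Image radius (pixels) under the equidistant fisheye model, r = f * theta (assumed)."""
    return f * theta


def pixels_per_degree(radius_fn, theta_deg, f, step_deg=0.1):
    """Central finite-difference estimate of radial pixels per degree of incidence angle."""
    lo = math.radians(theta_deg - step_deg / 2)
    hi = math.radians(theta_deg + step_deg / 2)
    return (radius_fn(hi, f) - radius_fn(lo, f)) / step_deg


if __name__ == "__main__":
    f = 500.0  # hypothetical focal length in pixels
    for theta_deg in (5, 30, 60, 80):
        p = pixels_per_degree(pinhole_radius, theta_deg, f)
        e = pixels_per_degree(equidistant_radius, theta_deg, f)
        print(f"{theta_deg:>2} deg  pinhole {p:8.2f} px/deg  fisheye {e:6.2f} px/deg")
```

Under these assumptions the pinhole density grows with the incidence angle while the equidistant density stays constant. The same image-plane step therefore corresponds to different angular steps in the two camera types, which is one way to see why a positional encoding tuned for pinhole images may not transfer directly to fisheye images.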