Learning Adaptive Pseudo-Label Selection for Semi-Supervised 3D Object Detection
Taehun Kong, Tae-Kyun Kim
AI summary
Problem
Existing semi-supervised 3D object detection methods rely on fixed or manually tuned thresholds that ignore contextual factors like object distance, class, and training state, resulting in suboptimal pseudo-label quality and coverage.
Approach
The authors introduce a learnable Pseudo-label Selection Module that fuses multiple teacher network scores into a unified quality metric and dynamically predicts context-aware thresholds, paired with a Soft Supervision strategy to mitigate pseudo-label noise.
Key results
- Introduces a novel learning-based Pseudo-label Selection Module (PSM) for adaptive label filtering
- Proposes a noise-robust Soft Supervision strategy using joint confidence scoring and loss re-weighting
- Achieves ~20 mAP absolute improvement over labeled-only baselines on the KITTI 1% split
- Demonstrates wider pseudo-label coverage and higher recall while maintaining high precision on KITTI and Waymo datasets
Why it matters
Drastically reduces the need for costly 3D annotations, enabling more scalable and accurate scene understanding for autonomous driving and robotics.
Abstract
Semi-supervised 3D object detection (SS3DOD) aims to reduce the need for costly 3D annotations by utilizing unlabeled data. Recent studies adopt pseudo-label-based teacher-student frameworks and demonstrate impressive performance. The main challenge of these frameworks lies in selecting high-quality pseudo-labels from the teacher’s predictions. Most previous methods, however, select pseudo-labels by comparing confidence scores against manually set thresholds. The latest works tackle the challenge either by dynamic thresholding or by refining the quality of pseudo-labels. Such methods still overlook contextual information, e.g., object distances, classes, and learning states, and inadequately assess pseudo-label quality because they use only partial information available from the networks. In this work, we propose a novel SS3DOD framework featuring a learnable pseudo-labeling module designed to automatically and adaptively select high-quality pseudo-labels. Our approach introduces two networks at the teacher output level. These networks reliably assess the quality of pseudo-labels through score fusion and determine context-adaptive thresholds; both are supervised by the alignment of pseudo-labels with ground-truth (GT) bounding boxes. Additionally, we introduce a soft supervision strategy that learns robustly under pseudo-label noise. This helps the student network prioritize cleaner labels over noisy ones in semi-supervised learning. Extensive experiments on the KITTI and Waymo datasets demonstrate the effectiveness of our method. The proposed method selects high-precision pseudo-labels while maintaining a wider coverage of contexts and a higher recall rate, significantly improving over relevant SS3DOD methods.
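The selection pipeline described in the abstract (score fusion, a context-adaptive threshold, and soft supervision) can be illustrated with a small sketch. This is not the authors' implementation: in the paper both the fusion and the thresholds are predicted by learned networks supervised by GT alignment, whereas here they are replaced by hand-set linear functions passed through a sigmoid. Every function name, coefficient, and input field below is hypothetical, including the assumption that distant objects receive a lower threshold.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scores(scores, weights, bias=0.0):
    """Fuse several teacher scores into one quality value in (0, 1).

    Stand-in for a learned fusion network; the weights are assumed.
    """
    return sigmoid(sum(w * s for w, s in zip(weights, scores)) + bias)


def adaptive_threshold(distance_m, class_bias, progress,
                       dist_coef=-0.02, prog_coef=1.0):
    """Hypothetical context-aware threshold in (0, 1).

    With these assumed coefficients the threshold drops for distant
    objects and rises as training progresses (progress in [0, 1]).
    """
    return sigmoid(class_bias + dist_coef * distance_m + prog_coef * progress)


def select_pseudo_labels(candidates, weights, class_biases, progress):
    """Keep candidates whose fused quality reaches their own threshold."""
    selected = []
    for cand in candidates:
        quality = fuse_scores(cand["scores"], weights)
        threshold = adaptive_threshold(
            cand["distance_m"], class_biases[cand["cls"]], progress
        )
        if quality >= threshold:
            selected.append((cand, quality))
    return selected


def soft_weighted_loss(per_box_losses, qualities):
    """Quality-weighted mean loss, so cleaner labels contribute more."""
    total = sum(qualities)
    if total == 0:
        return 0.0
    return sum(l * q for l, q in zip(per_box_losses, qualities)) / total


# Toy teacher predictions: two scores per box (e.g. class and IoU scores).
candidates = [
    {"cls": "car", "distance_m": 10.0, "scores": (0.9, 0.8)},
    {"cls": "car", "distance_m": 10.0, "scores": (0.2, 0.1)},
    {"cls": "car", "distance_m": 60.0, "scores": (0.3, 0.3)},
]
selected = select_pseudo_labels(
    candidates, weights=(2.0, 2.0), class_biases={"car": 0.5}, progress=0.5
)
kept_distances = [cand["distance_m"] for cand, _ in selected]
```

In this toy setup the low-scoring nearby box is rejected, while the distant box is kept despite modest scores because its threshold is lower. That mirrors the idea of wider context coverage, but only under the assumed coefficients.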