Re-MAE: Rethinking Masked Autoencoders towards Geometry-Aware Self-Supervised LiDAR-Based 3D Object Detection
Youngho Cheon, Jae-Keun Lee, Soon Kwon, Jin-Hee Lee, Yongseob Lim
AI summary
Problem
Existing LiDAR masked autoencoders ignore critical geometric properties like distance-dependent sparsity, realistic occlusions, and voxel imbalance, limiting their ability to learn robust representations for occluded or distant objects.
Approach
Re-MAE replaces standard masking with geometry-aware occlusion simulation, reconstructs occupancy at multiple scales under a Reconstruction-Contextual BCE loss, and applies label-free object augmentation to focus learning on foreground structures.
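To make the idea of occlusion-style masking concrete, here is a minimal sketch. It is not the authors' Geometry-Aware Masking, whose details are not given on this page. It assumes a sensor at the origin and masks every point that falls inside an angular sector and beyond a given range, as if a hypothetical occluder stood in front of it. The function name `occlusion_mask` and the parameters `occluder_azimuth`, `half_width`, and `min_range` are illustrative assumptions.

```python
import math


def occlusion_mask(points, occluder_azimuth, half_width, min_range):
    """Split points into (visible, masked) using a simple shadow-sector rule.

    Hypothetical illustration only: a point is masked when its azimuth, seen
    from a sensor at the origin, lies within `half_width` radians of
    `occluder_azimuth` and its horizontal range exceeds `min_range`.
    """
    visible, masked = [], []
    for x, y, z in points:
        azimuth = math.atan2(y, x)
        horizontal_range = math.hypot(x, y)
        # Wrap the angular difference into [-pi, pi) so sectors that straddle
        # the +/-pi boundary are handled correctly.
        diff = (azimuth - occluder_azimuth + math.pi) % (2 * math.pi) - math.pi
        if abs(diff) <= half_width and horizontal_range > min_range:
            masked.append((x, y, z))
        else:
            visible.append((x, y, z))
    return visible, masked


points = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 10.0, 0.0), (5.0, 0.1, 0.0)]
visible, masked = occlusion_mask(points, occluder_azimuth=0.0, half_width=0.1, min_range=8.0)
```

Unlike uniform random masking, a rule of this kind removes a contiguous region behind a closer structure, which is closer to how real occlusions appear in a LiDAR scan.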
Key results
- +2.83 mAP over the baseline on the ONCE dataset
- +1.53 L2 mAP over the baseline on the Waymo Open Dataset
- State-of-the-art data-efficient fine-tuning performance
- Enhanced detection of heavily occluded and distant objects
Why it matters
Provides a practical, annotation-free pre-training strategy that significantly advances autonomous driving perception with limited labeled data.
Abstract
Self-supervised pre-training with masked autoencoders has shown promise for 3D perception, yet most approaches treat LiDAR point clouds in a geometry-agnostic manner. In this paper, we introduce Re-MAE, a geometry-aware self-supervised learning framework for LiDAR-based 3D object detection that explicitly encodes core properties of LiDAR point clouds: occlusion, distance-driven sparsity, and occupied-empty voxel structure. Re-MAE rethinks the geometric characteristics of LiDAR point clouds from the perspectives of "what to learn" and "how to learn", and introduces three components: (i) Geometry-Aware Masking, which realistically simulates occlusions in LiDAR scans and enables learning complete object representations from partial observations; (ii) Reconstruction-Contextual BCE loss, which guides a multi-scale occupancy prediction task to mitigate distance-dependent point sparsity and the strong occupied-empty voxel imbalance, improving detection of both large vehicles and small, distant pedestrians; and (iii) Realistic Object Augmentation, a label-free foreground augmentation strategy that promotes object-centric representation learning and yields consistent gains across categories. Experiments on ONCE and the Waymo Open Dataset validate the effectiveness of Re-MAE, which improves over baselines by 2.83 mAP and 1.53 L2 mAP, respectively. These results demonstrate that explicitly incorporating the geometric characteristics of LiDAR point clouds enhances the effectiveness of self-supervised learning. The code will be released.
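The occupied-empty imbalance mentioned in the abstract can be illustrated with a generic class-balanced binary cross-entropy over occupancy targets built at two voxel sizes. This is a sketch under stated assumptions, not the Reconstruction-Contextual BCE loss itself, whose formulation is not given here. The helper names `occupied_voxels`, `plain_bce`, and `balanced_bce`, and the voxel sizes used, are hypothetical.

```python
import math


def occupied_voxels(points, voxel_size):
    """Return the set of integer voxel indices that contain at least one point."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}


def plain_bce(probs, targets, eps=1e-7):
    """Standard BCE averaged over all voxels; empty voxels dominate when they are the majority."""
    losses = [
        -math.log(max(p, eps)) if t == 1 else -math.log(max(1.0 - p, eps))
        for p, t in zip(probs, targets)
    ]
    return sum(losses) / len(losses)


def balanced_bce(probs, targets, eps=1e-7):
    """Generic class-balanced BCE (an assumption, not the paper's loss):
    occupied and empty voxels are averaged separately, then weighted equally."""
    pos = [-math.log(max(p, eps)) for p, t in zip(probs, targets) if t == 1]
    neg = [-math.log(max(1.0 - p, eps)) for p, t in zip(probs, targets) if t == 0]
    pos_term = sum(pos) / len(pos) if pos else 0.0
    neg_term = sum(neg) / len(neg) if neg else 0.0
    return 0.5 * (pos_term + neg_term)


# Occupancy targets at two scales for the same (toy) points.
points = [(0.2, 0.2, 0.2), (0.7, 0.2, 0.2), (3.1, 0.0, 0.0)]
fine = occupied_voxels(points, voxel_size=0.5)
coarse = occupied_voxels(points, voxel_size=1.0)

# One occupied voxel among three empty ones, all predicted as "probably empty".
probs, targets = [0.1, 0.1, 0.1, 0.1], [1, 0, 0, 0]
loss_plain = plain_bce(probs, targets)
loss_balanced = balanced_bce(probs, targets)
```

In this toy case the balanced variant penalizes the missed occupied voxel more heavily than the plain average does, which is the kind of effect an imbalance-aware occupancy loss is meant to achieve. The coarser grid also merges nearby points into fewer occupied voxels, which hints at why coarse scales can be more forgiving for sparse, distant regions.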