Depth Completion by Rescaling Monocular Depth Estimates Via Compressed Sensing
Daoxin Zhong, Jun Li, Yeshas Thadimari, Meng Yee (Michael) Chuah
AI summary
Problem
Existing learning-based depth completion models require additional parameters, domain-specific fine-tuning, and high GPU VRAM, making them impractical for resource-constrained robotic platforms.
Approach
The method computes a per-pixel scaling ratio matrix from sparse depth measurements using compressed sensing and a 2D DCT basis, then applies it to rescale any pre-trained monocular depth estimate without retraining.
Key results
- Reduces RMSE and MAE by over 15x compared to raw monocular estimates
- Outperforms state-of-the-art depth completion models at sampling ratios above 50%
- Eliminates the need for model fine-tuning or additional training data
- Lowers runtime GPU VRAM requirements compared to end-to-end neural networks
Why it matters
Enables accurate, real-time depth completion on power- and memory-limited robotic systems without the overhead of training complex neural networks.
Abstract
Depth completion is the challenge of recovering a dense depth map from an RGB image and corresponding sparse depth measurements. Many modern depth completion strategies rely on deep neural networks, using a monocular depth estimation (MDE) backbone to generate an initial dense depth map from the RGB image. This estimate is then refined by auxiliary network components that use the sparse depth measurements to improve accuracy and restore fine-grained depth details. However, such approaches introduce additional model parameters and require domain-specific fine-tuning, making them impractical for resource-constrained robotics applications. In this paper, we propose an alternative refinement strategy based on compressed sensing. Using the Discrete Cosine Transform (DCT) as our basis, we construct a ratio matrix that rescales the estimated depth map to align with measured ground truth data. Our experiments demonstrate that this method can reduce the RMSE and MAE of the initial MDE estimate by more than a factor of 15. Furthermore, the proposed approach can outperform state-of-the-art depth completion models at sampling ratios above 50 percent, while also reducing the overall GPU VRAM requirements. The pipeline is modular and compatible with any existing MDE model with no additional training, making it particularly suitable for deployment on GPU-constrained robotic platforms in previously unseen environments.
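The rescaling idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a solver, so this example assumes plain iterative soft-thresholding (ISTA) for the sparse recovery step, and the function names (`recover_ratio_map`, `rescale_depth`), the threshold `lam`, and the iteration count are hypothetical choices made for illustration. The ratio between measured and estimated depth is computed at the sampled pixels, a dense ratio map that is sparse in the 2D DCT basis is recovered from those samples, and the MDE output is multiplied by it.

```python
import numpy as np
from scipy.fft import dctn, idctn


def soft_threshold(x, t):
    """Shrink every entry of x towards zero by t (proximal step for the L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def recover_ratio_map(mde, sparse_depth, mask, lam=0.5, n_iter=200):
    """Estimate a dense per-pixel ratio map from sparse depth samples.

    mde          : (H, W) monocular depth estimate, positive where mask is True
    sparse_depth : (H, W) measured depth, only read where mask is True
    mask         : (H, W) boolean array, True at measured pixels
    lam, n_iter  : assumed ISTA threshold and iteration count (need tuning)
    """
    # Ratio between measurement and estimate, known only at sampled pixels.
    ratio_samples = np.zeros(mde.shape, dtype=float)
    ratio_samples[mask] = sparse_depth[mask] / mde[mask]

    # ISTA on the 2D DCT coefficients. The orthonormal DCT followed by
    # masking has operator norm at most 1, so a unit step size is safe.
    coeffs = np.zeros(mde.shape, dtype=float)
    for _ in range(n_iter):
        estimate = idctn(coeffs, norm="ortho")
        residual = np.where(mask, ratio_samples - estimate, 0.0)
        coeffs = soft_threshold(coeffs + dctn(residual, norm="ortho"), lam)
    return idctn(coeffs, norm="ortho")


def rescale_depth(mde, sparse_depth, mask, **solver_kwargs):
    """Rescale the MDE output by the recovered ratio map."""
    return mde * recover_ratio_map(mde, sparse_depth, mask, **solver_kwargs)
```

Calling `rescale_depth(mde, sparse_depth, mask)` returns an array with the same shape as `mde`. Because the sketch only post-processes the depth estimate, it is independent of which MDE model produced `mde`, which mirrors the modularity claimed above. The soft threshold slightly biases the recovered coefficients towards zero, so a practical version would likely tune `lam` or add a debiasing step.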