JetsonCompletion: Real-Time Depth Completion on Resource-Constrained Edge Devices
Kailin Wang, Xiaozhou Zhu, Benyi Yang, Tian Zhang, Haoxin Zhang, Fei Xie, Shuaixin Li
AI summary
Problem
Current depth completion models are too large and power-hungry for edge deployment, while sparse LiDAR supervision causes blurred boundaries and poor accuracy in distant regions.
Approach
The method uses a structurally reparameterized encoder-decoder with fewer than 5 million parameters, trained via a progressive two-stage distillation from both a dense completion teacher and a monocular geometric teacher, then optimized with TensorRT for embedded systems.
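The text above does not say which re-parameterization scheme the encoder-decoder uses. As a rough illustration only, the sketch below assumes a RepVGG-style block (a 3x3 branch, a 1x1 branch, and an identity branch, with batch normalization left out) on a single channel. It shows that the three training-time branches can be folded into one 3x3 kernel that gives the same output at inference time. All function names are hypothetical and are not taken from the JetsonCompletion code.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (output size = input size)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 conv + 1x1 conv (scalar weight k1) + identity skip."""
    y3 = conv2d_same(image, k3)
    return [[y3[r][c] + k1 * image[r][c] + image[r][c]
             for c in range(len(image[0]))]
            for r in range(len(image))]


def fuse_branches(k3, k1):
    """Inference-time form: fold all three branches into a single 3x3 kernel."""
    fused = [row[:] for row in k3]
    # The 1x1 weight and the identity both act only on the centre tap.
    fused[1][1] += k1 + 1.0
    return fused


# Toy input and weights (illustrative values, not from the paper).
image = [[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [2.0, 1.0, 4.0]]
k3 = [[0.1, 0.0, -0.1], [0.2, 0.5, -0.2], [0.1, 0.0, -0.1]]
k1 = 0.3

train_out = multi_branch(image, k3, k1)
infer_out = conv2d_same(image, fuse_branches(k3, k1))
max_diff = max(abs(a - b)
               for row_a, row_b in zip(train_out, infer_out)
               for a, b in zip(row_a, row_b))
print("max difference between the two forms:", max_diff)
```

In a real multi-channel block the same idea applies per output channel, and each branch's batch normalization would be folded into its convolution weights and bias before the kernels are summed.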
Key results
- Sub-5M parameter model achieves competitive accuracy on KITTI-DC and NYU-v2 benchmarks
- Runs at over 30 FPS with under 33 ms latency on a Jetson Xavier NX within a 20 W power envelope
- Two-stage hybrid distillation reduces RMSE by 110 mm compared to sparse-only supervision while preserving sharp object boundaries
- End-to-end PyTorch-to-TensorRT FP16 pipeline ensures stable, drop-free real-time inference on resource-constrained hardware
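The 30 FPS and 33 ms figures above are two views of the same budget: a 30 Hz stream leaves 1000 / 30 ≈ 33.3 ms per frame. The short sketch below works through that arithmetic, along with an upper bound on energy per frame if the full 20 W envelope were drawn. The helper names and the sample latencies are hypothetical and are not measurements from the paper.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def energy_per_frame_j(power_w, fps):
    """Upper bound on energy per frame (joules) if the whole power envelope is used."""
    return power_w / fps


def keeps_up(latencies_ms, fps):
    """True if every per-frame latency fits inside the frame budget (no dropped frames)."""
    budget = frame_budget_ms(fps)
    return all(latency <= budget for latency in latencies_ms)


budget = frame_budget_ms(30.0)            # about 33.33 ms
energy = energy_per_frame_j(20.0, 30.0)   # about 0.67 J

# Made-up latency traces, for illustration only.
print(round(budget, 2), round(energy, 3))
print(keeps_up([31.0, 32.5, 30.2], 30.0))
print(keeps_up([31.0, 35.0], 30.0))
```

Under this reading, "drop-free" means the worst-case latency, not just the mean, stays below the frame budget.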
Why it matters
Provides the first sustained real-time depth completion solution for micro-robots and autonomous platforms operating under strict 20 W power and compute constraints.
Abstract
Depth completion from sparse LiDAR points and images is a key perception task for autonomous robots, enabling dense 3D understanding in challenging environments. However, most recent research achieves accuracy gains by greatly enlarging network size, making the resulting models unsuitable for real-time deployment on power- and compute-constrained platforms. This paper proposes an ultra-lightweight depth completion framework optimized for embedded systems. Our approach integrates a re-parameterized encoder–decoder with fewer than 5 M parameters and a two-stage hybrid distillation strategy. The first stage progressively densifies sparse depth supervision, while the second preserves edge fidelity through a combination of metric and structural losses. A full TensorRT FP16 pipeline further ensures efficient deployment. Extensive experiments on KITTI Depth Completion and NYU-v2 demonstrate that our method achieves competitive accuracy while maintaining high efficiency. On a Jetson Xavier NX, the system runs at over 30 FPS with sub-33 ms latency within a 20 W power envelope, showing strong potential for real-world micro-robotic platforms. We will open-source the code to benefit the community. Code: https://github.com/2463450186Q/JetsonCompletion.git
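The abstract names the two distillation stages but does not define their losses. The following is a minimal sketch of one way such a scheme could look, under loudly stated assumptions: stage one fills gaps in the sparse LiDAR map with a dense teacher's prediction, and stage two adds an L1 metric term to a depth-gradient term that compares the student against a second teacher. The loss forms, the weight of 0.5, and all function names are hypothetical. The sketch also assumes both teachers output depth in metric scale, which a monocular teacher would generally need alignment to provide.

```python
def densify_target(sparse, teacher):
    """Stage 1 (assumed): keep LiDAR depth where present (> 0), else use the teacher."""
    return [[s if s > 0 else t for s, t in zip(row_s, row_t)]
            for row_s, row_t in zip(sparse, teacher)]


def metric_l1(pred, target):
    """Mean absolute depth error over pixels with a valid (> 0) target."""
    errs = [abs(p - t)
            for row_p, row_t in zip(pred, target)
            for p, t in zip(row_p, row_t) if t > 0]
    return sum(errs) / len(errs) if errs else 0.0


def gradients(depth):
    """Horizontal then vertical finite differences, flattened into one list."""
    h, w = len(depth), len(depth[0])
    gx = [depth[r][c + 1] - depth[r][c] for r in range(h) for c in range(w - 1)]
    gy = [depth[r + 1][c] - depth[r][c] for r in range(h - 1) for c in range(w)]
    return gx + gy


def structural_loss(pred, teacher):
    """Mean absolute difference between student and teacher depth gradients."""
    diffs = [abs(a - b) for a, b in zip(gradients(pred), gradients(teacher))]
    return sum(diffs) / len(diffs) if diffs else 0.0


def stage2_loss(pred, target, edge_teacher, weight=0.5):
    """Stage 2 (assumed): metric term plus weighted structural term."""
    return metric_l1(pred, target) + weight * structural_loss(pred, edge_teacher)


# Toy 2x3 depth maps in metres (illustrative values only); 0 marks "no LiDAR return".
sparse = [[0.0, 5.0, 0.0], [0.0, 0.0, 9.0]]
dense_teacher = [[4.0, 5.2, 8.0], [4.1, 5.0, 9.3]]
target = densify_target(sparse, dense_teacher)

edge_teacher = [[4.0, 4.0, 8.0], [4.0, 4.0, 8.0]]  # sharp depth step
sharp_pred = [[4.0, 4.0, 8.0], [4.0, 4.0, 8.0]]
blurred_pred = [[4.0, 6.0, 8.0], [4.0, 6.0, 8.0]]  # same endpoints, smeared edge

print(target)
print(structural_loss(sharp_pred, edge_teacher))
print(structural_loss(blurred_pred, edge_teacher))
```

The gradient term is what penalizes the smeared edge here: both predictions agree with the teacher at the left and right columns, but only the sharp one matches where the depth step occurs.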