ICRA 2026

ActMVS: Active Scene Reconstruction with Monocular Multi-View Stereo

Guo Pu, Yixuan Han, Zhouhui Lian

AI summary

ActMVS enables real-time, metric-accurate 3D reconstruction and safe navigation for resource-limited robots using only a single camera, with performance competitive with RGB-D systems on the Replica dataset.

Monocular reconstruction; Active SLAM; Multi-view stereo; Depth optimization; UAV navigation; Spatial intelligence

Problem

Existing active reconstruction methods rely on costly depth sensors, while current monocular approaches lack the real-time, globally consistent dense depth required for safe robot and UAV navigation.

Approach

The framework uses a view factor graph with voxel-frame visibility modeling to guide multi-view stereo depth prediction, combined with a global depth optimization algorithm to enforce cross-view consistency for online metric depth estimation.
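One way to picture voxel-frame visibility guiding multi-view stereo is as a source-view selection step: frames that observe many of the same voxels as the reference frame are likely to be useful stereo partners. The sketch below is only an illustration under that assumption, not the authors' implementation. The names `covisibility` and `select_source_views`, the Jaccard overlap score, and the parameters `k` and `min_overlap` are all hypothetical.

```python
from typing import Dict, List, Set


def covisibility(a: Set[int], b: Set[int]) -> float:
    """Jaccard overlap between two sets of visible voxel ids (assumed score)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def select_source_views(
    visibility: Dict[int, Set[int]],
    ref_id: int,
    k: int = 2,
    min_overlap: float = 0.1,
) -> List[int]:
    """Pick up to k frames that share the most visible voxels with ref_id.

    visibility maps a frame id to the ids of the voxels that frame observes.
    Frames whose overlap falls below min_overlap are discarded.
    """
    ref_voxels = visibility[ref_id]
    scored = []
    for frame_id, voxels in visibility.items():
        if frame_id == ref_id:
            continue
        score = covisibility(ref_voxels, voxels)
        if score >= min_overlap:
            scored.append((score, frame_id))
    # Highest overlap first; ties are broken by the lower frame id.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [frame_id for _, frame_id in scored[:k]]


# Toy example: frame 3 looks at a different part of the scene.
visibility = {
    0: {1, 2, 3, 4},
    1: {3, 4, 5, 6},
    2: {1, 2, 3, 9},
    3: {100, 101},
}
sources = select_source_views(visibility, ref_id=0)
```

In this toy setup, frame 2 shares three voxels with frame 0 and frame 1 shares two, so both are kept as source views, while frame 3 has no overlap and is dropped. A real system would likely weigh other cues as well, such as baseline and viewing angle.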

Key results

First monocular active reconstruction framework
View factor graph with voxel-frame visibility modeling
Global depth optimization for cross-view consistency
Competitive rendering and mesh accuracy versus RGB-D baselines

Why it matters

Provides a lightweight, vision-only solution for safe autonomous navigation and reconstruction in resource-constrained robots and UAVs.

Abstract

Active scene reconstruction enables robots/UAVs to autonomously plan trajectories and reconstruct environments without costly manual data acquisition. Unlike passive methods, active reconstruction requires real-time construction of high-confidence occupancy maps for collision-free navigation. Existing approaches rely on depth sensors for occupancy map updates, increasing platform cost and weight. To advance spatial intelligence, we aim for a vision-only monocular solution. However, current monocular scene reconstruction methods operate offline and fail to deliver globally consistent dense depth at the frame rates required for robot/UAV navigation. To bridge this gap, we introduce ActMVS, the first framework for monocular active reconstruction. Our framework integrates view factor graph construction for informed Multi-View Stereo depth prediction with global depth optimization, enabling the online generation of high-quality, globally consistent dense depth maps. This allows monocular robots/UAVs to maintain reliable occupancy maps for safe trajectory planning during reconstruction. Experiments on the Replica dataset demonstrate performance competitive with RGB-D methods. Our code and data are available at https://github.com/TrickyGo/ActMVS.
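To illustrate why dense metric depth is what an occupancy map needs, the following minimal sketch integrates a single depth ray into a sparse voxel grid using the standard log-odds occupancy update. It is a generic textbook technique, not the ActMVS code. The constants `L_FREE`, `L_OCC`, `L_MIN`, `L_MAX`, the voxel size, and the helper names are assumptions chosen for the example, and the half-voxel ray sampling is an approximation (an exact voxel traversal would be used in practice).

```python
import math
from collections import defaultdict
from typing import DefaultDict, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float]

# Hypothetical log-odds parameters, chosen only for illustration.
L_FREE = -0.4             # added to cells the ray passes through
L_OCC = 0.85              # added to the cell where the ray ends
L_MIN, L_MAX = -2.0, 3.5  # clamping bounds


def voxel_of(point: Point, voxel_size: float) -> Voxel:
    """Index of the voxel containing a 3D point."""
    return tuple(math.floor(c / voxel_size) for c in point)


def integrate_ray(
    grid: DefaultDict[Voxel, float],
    origin: Point,
    direction: Point,
    depth: float,
    voxel_size: float = 0.1,
) -> None:
    """Update log-odds along one camera ray that ends at the given depth."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = tuple(d / norm for d in direction)
    end = tuple(o + depth * u for o, u in zip(origin, unit))
    hit = voxel_of(end, voxel_size)

    # Sample the ray at half-voxel steps to collect traversed (free) cells.
    step = 0.5 * voxel_size
    free = set()
    for i in range(int(depth / step) + 1):
        t = min(i * step, depth)
        sample = tuple(o + t * u for o, u in zip(origin, unit))
        cell = voxel_of(sample, voxel_size)
        if cell != hit:
            free.add(cell)

    for cell in free:
        grid[cell] = max(L_MIN, grid[cell] + L_FREE)
    grid[hit] = min(L_MAX, grid[hit] + L_OCC)


def is_free(grid: DefaultDict[Voxel, float], cell: Voxel) -> bool:
    """A cell counts as free only if observed and its log-odds are negative."""
    return grid.get(cell, 0.0) < 0.0


grid: DefaultDict[Voxel, float] = defaultdict(float)
integrate_ray(grid, origin=(0.05, 0.05, 0.05), direction=(1.0, 0.0, 0.0), depth=0.3)
```

Cells that were never observed keep a log-odds of zero and are not reported as free, which is the conservative choice for collision-free planning. This also shows why depth errors matter: a ray that ends too far away marks occupied space as free.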

Index terms

View Planning for SLAM; Mapping; Vision-Based Navigation