Active Scene Reconstruction with Topological Reasoning and Semantic-Augmented Reinforcement Learning
Yiqing Yuan, Zhi Li, Hao Ren, Kairao Zheng, Hui Cheng
AI summary
Problem
Existing Gaussian splatting-based active reconstruction methods rely on heuristic planning, 2D topological abstractions, or online reinforcement learning, which fail to scale in cluttered environments, lack full 3D connectivity, and incur prohibitive computational costs.
Approach
The method extracts a 3D skeleton graph to capture traversable spatial connectivity, fuses Bird’s-Eye-View semantic features into node embeddings via cross-attention, and optimizes viewpoint selection using offline reinforcement learning with distributional shift regularization.
Key results
- Compact 3D skeleton graph captures full spatial connectivity and traversable regions
- BEV-augmented graph inference enriches node embeddings with semantic context without handcrafted features
- Offline RL with MMD regularization stabilizes training and replaces heuristic planning with data-driven decisions
- Extensive simulations show consistent improvements over baselines with successful zero-shot real-world transfer
Why it matters
Enables mobile robots to autonomously generate photorealistic, fine-grained 3D maps at scale, advancing applications in AR/VR, digital twins, and autonomous navigation.
Abstract
Active scene reconstruction aims to autonomously recover the fine-grained appearance and structural details of complex unknown scenes. Existing approaches based on 2D topological or voxel-based abstractions often scale poorly to large environments and rely heavily on handcrafted features and heuristic rules, limiting scalability and robustness. To address these challenges, we present a graph-based planning framework for a mobile robot equipped with an RGB-D camera. The framework integrates skeleton-derived topology, Bird’s-Eye-View (BEV)-augmented graph inference, and offline Reinforcement Learning (RL) for policy optimization. The 3D skeleton graph captures full spatial connectivity, overcoming the limitations of 2D representations. BEV-augmented graph inference enriches node embeddings with semantic context, avoiding handcrafted feature design. The offline RL approach replaces heuristic planning with data-driven decision-making, while an additional Maximum Mean Discrepancy (MMD) term mitigates distributional shift before and after feature injection, improving stability. Extensive simulation results validate the efficacy of the proposed method. Real-world experiments demonstrate the zero-shot transferability of the learned policy, highlighting its potential for scalable, fine-grained scene reconstruction.
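To make the MMD term more concrete, the following is a minimal sketch of how a squared MMD between two sets of feature vectors could be estimated. The abstract does not state which kernel or estimator the authors use, so the Gaussian (RBF) kernel, the biased estimator, the bandwidth value, and all function and variable names here are illustrative assumptions and not the paper's implementation.

```python
import math
import random


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors.

    The kernel choice and bandwidth are assumptions made for illustration.
    """
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


# Hypothetical stand-ins for embeddings before and after feature injection.
random.seed(0)
before = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(64)]
after_same = [row[:] for row in before]                      # no shift
after_shifted = [[v + 2.0 for v in row] for row in before]   # shifted mean

same_gap = mmd_squared(before, after_same)
shifted_gap = mmd_squared(before, after_shifted)
print(f"MMD^2 (identical sets): {same_gap:.6f}")
print(f"MMD^2 (shifted set):    {shifted_gap:.6f}")
```

In this sketch the estimate is zero when the two sets are identical and grows as their distributions move apart, which is the property that makes such a term usable as a penalty on distributional shift.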