ICRA 2026

Active Scene Reconstruction with Topological Reasoning and Semantic-Augmented Reinforcement Learning

Yiqing Yuan, Zhi Li, Hao Ren, Kairao Zheng, Hui Cheng

AI summary

Integrating 3D skeleton graphs, semantic BEV augmentation, and offline reinforcement learning enables scalable, high-fidelity active scene reconstruction with zero-shot real-world transfer.
Active scene reconstruction, Gaussian splatting, Offline reinforcement learning, Topological graph, Semantic augmentation, Zero-shot transfer

Problem

Existing Gaussian splatting-based active reconstruction methods rely on heuristic planning, 2D topological abstractions, or online reinforcement learning, which fail to scale in cluttered environments, lack full 3D connectivity, and incur prohibitive computational costs.

Approach

The method extracts a 3D skeleton graph to capture traversable spatial connectivity, fuses Bird’s-Eye-View semantic features into node embeddings via cross-attention, and optimizes viewpoint selection using offline reinforcement learning with distributional shift regularization.
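To make the fusion step concrete, here is a minimal NumPy sketch of single-head cross-attention in which graph node embeddings act as queries and flattened BEV feature cells act as keys and values. This is not the authors' implementation: the dimensions, the random projection matrices, the residual connection, and the function name `cross_attention_fuse` are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(node_emb, bev_feat, w_q, w_k, w_v):
    """Fuse BEV features into node embeddings (hypothetical helper).

    node_emb: (N, d) skeleton-graph node embeddings (queries)
    bev_feat: (M, c) flattened BEV semantic features (keys/values)
    Returns the fused (N, d) embeddings and the (N, M) attention weights.
    """
    q = node_emb @ w_q                        # (N, d_k)
    k = bev_feat @ w_k                        # (M, d_k)
    v = bev_feat @ w_v                        # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product, (N, M)
    attn = softmax(scores, axis=-1)           # each node attends over BEV cells
    # Residual connection is an assumption, not taken from the paper.
    return node_emb + attn @ v, attn

# Toy sizes, assumed for illustration only.
rng = np.random.default_rng(0)
N, M, d, c, d_k = 5, 16, 8, 12, 4
node_emb = rng.normal(size=(N, d))
bev_feat = rng.normal(size=(M, c))
w_q = rng.normal(size=(d, d_k)) * 0.1
w_k = rng.normal(size=(c, d_k)) * 0.1
w_v = rng.normal(size=(c, d)) * 0.1

fused, attn = cross_attention_fuse(node_emb, bev_feat, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Each row of `attn` is a probability distribution over BEV cells, so every node embedding receives a weighted mix of semantic context rather than a handcrafted feature.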

Key results

  • Compact 3D skeleton graph captures full spatial connectivity and traversable regions
  • BEV-augmented graph inference enriches node embeddings with semantic context without handcrafted features
  • Offline RL with MMD regularization stabilizes training and replaces heuristic planning with data-driven decisions
  • Extensive simulations show consistent improvements over baselines with successful zero-shot real-world transfer
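
The MMD regularizer mentioned above can be illustrated with a small NumPy sketch that estimates the squared Maximum Mean Discrepancy between two sets of embeddings, for example node embeddings before and after feature injection. The RBF kernel, the fixed bandwidth `sigma`, the biased estimator, and the synthetic data are assumptions made for illustration; the summary does not specify which kernel or estimator the paper uses.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of a and b.
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    sq = np.maximum(sq, 0.0)  # clip tiny negatives from rounding
    return np.exp(-sq / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel."""
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())

# Synthetic stand-ins for embeddings before/after feature injection (assumed).
rng = np.random.default_rng(0)
before = rng.normal(size=(64, 8))
after_small = before + 0.01 * rng.normal(size=(64, 8))  # mild perturbation
after_large = before + 2.0                              # strong shift

mmd_same = mmd2(before, before)
mmd_small = mmd2(before, after_small)
mmd_large = mmd2(before, after_large)
print(mmd_same, mmd_small, mmd_large)
```

Used as a penalty term in a training loss, a quantity like `mmd2` grows as the injected features push the embedding distribution away from the original one, which is the kind of shift the regularizer is meant to discourage.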

Why it matters

Enables mobile robots to autonomously generate photorealistic, fine-grained 3D maps at scale, advancing applications in AR/VR, digital twins, and autonomous navigation.

Abstract

Active scene reconstruction aims to autonomously recover the fine-grained appearance and structural details of complex unknown scenes. Existing approaches based on 2D topological or voxel-based abstractions often scale poorly to large environments and rely heavily on handcrafted features and heuristic rules, limiting scalability and robustness. To address these challenges, we present a graph-based planning framework for a mobile robot equipped with an RGB-D camera, integrating skeleton-derived topology, Bird’s-Eye-View (BEV)-augmented graph inference, and offline Reinforcement Learning (RL) for policy optimization. The 3D skeleton graph captures full spatial connectivity, overcoming the limitations of 2D representations. BEV-augmented graph inference enriches node embeddings with semantic context, avoiding handcrafted feature design. The offline RL approach replaces heuristic planning with data-driven decision-making, while an additional Maximum Mean Discrepancy (MMD) term mitigates distributional shift before and after feature injection, improving stability. Extensive simulation results validate the efficacy of the proposed method. Real-world experiments demonstrate the zero-shot transferability of the learned policy, highlighting its potential for scalable, fine-grained scene reconstruction.

Index terms

Mapping; RGB-D Perception; Deep Learning for Visual Perception
