Semantic-Guided Progressive Object Removal with Gaussian Splatting
Mingkai Liu, Dikai Fan, Xiao Liu, Hao Zhang
AI summary
Problem
Existing 3D object removal methods struggle with complex textures and geometric coherence due to one-shot completion strategies that neglect cross-view semantic cues.
Approach
The method leverages DINOv2 to match semantic blocks across multiple views and selectively re-inpaints low-quality regions using high-frequency guidance, all optimized within a 3D Gaussian Splatting pipeline.
Key results
- Outperforms NeRF- and Gaussian-based baselines in visual quality and geometric consistency
- Recovers fine-grained textures and structural details in complex occlusions
- Ensures cross-view coherence through semantic-guided block matching
- Enables efficient, high-fidelity 3D scene editing and novel view synthesis
Why it matters
Provides a robust, efficient solution for realistic 3D scene manipulation, benefiting AR/VR, robotics, and digital content creation workflows.
Abstract
Removing unwanted objects from reconstructed 3D scenes is an important task in computer vision, supporting applications in AR/VR, robotics, and digital content creation. Existing methods typically complete the entire masked region in a single step without effectively using semantic information from other views, which makes complex geometric details and textures difficult to handle. In this work, we propose a novel framework that integrates Semantic-guided Block Matching (SBM) and Region-Wise Progressive Refinement (RPR) for high-quality 3D object removal. First, we leverage DINOv2 to encode semantic guidance from multi-view observations, and the best-matching tokens are decoded to complete missing regions in the target view while maintaining cross-view consistency. Second, we introduce an RPR strategy that segments the target mask into multiple subregions and selectively refines those with poor visual quality. Our method is built upon Gaussian Splatting, ensuring high-fidelity scene reconstruction with efficient computation. Experimental results demonstrate that our approach outperforms existing Gaussian-based methods in terms of perceptual quality and coherence in 3D object removal.
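The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the DINOv2 token features have already been extracted as plain lists of floats, that each masked block has some query feature (for example from a coarse initial completion), and that a mean absolute Laplacian response is an acceptable stand-in for the visual-quality score used to pick subregions. All function names (`match_blocks`, `laplacian_energy`, `select_regions_to_refine`) and the threshold are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def match_blocks(query_feats, ref_feats):
    """SBM-style matching (assumed form): for each masked-block query feature,
    return (index of the most similar reference token, its similarity)."""
    matches = []
    for q in query_feats:
        sims = [cosine(q, r) for r in ref_feats]
        best = max(range(len(sims)), key=sims.__getitem__)
        matches.append((best, sims[best]))
    return matches


def laplacian_energy(img, box):
    """Mean absolute 4-neighbour Laplacian inside box = (r0, r1, c0, c1).
    img is a 2D list of grayscale floats; image border pixels are skipped."""
    r0, r1, c0, c1 = box
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for r in range(max(r0, 1), min(r1, h - 1)):
        for c in range(max(c0, 1), min(c1, w - 1)):
            lap = (4 * img[r][c] - img[r - 1][c] - img[r + 1][c]
                   - img[r][c - 1] - img[r][c + 1])
            total += abs(lap)
            count += 1
    return total / count if count else 0.0


def select_regions_to_refine(img, boxes, threshold):
    """RPR-style selection (assumed form): indices of subregions whose
    high-frequency energy is below the threshold, i.e. likely blurry."""
    return [i for i, box in enumerate(boxes)
            if laplacian_energy(img, box) < threshold]
```

In a full pipeline, the matched reference tokens would be decoded to fill the masked blocks, and only the subregions returned by `select_regions_to_refine` would be re-inpainted before the Gaussian Splatting scene is optimized again. Splitting the mask into axis-aligned boxes is a simplification chosen here for brevity; the abstract does not specify how subregions are formed.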