ICRA 2026

Semantic-Guided Progressive Object Removal with Gaussian Splatting

Mingkai Liu, Dikai Fan, Xiao Liu, Hao Zhang

AI summary

A semantic-guided, region-wise progressive refinement framework achieves high-fidelity, cross-view consistent 3D object removal within Gaussian Splatting.

3D object removal, Gaussian Splatting, semantic guidance, progressive refinement, multi-view inpainting, diffusion models

Problem

Existing 3D object removal methods struggle with complex textures and geometric coherence due to one-shot completion strategies that neglect cross-view semantic cues.

Approach

The method leverages DINOv2 to match semantic blocks across multiple views and selectively re-inpaints low-quality regions using high-frequency guidance, all optimized within a 3D Gaussian Splatting pipeline.
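The cross-view matching step can be illustrated with a minimal sketch. It assumes each image block is already represented by a feature vector (for example, a patch token from an encoder such as DINOv2) and picks, for every block in the target view, the most similar block from the other views by cosine similarity. The function names `cosine` and `best_match_tokens` are hypothetical, and the paper's actual matching and decoding procedure may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_match_tokens(target_tokens, reference_tokens):
    """For each target token, return (index, similarity) of its best reference token.

    Both arguments are lists of feature vectors. This is an illustrative
    nearest-neighbour search, not the paper's implementation.
    """
    if not reference_tokens:
        raise ValueError("reference_tokens must not be empty")
    matches = []
    for token in target_tokens:
        sims = [cosine(token, ref) for ref in reference_tokens]
        best = max(range(len(sims)), key=sims.__getitem__)
        matches.append((best, sims[best]))
    return matches


# Toy usage: two target blocks matched against three reference blocks.
target = [[1.0, 0.0], [0.0, 1.0]]
reference = [[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]
matches = best_match_tokens(target, reference)
```

In this toy case the first target block pairs with reference index 1 and the second with reference index 0, since those vectors point in the same direction.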

Key results

Outperforms NeRF- and Gaussian-based baselines in visual quality and geometric consistency
Recovers fine-grained textures and structural details in complex occlusions
Ensures cross-view coherence through semantic-guided block matching
Enables efficient, high-fidelity 3D scene editing and novel view synthesis

Why it matters

Provides a robust, efficient solution for realistic 3D scene manipulation, benefiting AR/VR, robotics, and digital content creation workflows.

Abstract

Removing unwanted objects from reconstructed 3D scenes is an important task in computer vision, supporting applications in AR/VR, robotics, and digital content creation. Existing methods typically complete the entire masked region in a single step without effectively utilizing semantic information from other views, which makes complex geometric details and textures difficult to handle. In this work, we propose a novel framework that integrates Semantic-guided Block Matching (SBM) and Region-Wise Progressive Refinement (RPR) for high-quality 3D object removal. First, we leverage DINOv2 to encode semantic guidance from multi-view observations, and the best-matching tokens are decoded to complete missing regions in the target view while maintaining cross-view consistency. Second, we introduce an RPR strategy that segments the target mask into multiple subregions and selectively refines those with poor visual quality. Our method is built upon Gaussian Splatting, ensuring high-fidelity scene reconstruction with efficient computation. Experimental results demonstrate that our approach outperforms existing Gaussian-based methods in terms of perceptual quality and coherence in 3D object removal.
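The region-wise selection idea described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the mask is split into fixed-size square tiles, and a per-pixel quality map is supplied by the caller. The abstract specifies neither the subregion scheme nor the quality measure, so the tiling, the `quality` input, the `threshold`, and the function names are all hypothetical.

```python
def split_mask_into_blocks(mask, block):
    """Split a binary mask (list of rows) into tiles of masked pixel coordinates.

    Tiles with no masked pixels are skipped. Fixed-size tiling is an
    assumption made for illustration only.
    """
    height, width = len(mask), len(mask[0])
    regions = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            pixels = [
                (r, c)
                for r in range(top, min(top + block, height))
                for c in range(left, min(left + block, width))
                if mask[r][c]
            ]
            if pixels:
                regions.append(pixels)
    return regions


def regions_needing_refinement(mask, quality, block, threshold):
    """Return the subregions whose mean quality falls below the threshold.

    `quality` is a hypothetical per-pixel score in which higher means better.
    """
    poor = []
    for pixels in split_mask_into_blocks(mask, block):
        mean_quality = sum(quality[r][c] for r, c in pixels) / len(pixels)
        if mean_quality < threshold:
            poor.append(pixels)
    return poor


# Toy usage: the top half of a 4x4 image is masked; its right tile scores poorly.
mask = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
quality = [[0.9, 0.9, 0.2, 0.2] for _ in range(4)]
poor = regions_needing_refinement(mask, quality, block=2, threshold=0.5)
```

Only the flagged subregions would be passed back for another inpainting pass, while regions that already score well are left unchanged.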

Index terms

Deep Learning for Visual Perception, Computer Vision for Automation, Semantic Scene Understanding