ICRA 2026

Reliable and Fast Humans Removed Visual Scene Representation

Serhat Iscan, H. Isil Bozma


AI summary

A scene descriptor that handles human occlusion by deforming the occluded regions with spherical interpolation instead of inpainting them, cutting computation time by 14–44× while keeping representation quality comparable.
Keywords: Scene descriptors, Human occlusion removal, Bubble descriptors, Spatial reasoning, Visual obstruction, Place recognition

Problem

Visual scene representations degrade when humans obstruct the camera view, and existing human-removal methods rely on computationally expensive inpainting or reconstruction without first assessing how much of the scene is actually blocked.

Approach

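Once humans are detected, the method first quantifies visual obstruction and skips descriptor generation for frames that are too heavily occluded. For the remaining frames, it adapts the bubble descriptor so that surface regions corresponding to detected humans are deformed with a modified spherical interpolation, which avoids inpainting or reconstruction altogether.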
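The obstruction gate can be illustrated with a minimal Python sketch under stated assumptions. The summary does not define the paper's visual obstruction measure, so it is approximated here by a naive stand-in: the fraction of image pixels covered by the union of detected human masks. The threshold MAX_OBSTRUCTION = 0.4 and the function names obstruction_ratio and is_reliable are hypothetical.

```python
import numpy as np

# Hypothetical threshold (not from the paper): frames whose obstruction
# ratio exceeds this value are treated as unreliable and skipped.
MAX_OBSTRUCTION = 0.4


def obstruction_ratio(human_masks: np.ndarray) -> float:
    """Naive stand-in for a visual obstruction measure.

    human_masks: boolean array of shape (num_humans, H, W), one mask per
    detected human (e.g. from an instance-segmentation model).
    Returns the fraction of pixels covered by at least one human.
    """
    if human_masks.shape[0] == 0:
        return 0.0  # no detections: nothing is obstructed
    union = np.any(human_masks, axis=0)  # (H, W) union of all human masks
    return float(union.mean())  # mean of a boolean array = covered fraction


def is_reliable(human_masks: np.ndarray,
                max_obstruction: float = MAX_OBSTRUCTION) -> bool:
    """Gate descriptor generation: True only if obstruction is acceptable."""
    return obstruction_ratio(human_masks) <= max_obstruction
```

For example, a single mask covering the top half of a 4×4 image gives an obstruction ratio of 0.5, so is_reliable returns False with the default threshold. Taking the union first means overlapping detections are not double-counted.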

Key results

  • Representation quality comparable to state-of-the-art inpainting-based methods
  • 14–44× reduction in computation time
  • A novel visual obstruction measure that prevents descriptor generation for heavily occluded frames
  • Two new datasets of human-occupied scenes, collected with stationary and mobile robots

Why it matters

Fast, occlusion-aware scene descriptors support spatial reasoning and place recognition for robots operating in dynamic, human-populated environments.

Abstract

This paper introduces a reliable and fast method for scene representation from a single RGB frame, even with human occlusion. Our goal is to enhance vision-based spatial reasoning in dynamic environments where human presence varies over time. Once humans are detected, the method addresses two key challenges: estimating the level of visual obstruction and generating a scene descriptor with humans removed. The first is handled via a novel visual obstruction measure that prevents descriptor generation under high occlusion. The second is addressed by adapting the previously presented bubble descriptor so that surface regions corresponding to detected humans are deformed using a modified spherical interpolation method—eliminating the need for inpainting or reconstruction and enabling rapid computation. We validate our approach through extensive comparisons across multiple datasets, including two new datasets collected using both stationary and mobile robots. Results show comparable representation quality with a 14-44× reduction in computation time.
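The deformation step builds on spherical interpolation. The paper's modification is not described here, so the Python sketch below shows only standard spherical linear interpolation (slerp) between unit vectors, plus a hypothetical helper, fill_occluded_arc, illustrating one way an occluded span of a sphere-parameterized surface could be bridged from its two unoccluded boundary samples: directions follow the great-circle arc and radii are blended linearly. Both are assumptions for illustration, not the authors' bubble-descriptor code.

```python
import numpy as np


def slerp(p0: np.ndarray, p1: np.ndarray, t: float) -> np.ndarray:
    """Standard spherical linear interpolation between directions p0 and p1.

    t = 0 returns p0 (normalized), t = 1 returns p1 (normalized); every
    intermediate result is a unit vector on the great-circle arc.
    """
    p0 = p0 / np.linalg.norm(p0)
    p1 = p1 / np.linalg.norm(p1)
    dot = float(np.clip(np.dot(p0, p1), -1.0, 1.0))
    omega = np.arccos(dot)  # angle between the two directions
    if np.isclose(omega, 0.0):
        return p0.copy()  # (nearly) identical directions: nothing to interpolate
    if np.isclose(omega, np.pi):
        raise ValueError("antipodal directions: the interpolation arc is not unique")
    s = np.sin(omega)
    return (np.sin((1.0 - t) * omega) * p0 + np.sin(t * omega) * p1) / s


def fill_occluded_arc(p_start: np.ndarray, p_end: np.ndarray,
                      r_start: float, r_end: float, n: int) -> np.ndarray:
    """Hypothetical helper: replace n occluded surface samples.

    Directions follow the great-circle arc between the two unoccluded
    boundary directions; radii are linearly blended from r_start to r_end.
    Returns an (n, 3) array of surface points.
    """
    ts = np.linspace(0.0, 1.0, n + 2)[1:-1]  # interior samples only
    return np.array([((1.0 - t) * r_start + t * r_end) * slerp(p_start, p_end, t)
                     for t in ts])
```

Slerp keeps every interpolated direction on the unit sphere, which is one plausible reason to prefer it over straight-line interpolation of 3D points for a sphere-parameterized descriptor. The helper needs only the boundary samples, so no image pixels have to be synthesized.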

Index terms

RGB-D Perception, Recognition, Human-Centered Robotics
