IROS 2024

QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding

Yash Mehan, Kumaraditya Gupta, Rohit Jayanti, Anirudh Govil, Sourav Garg, Madhava Krishna


Abstract

Robotic tasks such as planning and navigation require a hierarchical semantic understanding of a scene, which may span multiple floors and rooms. Current methods for 3D scene understanding focus primarily on object segmentation, and they struggle to segment topological regions such as a “kitchen”. In this work, we introduce a two-step pipeline to solve this problem. First, we extract a topological map, i.e., a floorplan of the indoor scene, using a novel multi-channel occupancy representation. Then, we use a self-attention transformer to generate CLIP-aligned features and semantic labels for every room instance based on the objects it contains. Our language-topology alignment supports natural language querying: for example, the query “place to cook” locates the “kitchen”. We outperform the current state of the art by ∼20% on room segmentation and by ∼12% on room classification. Our detailed qualitative analysis and ablation studies provide insights into the problem of joint structural and semantic 3D scene understanding. Project Page: quest-maps.github.io
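The natural language querying step can be illustrated with a minimal sketch. It assumes that each room instance already has a CLIP-aligned embedding and that the query text has been encoded into the same embedding space. Rooms are then ranked by cosine similarity to the query. The three-dimensional vectors below are hypothetical toy stand-ins, not real CLIP features, and `rank_rooms` is an illustrative helper, not part of the QueSTMaps code.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_rooms(
    query_embedding: List[float], room_embeddings: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank rooms by similarity to a query embedding, best first."""
    scores = [
        (room_id, cosine_similarity(query_embedding, emb))
        for room_id, emb in room_embeddings.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy room embeddings (assumption: stand-ins for per-room CLIP-aligned features).
rooms = {
    "kitchen": [0.9, 0.1, 0.0],
    "bedroom": [0.1, 0.9, 0.1],
    "bathroom": [0.0, 0.2, 0.9],
}

# Toy stand-in for the text embedding of the query "place to cook".
query = [0.8, 0.2, 0.1]

ranking = rank_rooms(query, rooms)
print(ranking[0][0])  # prints "kitchen"
```

In a real system the vectors would come from a text encoder and the room-level features the abstract describes; the ranking step itself only needs a shared embedding space and a similarity measure.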

Index terms

Semantic Scene Understanding; Object Detection, Segmentation and Categorization; Recognition