ICRA 2026

MA3DSG: Multi-Agent 3D Scene Graph Generation for Large-Scale Indoor Environments

Yirum Kim, Jaewoo Kim, Ue-Hwan Kim

AI summary

MA3DSG enables scalable, training-free multi-agent 3D scene graph generation, running 4× faster and using 98× less data traffic than existing baselines.

Multi-Agent Systems, 3D Scene Graph Generation, Scalability, Graph Alignment, Indoor Navigation, MA3DSG-Bench

Problem

Current 3D scene graph generation methods rely on single-agent paradigms and small-scale benchmarks, failing to scale to real-world, large-scale indoor environments. The paper addresses the critical scalability gap and lack of multi-agent evaluation frameworks in this domain.

Approach

The authors propose a decentralized multi-agent framework that incrementally builds local scene graphs and merges them using a novel, training-free graph alignment algorithm. They also introduce MA3DSG-Bench, a unified benchmark supporting diverse agent configurations, large scales, and dynamic environments.
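To make the merge step concrete, here is a minimal sketch of one training-free way to fold an agent's local scene graph into a global one. It is not the paper's alignment algorithm, which this summary does not describe. It assumes every agent reports object centroids in a shared world frame, and it treats two nodes as the same object when their labels agree and their centroids lie within a distance threshold. The names `SceneGraph`, `merge_local_graph`, and `max_dist` are hypothetical.

```python
from dataclasses import dataclass, field
import math


@dataclass
class SceneGraph:
    # node id -> (semantic label, 3D centroid in a shared world frame)
    nodes: dict = field(default_factory=dict)
    # set of (subject id, predicate, object id) triples
    edges: set = field(default_factory=set)


def merge_local_graph(global_graph, local_graph, agent, max_dist=0.5):
    """Merge one agent's local graph into global_graph in place.

    Illustrative rule only (an assumption, not the authors' method):
    a local node matches the nearest unmatched global node that has the
    same label and lies within max_dist metres. No parameters are learned.
    Returns the mapping from local node ids to global node ids.
    """
    id_map = {}
    matched = set()  # global ids already claimed by a node of this local graph
    for local_id, (label, centroid) in local_graph.nodes.items():
        best_id, best_dist = None, max_dist
        for global_id, (g_label, g_centroid) in global_graph.nodes.items():
            if g_label != label or global_id in matched:
                continue
            dist = math.dist(centroid, g_centroid)
            if dist <= best_dist:
                best_id, best_dist = global_id, dist
        if best_id is None:
            # Unseen object: add it under an agent-scoped id to avoid clashes.
            best_id = f"{agent}/{local_id}"
            global_graph.nodes[best_id] = (label, centroid)
        matched.add(best_id)
        id_map[local_id] = best_id
    # Re-express the agent's relations in terms of global node ids.
    for subj, predicate, obj in local_graph.edges:
        global_graph.edges.add((id_map[subj], predicate, id_map[obj]))
    return id_map
```

Calling `merge_local_graph` once per agent on an initially empty `SceneGraph` accumulates a global graph. Under this sketch, each agent needs to transmit only its nodes and relation triples rather than raw sensor data.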

Key results

First scalable multi-agent 3DSGG framework requiring no learnable parameters
Training-free graph alignment algorithm efficiently merges partial query graphs
MA3DSG-Bench benchmark enabling evaluation across diverse scales and dynamic conditions
4× faster runtime and 98× less data traffic than single-agent and multi-agent baselines

Why it matters

Provides a foundational framework and benchmark for scalable, collaborative 3D scene understanding in real-world robotics and multi-agent navigation applications.

Abstract

Current 3D scene graph generation (3DSGG) approaches rely heavily on a single-agent assumption and small-scale environments, exhibiting limited scalability to real-world scenarios. In this work, we introduce the Multi-Agent 3D Scene Graph Generation (MA3DSG) model, the first framework designed to tackle this scalability challenge using multiple agents. We develop a training-free graph alignment algorithm that efficiently merges partial query graphs from individual agents into a unified global scene graph. Leveraging extensive analysis and empirical insights, our approach enables conventional single-agent systems to operate collaboratively without requiring any learnable parameters. To rigorously evaluate 3DSGG performance, we propose MA3DSG-Bench, a benchmark that supports diverse agent configurations, domain sizes, and environmental conditions, providing a more general and extensible evaluation framework. This work lays a solid foundation for scalable, multi-agent 3DSGG research.

Index terms

Semantic Scene Understanding, Multi-Robot Systems