MA3DSG: Multi-Agent 3D Scene Graph Generation for Large-Scale Indoor Environments
Yirum Kim, Jaewoo Kim, Ue-Hwan Kim
AI summary
Problem
Current 3D scene graph generation methods rely on single-agent paradigms and small-scale benchmarks, failing to scale to real-world, large-scale indoor environments. The paper addresses the critical scalability gap and lack of multi-agent evaluation frameworks in this domain.
Approach
The authors propose a decentralized multi-agent framework that incrementally builds local scene graphs and merges them using a novel, training-free graph alignment algorithm. They also introduce MA3DSG-Bench, a unified benchmark supporting diverse agent configurations, large scales, and dynamic environments.
Key results
- First scalable multi-agent 3DSGG framework requiring no learnable parameters
- Training-free graph alignment algorithm efficiently merges partial query graphs
- MA3DSG-Bench benchmark enabling evaluation across diverse scales and dynamic conditions
- 4× faster runtime and 98× lower data traffic compared with single-agent and multi-agent baselines
Why it matters
Provides a foundational framework and benchmark for scalable, collaborative 3D scene understanding in real-world robotics and multi-agent navigation applications.
Abstract
Current 3D scene graph generation (3DSGG) approaches rely heavily on a single-agent assumption and small-scale environments, exhibiting limited scalability to real-world scenarios. In this work, we introduce the Multi-Agent 3D Scene Graph Generation (MA3DSG) model, the first framework designed to tackle this scalability challenge using multiple agents. We develop a training-free graph alignment algorithm that efficiently merges partial query graphs from individual agents into a unified global scene graph. Leveraging extensive analysis and empirical insights, our approach enables conventional single-agent systems to operate collaboratively without requiring any learnable parameters. To rigorously evaluate 3DSGG performance, we propose MA3DSG-Bench, a benchmark that supports diverse agent configurations, domain sizes, and environmental conditions, providing a more general and extensible evaluation framework. This work lays a solid foundation for scalable, multi-agent 3DSGG research.
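To make the idea of merging partial graphs into a global scene graph concrete, here is a minimal Python sketch. It is not the authors' alignment algorithm, which the abstract does not specify. It assumes a much simpler stand-in rule: two nodes are the same object if they share a class label and their centroids lie within a distance threshold in a shared world frame. The names `SceneGraph`, `merge_graphs`, and `max_dist` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
import math


@dataclass
class SceneGraph:
    # node id (int) -> (class label, centroid xyz in a shared world frame)
    nodes: dict = field(default_factory=dict)
    # (subject id, object id) -> relation label
    edges: dict = field(default_factory=dict)


def merge_graphs(global_graph, local_graph, max_dist=0.5):
    """Merge local_graph into global_graph in place.

    Assumed matching rule (illustrative, not from the paper): greedy
    nearest-centroid matching among nodes with the same label.
    Returns the mapping from local node ids to global node ids.
    """
    id_map = {}
    used = set()  # global nodes already claimed during this merge
    for lid, (label, pos) in local_graph.nodes.items():
        best, best_d = None, max_dist
        for gid, (glabel, gpos) in global_graph.nodes.items():
            if gid in used or glabel != label:
                continue
            d = math.dist(pos, gpos)
            if d <= best_d:
                best, best_d = gid, d
        if best is None:
            # No match found: register the object as a new global node.
            best = max(global_graph.nodes, default=-1) + 1
            global_graph.nodes[best] = (label, pos)
        used.add(best)
        id_map[lid] = best
    # Re-index local edges into global ids; keep an existing relation if present.
    for (s, o), rel in local_graph.edges.items():
        global_graph.edges.setdefault((id_map[s], id_map[o]), rel)
    return id_map


# Two agents observe overlapping parts of a room (both see the table).
agent_a = SceneGraph(
    nodes={0: ("chair", (0.0, 0.0, 0.0)), 1: ("table", (1.0, 0.0, 0.0))},
    edges={(0, 1): "next to"},
)
agent_b = SceneGraph(
    nodes={0: ("table", (1.1, 0.0, 0.0)), 1: ("lamp", (1.1, 0.0, 1.0))},
    edges={(1, 0): "standing on"},
)

global_graph = SceneGraph()
map_a = merge_graphs(global_graph, agent_a)
map_b = merge_graphs(global_graph, agent_b)
```

In this toy run the table seen by both agents collapses to a single global node, so the merged graph holds three nodes and two relations. Only the small local graphs are exchanged rather than raw sensor data, and the matching step uses no learned parameters, which is the property the abstract emphasizes.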