ICRA 2026

One Prompt, Many Rooms: A Force-Directed Approach to 3D Scene Generation for Robotics Simulation

Christopher May, Peter Hetzner, Matthias Kalenberg, Ashwin Sajith Nambiar, Jörg Franke, Sebastian Reitelshöfer

AI summary

A single semantic plan can be efficiently diversified into multiple physically valid 3D scenes using force-directed simulation, enabling robust robotic navigation training.
3D scene generation, force-directed simulation, robotic training data, LLM planning, embodied AI, layout synthesis

Problem

Current text-to-scene methods are inefficient for robotics: generating diverse layouts requires repeated, expensive sampling, and each new sample can break semantic consistency. This hinders the creation of large-scale, physically consistent training datasets for generalizable robotic agents.

Approach

An LLM generates a single declarative scene plan, which a force-directed 2D physics simulation translates into multiple distinct, physically plausible layouts that share identical semantics but vary in geometry.
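The one-plan-to-many-layouts idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the `PLAN` dictionary format, the circular object footprints, the spring and repulsion constants, and the function name `realize_layout` are all hypothetical. The sketch also omits torques and object orientation. It only shows how a fixed plan plus a varying random seed can yield layouts that share relations but differ in geometry.

```python
import math
import random

# Hypothetical plan format (not the paper's): room size in metres, object
# footprints approximated as circles (radius in metres), and "near" relations
# given as (object_a, object_b, target_distance).
PLAN = {
    "room": (5.0, 4.0),
    "objects": {"table": 0.6, "chair": 0.3, "sofa": 0.8, "lamp": 0.2},
    "near": [("chair", "table", 1.0), ("lamp", "sofa", 1.2)],
}


def realize_layout(plan, seed, steps=2000, dt=0.02, k_near=2.0, k_repel=20.0):
    """Relax one random initial placement under the plan's forces."""
    rng = random.Random(seed)
    width, depth = plan["room"]
    radii = plan["objects"]
    names = list(radii)
    # The seed only affects the starting positions; the plan stays fixed.
    pos = {n: [rng.uniform(r, width - r), rng.uniform(r, depth - r)]
           for n, r in radii.items()}

    def distance(a, b):
        return math.hypot(pos[b][0] - pos[a][0], pos[b][1] - pos[a][1])

    def pull(force, a, b, magnitude):
        """Positive magnitude pulls a and b together; negative pushes apart."""
        dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
        dist = math.hypot(dx, dy) or 1e-9
        fx, fy = magnitude * dx / dist, magnitude * dy / dist
        force[a][0] += fx
        force[a][1] += fy
        force[b][0] -= fx
        force[b][1] -= fy

    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in names}
        # Semantic rule "near": a spring toward the target distance.
        for a, b, target in plan["near"]:
            pull(force, a, b, k_near * (distance(a, b) - target))
        # Physical rule: overlapping footprints repel each other.
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                overlap = radii[a] + radii[b] - distance(a, b)
                if overlap > 0:
                    pull(force, a, b, -k_repel * overlap)
        # Overdamped update (position += dt * force), clamped to the room.
        for n in names:
            r = radii[n]
            pos[n][0] = min(max(pos[n][0] + dt * force[n][0], r), width - r)
            pos[n][1] = min(max(pos[n][1] + dt * force[n][1], r), depth - r)
    return {n: (x, y) for n, (x, y) in pos.items()}
```

Calling `[realize_layout(PLAN, seed) for seed in range(3)]` produces three placements from the same plan. A relaxation like this can settle in a configuration that still violates a rule, so each result would still need a separate validity check before use.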

Key results

  • A one-plan-to-many-layouts pipeline generating diverse 3D scenes from a single prompt
  • A force-directed placement technique translating semantic rules into forces and torques via 2D physics simulation
  • A scalable validation loop combining deterministic physical checks with VLM-based semantic assessment
  • A navigation agent trained on generated scenes achieving a 0.84 success rate on Replica environments
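The validation loop in the third result pairs deterministic checks with a semantic assessment. The sketch below shows one plausible shape for the deterministic half, under stated assumptions: the layout format (axis-aligned footprints given as centre and half-extents), the function names `physically_valid` and `validate`, and the `semantic_check` callable standing in for the VLM are all hypothetical and not taken from the paper.

```python
def physically_valid(layout, room, tol=1e-6):
    """Deterministic check: every footprint is inside the room and none overlap.

    layout maps a name to (x, y, half_width, half_depth), an assumed
    axis-aligned footprint format; room is (width, depth) in the same units.
    """
    width, depth = room
    items = list(layout.values())
    for x, y, hw, hd in items:
        if x - hw < -tol or x + hw > width + tol:
            return False
        if y - hd < -tol or y + hd > depth + tol:
            return False
    for i, (x1, y1, hw1, hd1) in enumerate(items):
        for x2, y2, hw2, hd2 in items[i + 1:]:
            # Two axis-aligned boxes overlap only if they overlap on both axes.
            if abs(x1 - x2) < hw1 + hw2 - tol and abs(y1 - y2) < hd1 + hd2 - tol:
                return False
    return True


def validate(layouts, room, semantic_check):
    """Keep layouts that pass the cheap physical check, then the semantic one.

    semantic_check is a placeholder callable for the VLM-based assessment.
    """
    return [lay for lay in layouts
            if physically_valid(lay, room) and semantic_check(lay)]
```

Running the cheap geometric test first means the more expensive semantic call is only made for layouts that are already physically admissible.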

Why it matters

It provides a scalable, efficient method for generating controlled, diverse simulation data, addressing a critical bottleneck for training generalizable embodied AI and robotic agents.

Abstract

Training generalizable robotic agents requires large datasets of diverse, physically consistent 3D scenes, yet their generation remains a critical bottleneck. Current text-to-scene methods are inefficient for this task; generating diverse layouts requires repeated, expensive sampling that fails to maintain the semantic consistency required for robust policy learning. In this paper, we address this with our “one-plan-to-many-layouts” method. A Large Language Model (LLM) generates a single declarative plan, which a force-directed physics simulation then realizes into multiple layouts that share semantics but differ in geometry. We validate our method by transfer to photorealistic 3D reconstructions of real environments (Replica) within simulation, where a navigation agent trained on our scenes attains a Success Rate of 0.84. These results establish our pipeline as a scalable method for producing the controlled, diverse data required for embodied AI training.

Index terms

Simulation and Animation; Data Sets for Robot Learning; Software Tools for Benchmarking and Reproducibility

Related papers