ICRA 2026

AeroScene: Progressive Scene Synthesis for Aerial Robotics

Nghia Vu Huu, Tuong Do, Viet-Dzung Tran, Binh Nguyen, Hoan Nguyen, Erman Tjiputra, Quang Tran, Hai-Nguyen (Hann) Nguyen, Anh Nguyen


AI summary

A hierarchical diffusion model generates physically plausible, semantically consistent 3D scenes for drone navigation, outperforming prior methods and enabling scalable simulation.
Tags: Hierarchical diffusion, 3D scene synthesis, drone simulation, aerial robotics, generative AI, physics-ready environments

Problem

Drone simulators currently rely on static, handcrafted environments that lack the scalability, diversity, and hierarchical layout realism required for complex aerial tasks like navigation and landing.

Approach

The method uses a hierarchical diffusion framework with cross-scale progressive attention and task-aware guidance to coherently generate coarse structural layouts and fine-scale object placements.
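The summary does not say what form the task-aware guidance takes. This is a minimal sketch that assumes it works like energy-based guidance in diffusion sampling: after each denoising update, the sample is nudged down the gradient of a physical-plausibility energy. The collision energy over circular 2D footprints, the functions `collision_energy`, `collision_energy_grad`, and `guided_step`, the identity stand-in denoiser, and the step size are all hypothetical illustrations, not AeroScene's implementation.

```python
import numpy as np

# Hypothetical guidance energy: 2D object footprints are modeled as circles,
# and E = sum over unordered pairs of max(0, r_i + r_j - d_ij)^2.


def collision_energy(centers, radii):
    """Total squared overlap between all pairs of circles (centers: (N, 2), radii: (N,))."""
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    overlap = np.clip(radii[:, None] + radii[None, :] - dist, 0.0, None)
    np.fill_diagonal(overlap, 0.0)
    return 0.5 * float(np.sum(overlap ** 2))  # the full matrix counts each pair twice


def collision_energy_grad(centers, radii):
    """Analytic gradient dE/dc_i = -2 * sum_j overlap_ij * (c_i - c_j) / d_ij."""
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.maximum(np.linalg.norm(diff, axis=-1), 1e-8)  # avoid division by zero
    overlap = np.clip(radii[:, None] + radii[None, :] - dist, 0.0, None)
    np.fill_diagonal(overlap, 0.0)
    return -2.0 * np.sum((overlap / dist)[:, :, None] * diff, axis=1)


def guided_step(x_t, denoise_fn, radii, guidance_scale=0.1):
    """One reverse step: apply the model's update, then descend the collision energy."""
    x_prev = denoise_fn(x_t)  # stand-in for a trained denoiser's update
    return x_prev - guidance_scale * collision_energy_grad(x_prev, radii)


# Usage: two overlapping footprints and one isolated footprint.
# An identity denoiser isolates the effect of the guidance term.
centers = np.array([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]])
radii = np.array([1.0, 1.0, 1.0])
x = centers
for _ in range(50):
    x = guided_step(x, lambda z: z, radii)
print(collision_energy(centers, radii), collision_energy(x, radii))
```

The guidance pushes the two overlapping footprints apart until they just touch, while the isolated footprint receives a zero gradient and stays put. A real sampler would interleave this correction with the learned denoiser and typically scale it by the noise level.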

Key results

  • Outperforms baselines on FID, KID, collision rate, and semantic plausibility metrics
  • Generates a new dataset of over 1,000 physics-ready 3D scenes
  • Produces scenes that can be directly integrated into NVIDIA Isaac Sim
  • Ablation studies confirm guidance objectives are critical for physical and hierarchical consistency
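The summary names FID, KID, and collision rate but does not define how they are computed for 3D scenes. This sketch covers two of them under stated assumptions. `frechet_distance` implements the standard Fréchet distance between Gaussians fitted to two feature sets (the formula behind FID), applied here to arbitrary feature arrays rather than Inception-network features. `collision_rate` uses a hypothetical definition, the fraction of objects whose axis-aligned bounding box (AABB) intersects another object's box. Neither function is taken from the AeroScene code.

```python
import numpy as np


def _psd_sqrt(m):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def frechet_distance(feats_a, feats_b):
    """FID formula: ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2}) for (N, D) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    sqrt_a = _psd_sqrt(cov_a)
    # (S_a S_b)^{1/2} and (S_a^{1/2} S_b S_a^{1/2})^{1/2} have the same trace;
    # the second form is symmetric PSD, so eigh applies.
    cross = _psd_sqrt(sqrt_a @ cov_b @ sqrt_a)
    return float(np.sum((mu_a - mu_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * cross))


def collision_rate(boxes):
    """Hypothetical definition: fraction of objects whose AABB intersects any other AABB.

    boxes: (N, 2, 3) array holding [min_corner, max_corner] for each object.
    """
    lo, hi = boxes[:, 0, :], boxes[:, 1, :]
    # Two boxes intersect iff their extents overlap strictly on every axis
    # (boxes that only touch at a face are not counted).
    hit = np.all((lo[:, None, :] < hi[None, :, :]) & (lo[None, :, :] < hi[:, None, :]), axis=-1)
    np.fill_diagonal(hit, False)
    return float(hit.any(axis=1).mean())


rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 4))
print(frechet_distance(feats, feats + np.array([1.0, 2.0, 0.0, 0.0])))  # close to 1 + 4 = 5

boxes = np.array([
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    [[0.5, 0.5, 0.5], [1.5, 1.5, 1.5]],  # overlaps the first box
    [[3.0, 3.0, 3.0], [4.0, 4.0, 4.0]],  # isolated
])
print(collision_rate(boxes))  # 2 of 3 objects collide
```

Shifting every feature by a constant vector leaves the covariances unchanged, so the distance reduces to the squared norm of the shift. Lower is better for both metrics.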

Why it matters

Provides a scalable, realistic simulation pipeline that accelerates the development and benchmarking of autonomous drone navigation and aerial robotics systems.

Abstract

While generative models have shown substantial impact across multiple domains, their potential for scene synthesis remains underexplored in robotics. This gap is more evident in drone simulators, where simulation environments still rely heavily on manual effort, making them time-consuming to create and difficult to scale. In this work, we introduce AeroScene, a hierarchical diffusion model for progressive 3D scene synthesis. Our approach leverages hierarchy-aware tokenization and multi-branch feature extraction to reason across both global layouts and local details, ensuring physical plausibility and semantic consistency. This makes AeroScene particularly suited for generating realistic scenes for aerial robotics tasks such as navigation, landing, and perching. We demonstrate its effectiveness through extensive experiments on our newly collected dataset and a public benchmark, showing that AeroScene significantly outperforms prior methods. Furthermore, we use AeroScene to generate a large-scale dataset of over 1,000 physics-ready, high-fidelity 3D scenes that can be directly integrated into NVIDIA Isaac Sim. Finally, we illustrate the utility of these generated environments on downstream drone navigation tasks. Our code and dataset are publicly available at aioz-ai.github.io/AeroScene/
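The abstract mentions hierarchy-aware tokenization without specifying it. As a minimal sketch, assuming a scene is a tree of coarse layout elements (for example, buildings or parks) with fine objects as children, one way to tokenize it is a level-order flattening in which each token records its depth and the index of its parent token. Coarse tokens then precede the fine tokens that refer to them. The `Node` and `Token` classes, the `tokenize` function, the footprint parameterization, and the category names are hypothetical, not AeroScene's format.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical scene node: a coarse layout element or a fine object."""
    category: str
    box: tuple  # (x, y, width, depth) footprint; illustrative parameterization
    children: list = field(default_factory=list)


@dataclass(frozen=True)
class Token:
    level: int      # 0 = coarse layout element, 1 = fine object, and so on
    parent: int     # index of the parent token in the sequence, -1 for roots
    category: str
    box: tuple


def tokenize(roots):
    """Level-order (breadth-first) flattening: every token appears after its parent."""
    tokens = []
    queue = deque((node, 0, -1) for node in roots)
    while queue:
        node, level, parent = queue.popleft()
        idx = len(tokens)
        tokens.append(Token(level, parent, node.category, node.box))
        for child in node.children:
            queue.append((child, level + 1, idx))
    return tokens


scene = [
    Node("building", (0, 0, 10, 10), [Node("rooftop_pad", (2, 2, 3, 3))]),
    Node("park", (20, 0, 15, 15), [Node("tree", (22, 2, 1, 1)), Node("tree", (25, 5, 1, 1))]),
]
for t in tokenize(scene):
    print(t.level, t.parent, t.category)
```

Putting all coarse tokens first lets a coarse-to-fine generator settle the global layout before placing the objects that depend on it, and the parent index gives fine tokens an explicit link to attend to. Whether AeroScene orders or links its tokens this way is not stated in the summary.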

Index terms

Deep Learning Methods; Aerial Systems: Applications; Semantic Scene Understanding
