ICRA 2026

DreamSea: Photorealistic 3D Underwater Terrain Generation by Latent Fractal Diffusion Models

Tianyi Zhang, Weiming Zhi, Joshua Mangelson, Matthew Johnson-Roberson

AI summary

DreamSea enables photorealistic, spatially consistent 3D underwater scene generation from unannotated 2D robot imagery by conditioning a diffusion model on fractal-distributed latent embeddings.

underwater robotics, 3D scene generation, diffusion models, 3D Gaussian Splatting, latent fractal conditioning, synthetic data

Problem

Off-the-shelf generative models produce low-quality underwater scenes because underwater training data is scarce. Underwater robots, meanwhile, collect massive amounts of 2D imagery, but challenging visibility and sensor limitations leave it without reliable 3D depth or camera pose information.

Approach

The authors train a diffusion model on unannotated underwater RGB images, using foundation models to extract depth and semantic features. They control terrain diversity and spatial consistency by conditioning the model on latent embeddings generated via a fractal Diamond-Square process, then fuse the outputs into a 3D Gaussian Splatting map refined with 2D diffusion priors.
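
The summary names the Diamond-Square process but gives no details, so the following is only a minimal sketch of the classic algorithm on a scalar grid, not the authors' implementation. The function name `diamond_square` and the parameters `n`, `roughness`, and `seed` are assumptions made for illustration; how DreamSea maps such fractal fields to latent embeddings is not specified here.

```python
import random


def diamond_square(n, roughness=0.5, seed=0):
    """Return a (2**n + 1) x (2**n + 1) grid of fractal values (list of lists).

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    size = 2 ** n + 1
    grid = [[0.0] * size for _ in range(size)]

    # Seed the four corners with random values.
    for r in (0, size - 1):
        for c in (0, size - 1):
            grid[r][c] = rng.uniform(-1.0, 1.0)

    step = size - 1
    scale = 1.0
    while step > 1:
        half = step // 2

        # Diamond step: the centre of each square gets the mean of its
        # four corners plus random noise.
        for r in range(half, size, step):
            for c in range(half, size, step):
                avg = (grid[r - half][c - half] + grid[r - half][c + half]
                       + grid[r + half][c - half] + grid[r + half][c + half]) / 4.0
                grid[r][c] = avg + rng.uniform(-scale, scale)

        # Square step: each edge midpoint gets the mean of its in-bounds
        # axis-aligned neighbours plus random noise.
        for r in range(0, size, half):
            start = half if (r // half) % 2 == 0 else 0
            for c in range(start, size, step):
                total, count = 0.0, 0
                for dr, dc in ((-half, 0), (half, 0), (0, -half), (0, half)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < size and 0 <= cc < size:
                        total += grid[rr][cc]
                        count += 1
                grid[r][c] = total / count + rng.uniform(-scale, scale)

        # Halve the cell size and shrink the noise amplitude.
        step = half
        scale *= roughness

    return grid


if __name__ == "__main__":
    field = diamond_square(3, roughness=0.5, seed=1)
    print(len(field), len(field[0]))
```

Because each new value is an average of nearby values plus noise that shrinks at every level, neighbouring cells stay correlated while distant cells can differ, which is the property that makes a fractal field a plausible source of spatially consistent yet diverse conditioning. One way to obtain a multi-dimensional embedding per cell, again an assumption, would be to run the function once per channel with different seeds.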

Key results

Novel fractal-based latent embedding framework for controlling terrain appearance and spatial consistency
Integration of visual foundation models to extract 3D geometry and semantics from unannotated RGB data
Unified pipeline generating photorealistic RGBD maps and 3D Gaussian Splatting models supervised by 2D diffusion priors
Superior FID scores and qualitative photorealism compared to off-the-shelf text-to-image models on unseen coral datasets

Why it matters

Provides a scalable solution for generating high-fidelity underwater simulations, directly advancing autonomous underwater vehicle training and marine robotics research.

Abstract

This paper tackles the problem of generating representations of underwater 3D terrain. Off-the-shelf generative models, trained on Internet-scale data but not on specialized underwater images, exhibit degraded realism, as images of the seafloor are relatively uncommon. To this end, we introduce DreamSea, a generative model that produces hyper-realistic underwater scenes. DreamSea is trained on real-world image databases collected from underwater robot surveys. Images from these surveys contain massive numbers of real seafloor observations and cover large areas. We extract 3D geometry and latent embeddings from the data with visual foundation models, and train a diffusion model that generates realistic seafloor images in RGBD channels, conditioned on novel fractal-distribution-based latent embeddings. We then fuse the generated images into a 3D map, building a 3D Gaussian Splatting (3DGS) model supervised by 2D diffusion priors, which allows photorealistic novel view rendering. DreamSea is rigorously evaluated, demonstrating the ability to robustly generate large-scale underwater scenes that are consistent, diverse, and photorealistic. Our work drives impact in underwater robotics, and in particular, underwater robot simulation.

Index terms

Marine Robotics, Deep Learning for Visual Perception