DreamSea: Photorealistic 3D Underwater Terrain Generation by Latent Fractal Diffusion Models
Tianyi Zhang, Weiming Zhi, Joshua Mangelson, Matthew Johnson-Roberson
AI summary
Problem
Off-the-shelf generative models produce low-quality underwater scenes because underwater imagery is scarce in their training data, while underwater robots collect massive amounts of 2D imagery that lacks reliable 3D depth and camera pose information due to poor visibility and sensor limitations.
Approach
The authors train a diffusion model on unannotated underwater RGB images, using foundation models to extract depth and semantic features. They control terrain diversity and spatial consistency by conditioning the model on latent embeddings generated via a fractal Diamond-Square process, then fuse the outputs into a 3D Gaussian Splatting map refined with 2D diffusion priors.
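The summary names the Diamond-Square process as the source of the conditioning embeddings. The following is a minimal sketch of the classic Diamond-Square algorithm, assuming NumPy and a square grid of side 2^n + 1. `diamond_square` and the helper `fractal_latent` (which stacks independent, normalized fractal maps into a channels-first array) are hypothetical names for illustration; how DreamSea actually maps fractal values into its latent space is not specified here.

```python
from typing import Optional

import numpy as np


def diamond_square(n: int, roughness: float = 0.5, seed: Optional[int] = None) -> np.ndarray:
    """Classic Diamond-Square: returns a (2**n + 1) x (2**n + 1) fractal heightmap."""
    size = 2 ** n + 1
    rng = np.random.default_rng(seed)
    grid = np.zeros((size, size), dtype=np.float64)
    # Seed the four corners with random heights.
    grid[0, 0], grid[0, -1], grid[-1, 0], grid[-1, -1] = rng.uniform(-1.0, 1.0, 4)
    step, scale = size - 1, 1.0
    while step > 1:
        half = step // 2
        # Diamond step: each square's centre = mean of its 4 corners + noise.
        for y in range(half, size, step):
            for x in range(half, size, step):
                avg = (grid[y - half, x - half] + grid[y - half, x + half]
                       + grid[y + half, x - half] + grid[y + half, x + half]) / 4.0
                grid[y, x] = avg + rng.uniform(-scale, scale)
        # Square step: each edge midpoint = mean of its in-bounds diamond neighbours + noise.
        for y in range(0, size, half):
            for x in range((y + half) % step, size, step):
                total, count = 0.0, 0
                for dy, dx in ((-half, 0), (half, 0), (0, -half), (0, half)):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < size and 0 <= xx < size:
                        total += grid[yy, xx]
                        count += 1
                grid[y, x] = total / count + rng.uniform(-scale, scale)
        # Halve the cell size and damp the noise for the next, finer level.
        step, scale = half, scale * roughness
    return grid


def fractal_latent(n: int, channels: int = 4, roughness: float = 0.5,
                   seed: Optional[int] = None) -> np.ndarray:
    """Hypothetical helper: stack per-channel fractal maps, each normalized to zero mean, unit std."""
    rng = np.random.default_rng(seed)
    maps = []
    for _ in range(channels):
        h = diamond_square(n, roughness, seed=int(rng.integers(2**31)))
        maps.append((h - h.mean()) / (h.std() + 1e-8))
    return np.stack(maps)  # shape: (channels, 2**n + 1, 2**n + 1)
```

The `roughness` factor shrinks the noise amplitude at every level, so smaller values give smoother terrain. Because coarse-level values constrain all finer levels, crops taken from one large field share structure, which is one plausible way a fractal field can support spatial consistency across neighbouring tiles.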
Key results
- Novel fractal-based latent embedding framework for controlling terrain appearance and spatial consistency
- Integration of visual foundation models to extract 3D geometry and semantics from unannotated RGB data
- Unified pipeline generating photorealistic RGBD maps and 3D Gaussian Splatting models supervised by 2D diffusion priors
- Better (lower) FID scores and qualitative photorealism than off-the-shelf text-to-image models on unseen coral datasets
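FID (Fréchet Inception Distance) fits a Gaussian to image features from the real and the generated sets and measures the Fréchet distance between them, ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)); lower is better. The sketch below, assuming NumPy only, computes the distance from precomputed feature arrays. Feature extraction (conventionally Inception-v3 pool features) is omitted, and `fid_from_features` and `frechet_distance` are hypothetical names, not the evaluation code used in the paper.

```python
import numpy as np


def _sqrtm_psd(m: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    m = (m + m.T) / 2.0  # symmetrize to remove round-off asymmetry
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)  # clip tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))."""
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    # S1^(1/2) S2 S1^(1/2) has the same eigenvalues as S1 S2 but is symmetric PSD,
    # so the trace of its square root equals Tr((S1 S2)^(1/2)).
    covmean_trace = np.trace(_sqrtm_psd(s1_half @ sigma2 @ s1_half))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * covmean_trace)


def fid_from_features(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """feats_*: (num_images, feature_dim) arrays of per-image features."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    return frechet_distance(mu1, s1, mu2, s2)
```

For example, two zero-mean Gaussians in 2D with covariances I and 4I give Tr(I + 4I - 2 * 2I) = 2, so the covariance term alone can separate sets whose means match.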
Why it matters
Provides a scalable solution for generating high-fidelity underwater simulations, directly advancing autonomous underwater vehicle training and marine robotics research.
Abstract
This paper tackles the problem of generating representations of underwater 3D terrain. Off-the-shelf generative models, trained on Internet-scale data but not on specialized underwater images, exhibit degraded realism, as images of the seafloor are relatively uncommon. To this end, we introduce DreamSea, a generative model that produces hyper-realistic underwater scenes. DreamSea is trained on real-world image databases collected from underwater robot surveys. Images from these surveys contain massive numbers of real seafloor observations covering large areas. We extract 3D geometry and latent embeddings from the data with visual foundation models, and train a diffusion model that generates realistic seafloor images in RGBD channels, conditioned on novel fractal-distribution-based latent embeddings. We then fuse the generated images into a 3D map, building a 3D Gaussian Splatting (3DGS) model supervised by 2D diffusion priors, which allows photorealistic novel-view rendering. DreamSea is rigorously evaluated, demonstrating the ability to robustly generate large-scale underwater scenes that are consistent, diverse, and photorealistic. Our work has impact on underwater robotics, and in particular on underwater robot simulation.
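Fusing generated RGBD images into a 3D map requires lifting each pixel into 3D. A common first step is back-projection through a pinhole camera model, sketched below under stated assumptions: NumPy, metric depth, known intrinsics `fx, fy, cx, cy`, and an optional 4x4 camera-to-world pose. The abstract does not specify DreamSea's camera model or fusion procedure, and `backproject_rgbd` is a hypothetical name. Coloured point clouds like this are a typical way to initialize Gaussian centres in 3DGS pipelines, though how DreamSea initializes its model is not stated here.

```python
from typing import Optional, Tuple

import numpy as np


def backproject_rgbd(rgb: np.ndarray, depth: np.ndarray, fx: float, fy: float,
                     cx: float, cy: float,
                     cam_to_world: Optional[np.ndarray] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Lift an (H, W) depth map and (H, W, C) colours to an (N, 3) point cloud (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index, both (H, W)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    cols = rgb.reshape(-1, rgb.shape[-1])
    valid = pts[:, 2] > 0  # drop pixels without a positive depth
    pts, cols = pts[valid], cols[valid]
    if cam_to_world is not None:
        # Rigidly transform camera-frame points into a shared world frame.
        pts = pts @ cam_to_world[:3, :3].T + cam_to_world[:3, 3]
    return pts, cols
```

Applying this to every generated view with its pose places all views in one frame, after which they can be merged into a single map.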