OmniMap: A General Mapping Framework Integrating Optics, Geometry, and Semantics
Yinan Deng, Yufeng Yue, Jianyu Dou, Jingyu Zhao, Jiahui Wang, Yujie Tang, Yi Yang, Mengyin Fu
AI summary
Problem
Existing robotic mapping methods typically capture only some scene attributes and suffer from optical blurring, geometric irregularities, or semantic ambiguities, or else lack real-time performance.
Approach
OmniMap employs a tightly coupled hybrid representation of 3D Gaussian Splatting (3DGS) and voxels. It integrates a differentiable camera model for motion blur and exposure compensation, normal-constrained geometry updates, and probabilistic fusion for robust open-vocabulary instance understanding.
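The summary names probabilistic fusion for instance understanding but does not say how it works. As an illustration only, the Python sketch below shows one generic way to fuse noisy per-frame open-vocabulary labels for a single instance: confidence-weighted counts under a symmetric Dirichlet prior. The class `InstanceLabelFusion`, the prior strength `alpha`, and the example vocabulary are hypothetical and are not taken from OmniMap.

```python
from collections import defaultdict


class InstanceLabelFusion:
    """Hypothetical sketch (not OmniMap's code): fuse per-frame labels per instance.

    Each observation adds its detector confidence to the observed label's count.
    With a symmetric Dirichlet prior of strength alpha over K labels, the
    posterior mean is (count + alpha) / (total + alpha * K).
    """

    def __init__(self, vocabulary, alpha=1.0):
        self.vocabulary = list(vocabulary)
        self.alpha = alpha
        # instance id -> {label: accumulated confidence}
        self.counts = defaultdict(lambda: defaultdict(float))

    def observe(self, instance_id, label, confidence):
        if label not in self.vocabulary:
            raise ValueError(f"unknown label: {label}")
        self.counts[instance_id][label] += confidence

    def posterior(self, instance_id):
        # .get() avoids creating entries for instances that were never observed.
        counts = self.counts.get(instance_id, {})
        total = sum(counts.values())
        denom = total + self.alpha * len(self.vocabulary)
        return {lbl: (counts.get(lbl, 0.0) + self.alpha) / denom
                for lbl in self.vocabulary}

    def best_label(self, instance_id):
        post = self.posterior(instance_id)
        label = max(post, key=post.get)
        return label, post[label]


# Usage: three noisy observations of the same (hypothetical) instance 7.
fusion = InstanceLabelFusion(["chair", "table", "sofa"])
fusion.observe(7, "chair", 0.9)
fusion.observe(7, "sofa", 0.4)
fusion.observe(7, "chair", 0.8)
label, prob = fusion.best_label(7)
print(label, round(prob, 3))  # chair 0.529
```

Because evidence accumulates across frames, a single low-confidence misdetection (here `sofa`) shifts the posterior only slightly instead of flipping the instance's label.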
Key results
- State-of-the-art rendering fidelity, mesh quality, and zero-shot semantic segmentation
- Real-time online mapping at 5.55 fps with a compact model size
- Support for versatile downstream tasks including scene Q&A, interactive editing, and map-assisted navigation
- Novel hybrid 3DGS-Voxel representation ensuring structural stability and fine-grained detail
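To make the idea of a voxel-coupled Gaussian map more concrete, here is a minimal Python sketch under assumed rules: a sparse voxel hash caps how many Gaussians may be spawned per voxel, so repeated observations of the same surface do not grow the map without bound. `VoxelGatedGaussianMap`, `voxel_size`, and `max_per_voxel` are hypothetical names. The summary does not describe OmniMap's actual coupling between Gaussians and voxels.

```python
import math


class VoxelGatedGaussianMap:
    """Hypothetical sketch (not OmniMap's code): voxels gate Gaussian spawning.

    Each observed 3D point may seed a new Gaussian centre only if its voxel
    currently holds fewer than max_per_voxel Gaussians.
    """

    def __init__(self, voxel_size=0.05, max_per_voxel=1):
        self.voxel_size = voxel_size
        self.max_per_voxel = max_per_voxel
        self.voxels = {}   # (i, j, k) voxel index -> list of Gaussian ids
        self.means = []    # Gaussian centres; list position is the Gaussian id

    def voxel_key(self, point):
        # Integer voxel coordinates; floor keeps negative coordinates correct.
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def integrate(self, points):
        """Spawn Gaussians for points in under-populated voxels; return how many."""
        added = 0
        for p in points:
            members = self.voxels.setdefault(self.voxel_key(p), [])
            if len(members) < self.max_per_voxel:
                members.append(len(self.means))
                self.means.append(tuple(p))
                added += 1
        return added


# Usage: the second point shares voxel (0, 0, 0) with the first and is skipped.
gmap = VoxelGatedGaussianMap(voxel_size=0.1)
first = gmap.integrate([(0.01, 0.02, 0.03), (0.05, 0.05, 0.05), (0.25, 0.0, 0.0)])
second = gmap.integrate([(0.02, 0.02, 0.02), (-0.01, 0.0, 0.0)])
print(first, second, len(gmap.means))  # 2 1 3
```

Capping Gaussians per voxel is one simple way a voxel grid can bound map growth. A real 3DGS system would also optimise each Gaussian's shape, opacity, and colour, which this sketch omits.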
Why it matters
Provides robotic systems and embodied AI agents with a unified, real-time 3D environmental representation essential for complex perception, manipulation, and navigation tasks.
Abstract
Robotic systems demand accurate and comprehensive 3D environment perception, requiring simultaneous capture of photo-realistic appearance (optical), precise layout shape (geometric), and open-vocabulary scene understanding (semantic). Existing methods typically fulfill only some of these requirements while exhibiting optical blurring, geometric irregularities, and semantic ambiguities. To address these challenges, we propose OmniMap, the first online mapping framework that simultaneously captures optical, geometric, and semantic scene attributes while maintaining real-time performance and model compactness. At the architectural level, OmniMap employs a tightly coupled 3DGS–Voxel hybrid representation that combines fine-grained modeling with structural stability.
At the implementation level, OmniMap identifies key challenges across different modalities and introduces several innovations: adaptive camera modeling for motion blur and exposure compensation, hybrid incremental representation with normal constraints, and probabilistic fusion for robust instance-level understanding. Extensive experiments show OmniMap's superior performance in rendering fidelity, geometric accuracy, and zero-shot semantic segmentation compared to state-of-the-art methods across diverse scenes. The framework's versatility is further evidenced through a variety of downstream applications, including multi-domain scene Q&A, interactive editing, perception-guided manipulation, and map-assisted navigation.
Manuscript received 15 May 2025; accepted 24 August 2025. This article was recommended for publication by Editor Javier Civera upon evaluation of the reviewers' comments. This work is supported by the National Natural Science Foundation of China under Grants 92370203, 62473050, and 62233002, and by the Beijing Natural Science Foundation Undergraduate Research Program QY24180. (Corresponding Author: Yufeng Yue)
Yinan Deng, Yufeng Yue, Jianyu Dou, Jingyu Zhao, Jiahui Wang, Yujie Tang, and Yi Yang are with the School of Automation, Beijing Institute of Technology, Beijing 100081, China (e-mail: dengyinan@bit.edu.cn; yueyufeng@bit.edu.cn; BruceDou030806@163.com; unique zhao0210@163.com; wjh@bit.edu.cn; 3120235697@bit.edu.cn; yang yi@bit.edu.cn). Mengyin Fu is with the School of Automation, Beijing Institute of Technology, Beijing 100081, China, and the School of Automation, Nanjing University of Science and Technology, Nanjing 210018, China (e-mail: fumy@bit.edu.cn).
The project page of OmniMap is available at https://omni-map.github.io/.