STONE Dataset: A Scalable Multi-Modal Surround-View 3D Traversability Dataset for Off-Road Robot Navigation
Konyul Park, Daehun Kim, Jiyong Oh, Seunghoon Yu, Junseo Park, Jaehyun Park, Hongjae Shin, Hyungchan Cho, Jungho Kim, Jun Won Choi
AI summary
Problem
Existing off-road datasets lack scalability and multi-modality, relying on costly manual 2D annotations or limited front-view sensors that fail to capture the complex, unstructured geometry required for reliable 3D traversability estimation.
Approach
The authors developed a fully automated, trajectory-guided labeling pipeline. It reconstructs dense terrain surfaces from LiDAR scans, extracts geometric mobility cues (slope, elevation, roughness), and propagates traversability labels beyond the robot's path using a Mahalanobis-distance criterion. The labels are paired with data from a synchronized 360° surround-view sensor suite.
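The label-propagation idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the three-feature layout (slope, elevation, roughness), the ridge term, and the threshold of 3.0 are all assumptions made for illustration. The sketch fits a mean and covariance to features of cells the robot actually drove over, then marks other cells as traversable when their Mahalanobis distance to that distribution is small.

```python
import numpy as np

def fit_traversed_stats(feats):
    """Mean and inverse covariance of features from cells on the robot's path.

    feats: (N, D) array, one row per traversed cell (hypothetical layout:
    slope, elevation, roughness).
    """
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False)
    # Small ridge term keeps the covariance invertible (assumption of this sketch).
    cov = cov + 1e-6 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(feats, mu, cov_inv):
    """Mahalanobis distance of each row of feats to the traversed distribution."""
    diff = feats - mu
    return np.sqrt(np.einsum("ni,ij,nj->n", diff, cov_inv, diff))

def propagate_labels(feats, mu, cov_inv, threshold=3.0):
    """True where a cell looks like terrain the robot has already driven over.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return mahalanobis(feats, mu, cov_inv) <= threshold

# Synthetic example: gentle terrain along the path, two off-path candidates.
rng = np.random.default_rng(0)
traversed = rng.normal([5.0, 0.0, 0.02], [2.0, 0.1, 0.01], size=(500, 3))
mu, cov_inv = fit_traversed_stats(traversed)
candidates = np.array([
    [5.0, 0.0, 0.02],   # similar to traversed terrain
    [40.0, 2.0, 0.50],  # steep, elevated, rough
])
labels = propagate_labels(candidates, mu, cov_inv)
```

Using the full covariance rather than per-feature thresholds lets correlated cues (for example, slope and roughness rising together) be judged jointly.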
Key results
- First large-scale off-road dataset integrating synchronized surround-view sensing: 128-channel LiDAR, six RGB cameras, and three 4D imaging radars
- Fully automated, annotation-free pipeline for generating scalable 3D traversability ground-truth maps
- Comprehensive benchmark for voxel-level 3D traversability prediction across single- and multi-modal settings
- Diverse dataset spanning farmland, mountainous terrain, lakes, and construction sites under day and night conditions
Why it matters
Provides researchers and developers with scalable, geometry-aware ground truth and robust 360° perception data, accelerating the development of reliable autonomous systems for agriculture, logistics, and search-and-rescue.
Abstract
Reliable off-road navigation requires accurate estimation of traversable regions and robust perception under diverse terrain and sensing conditions. However, existing datasets lack both scalability and multi-modality, which limits progress in 3D traversability prediction. In this work, we introduce STONE, a large-scale multi-modal dataset for off-road navigation. STONE provides (1) trajectory-guided 3D traversability maps generated by a fully automated, annotation-free pipeline, and (2) comprehensive surround-view sensing with synchronized 128-channel LiDAR, six RGB cameras, and three 4D imaging radars. The dataset covers a wide range of environments and conditions, including day and night, grasslands, farmlands, construction sites, and lakes. Our auto-labeling pipeline reconstructs dense terrain surfaces from LiDAR scans, extracts geometric attributes such as slope, elevation, and roughness, and assigns traversability labels beyond the robot’s trajectory using a Mahalanobis-distance-based criterion. This design enables scalable, geometry-aware ground-truth construction without manual annotation. Finally, we establish a benchmark for voxel-level 3D traversability prediction and provide strong baselines under both single-modal and multi-modal settings.
∗ These authors contributed equally to this work.
† Corresponding author.
1 Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, 08826, Korea.
2 Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, Korea.
{kypark, dhkim, jyoh, shyu, jspark, jhpark, hjshin, hccho, jhkim}@adr.snu.ac.kr, junwchoi@snu.ac.kr
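The abstract names slope, elevation, and roughness as the geometric attributes extracted from the reconstructed terrain, but does not define them here. One common way to compute such attributes for a local patch of terrain points is sketched below, under stated assumptions: slope is the tilt of a least-squares plane, elevation is the mean height, and roughness is the standard deviation of the plane-fit residuals. The function name `patch_attributes` and these definitions are illustrative choices, not the paper's.

```python
import numpy as np

def patch_attributes(points):
    """Slope (degrees), elevation, and roughness for one local terrain patch.

    points: (N, 3) array of x, y, z samples. The definitions used here are
    assumptions of this sketch, not taken from the STONE paper.
    """
    # Design matrix for the plane z = a*x + b*y + c.
    xy1 = np.c_[points[:, :2], np.ones(len(points))]
    coeffs, *_ = np.linalg.lstsq(xy1, points[:, 2], rcond=None)
    a, b, _ = coeffs
    # Angle between the fitted plane and the horizontal.
    slope_deg = np.degrees(np.arctan(np.hypot(a, b)))
    # Mean height of the patch.
    elevation = points[:, 2].mean()
    # Spread of heights around the fitted plane.
    roughness = (points[:, 2] - xy1 @ coeffs).std()
    return slope_deg, elevation, roughness

# A perfectly flat ramp rising one unit of height per unit of x.
xs, ys = np.meshgrid(np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 5))
ramp = np.c_[xs.ravel(), ys.ravel(), xs.ravel()]
slope_deg, elevation, roughness = patch_attributes(ramp)
# slope_deg is close to 45, elevation close to 0.5, roughness close to 0.
```

Per-patch attribute vectors of this kind are the sort of input a Mahalanobis-distance criterion would compare against statistics gathered along the robot's trajectory.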