Hierarchical Grid-Based Sensor Pose Extraction for Demonstration Dataset Generation
Doyu Lim, Chaewon Park, Soohee Han
AI summary
Problem
Manual expert scanning of complex cultural heritage objects is costly and does not scale, while existing scan datasets lack the per-frame sensor poses needed to train automated view planning systems.
Approach
The method uses a hierarchical grid search that progressively narrows candidate sensor positions and orientations by evaluating visibility and surface similarity against each scan frame.
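The coarse-to-fine idea can be illustrated with a minimal sketch. It searches sensor positions only (the method described here also refines orientations), and the function name `coarse_to_fine_search`, the parameters `levels` and `steps`, and the toy score are assumptions made for illustration, not the authors' implementation. In practice the score would combine visibility and surface similarity against a scan frame.

```python
import itertools
import math


def coarse_to_fine_search(score, center, half_width, levels=8, steps=5):
    """Maximize score(p) over a 3D box by re-gridding around the best candidate.

    Hypothetical helper: `score` maps an (x, y, z) tuple to a float.
    """
    best = tuple(center)
    best_score = score(best)
    for _ in range(levels):
        # Evenly spaced offsets in [-half_width, +half_width] along each axis.
        spacing = 2.0 * half_width / (steps - 1)
        offsets = [-half_width + spacing * i for i in range(steps)]
        grid_center = best
        for dx, dy, dz in itertools.product(offsets, repeat=3):
            cand = (grid_center[0] + dx, grid_center[1] + dy, grid_center[2] + dz)
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
        # Next level: a finer grid spanning one coarse cell on each side of the best.
        half_width = spacing
    return best, best_score


# Toy stand-in for a visibility/similarity score: closeness to a hidden position.
true_pos = (0.12, -0.30, 0.45)
estimate, value = coarse_to_fine_search(
    lambda p: -math.dist(p, true_pos), center=(0.0, 0.0, 0.0), half_width=1.0
)
```

Each level evaluates only `steps**3` candidates, so eight levels cost 1,000 score evaluations here, whereas a single uniform grid at the final resolution would need far more.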
Key results
- Average Chamfer distance of 8.1 mm between extracted and ground-truth poses
- 0.85 average surface coverage across scanned objects
- 213 curated expert scanning demonstrations for mounted dishes
- Frame-only pose extraction without explicit tracking hardware
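For readers unfamiliar with the two metrics quoted above, the sketch below shows one common way to compute a symmetric Chamfer distance and a coverage ratio on point sets. The exact definitions used in the paper are not given in this summary, so the averaging convention, the tolerance `tol`, and the function names are assumptions.

```python
import math


def nearest_distance(p, points):
    """Distance from point p to its nearest neighbour in `points` (brute force)."""
    return min(math.dist(p, q) for q in points)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean of the two directional nearest-neighbour averages."""
    a_to_b = sum(nearest_distance(p, b) for p in a) / len(a)
    b_to_a = sum(nearest_distance(q, a) for q in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)


def coverage(reference, observed, tol):
    """Fraction of reference points with an observed point within `tol`."""
    hits = sum(1 for p in reference if nearest_distance(p, observed) <= tol)
    return hits / len(reference)


# Toy data: the observed set misses the last reference point.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
observed = [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (2.0, 0.1, 0.0)]

cd = chamfer_distance(reference, observed)
cov = coverage(reference, observed, tol=0.2)  # 3 of 4 reference points are covered
```

The brute-force nearest-neighbour search is quadratic in the number of points; a spatial index such as a k-d tree would be the usual choice for real scans.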
Why it matters
Enables scalable, automated conversion of raw scan data into structured expert demonstrations, accelerating cultural heritage archiving and future learning-based robotic scanning.
Abstract
High-quality 3D reconstruction of unknown small objects with complex surface details is important in applications such as digital preservation and cultural heritage archiving. In practice, such scanning procedures rely heavily on skilled human experts, but the high cost of expert training and the large number of objects requiring digitization make this process difficult to scale. This motivates the construction of expert demonstration datasets as a foundation for future automated view planning. However, available scan data often contain only frame-level geometry without per-frame sensor poses. To address this issue, we propose a hierarchical grid-based method for extracting sensor poses from frame-based scan data. The proposed method progressively refines candidate poses through coarse-to-fine grid search and selects poses that effectively observe the target surface. Experimental results show an average surface coverage of 0.85 across the scanned objects, demonstrating the practicality of the proposed approach for expert demonstration dataset construction.
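One simple way to quantify whether a candidate pose "effectively observes" a surface is the fraction of surface points that lie inside the sensor's field of view and face the sensor. The sketch below is an illustrative assumption, not the paper's scoring function: it ignores occlusion and range limits, and the name `visible_fraction` and the 60 degree default field of view are hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def visible_fraction(sensor_pos, view_dir, points, normals, fov_deg=60.0):
    """Fraction of points inside a viewing cone whose normals face the sensor."""
    view_dir = normalize(view_dir)
    cos_half_fov = math.cos(math.radians(fov_deg / 2.0))
    count = 0
    for p, n in zip(points, normals):
        # Unit ray from the sensor to the surface point.
        ray = normalize(tuple(pc - sc for pc, sc in zip(p, sensor_pos)))
        in_fov = dot(ray, view_dir) >= cos_half_fov
        facing = dot(n, ray) < 0.0  # normal points back toward the sensor
        if in_fov and facing:
            count += 1
    return count / len(points)


# Three points on the top of a plate (normals +z) and one on its underside (normal -z).
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (-0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]

from_above = visible_fraction((0.0, 0.0, 1.0), (0.0, 0.0, -1.0), points, normals)
from_below = visible_fraction((0.0, 0.0, -1.0), (0.0, 0.0, 1.0), points, normals)
```

A score of this kind could serve as the objective that a grid search maximizes over candidate positions and orientations, with a surface-similarity term added to match each candidate against the geometry of the corresponding scan frame.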