ICRA 2026

Hierarchical Grid-Based Sensor Pose Extraction for Demonstration Dataset Generation

Doyu Lim, Chaewon Park, Soohee Han

AI summary

A coarse-to-fine grid search accurately extracts per-frame sensor poses from unstructured scan data, enabling scalable construction of expert demonstration datasets for cultural heritage digitization.

Keywords: sensor pose extraction, hierarchical grid search, 3D scanning, expert demonstration dataset, cultural heritage digitization, automated view planning

Problem

Manual expert scanning for complex cultural heritage objects is costly and unscalable, while existing scan datasets lack the per-frame sensor poses needed to train automated view planning systems.

Approach

The method uses a hierarchical grid search that progressively narrows candidate sensor positions and orientations by evaluating visibility and surface similarity against each scan frame.
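
The coarse-to-fine idea can be illustrated with a minimal sketch. It is not the authors' implementation: it searches over sensor position only, and the names `coarse_to_fine_search`, `toy_score`, and `TRUE_POSITION` are hypothetical. In the paper's setting the score would presumably combine visibility and surface similarity against a scan frame; here a toy score (negative distance to a known position) stands in for it so the example stays self-contained.

```python
import itertools
import math


def coarse_to_fine_search(score, center, half_width, steps=5, levels=4, shrink=0.5):
    """Maximize score(position) by gridding a cube that shrinks at every level.

    score      -- hypothetical callable mapping (x, y, z) to a float, higher is better
    center     -- initial cube center
    half_width -- initial half side length of the cube
    steps      -- grid points per axis at every level
    levels     -- number of refinement levels
    shrink     -- factor applied to half_width after each level
    """
    best = tuple(center)
    best_score = score(best)
    for _ in range(levels):
        level_center = best  # keep the grid fixed while this level is evaluated
        # Evenly spaced offsets in [-half_width, +half_width].
        offsets = [-half_width + 2 * half_width * i / (steps - 1) for i in range(steps)]
        for dx, dy, dz in itertools.product(offsets, repeat=3):
            cand = (level_center[0] + dx, level_center[1] + dy, level_center[2] + dz)
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
        half_width *= shrink  # narrow the search region around the current best
    return best, best_score


# Toy stand-in for a visibility/similarity score (assumption, not from the paper).
TRUE_POSITION = (0.3, -0.2, 0.5)


def toy_score(position):
    return -math.dist(position, TRUE_POSITION)


if __name__ == "__main__":
    position, value = coarse_to_fine_search(toy_score, (0.0, 0.0, 0.0), 1.0, levels=6)
    print(position, value)
```

Each level evaluates `steps ** 3` candidates, so the total cost grows linearly with the number of levels, whereas a single dense grid of the same final resolution would grow exponentially. A full pose search would add orientation axes to the grid in the same way.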

Key results

Average Chamfer distance of 8.1 mm between extracted and ground-truth poses
0.85 average surface coverage across scanned objects
213 curated expert scanning demonstrations for mounted dishes
Frame-only pose extraction without explicit tracking hardware
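
For readers unfamiliar with the two metrics above, the following sketch shows one common way to compute them on small point sets. The exact definitions used in the paper are not given here, so both functions are assumptions: Chamfer distance is taken as the sum of the two mean nearest-neighbour distances (other variants average the two directions or use squared distances), and coverage is taken as the fraction of reference points that have an observed point within a tolerance `tol`. All function names are hypothetical.

```python
import math


def nearest_distance(point, points):
    """Euclidean distance from point to its nearest neighbour in points."""
    return min(math.dist(point, q) for q in points)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest distance a->b plus mean b->a."""
    a_to_b = sum(nearest_distance(p, b) for p in a) / len(a)
    b_to_a = sum(nearest_distance(q, a) for q in b) / len(b)
    return a_to_b + b_to_a


def coverage(reference, observed, tol):
    """Fraction of reference points with an observed point within tol."""
    hits = sum(1 for p in reference if nearest_distance(p, observed) <= tol)
    return hits / len(reference)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    observed = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.01), (2.0, 0.0, 0.01)]
    print(chamfer_distance(reference, observed))
    print(coverage(reference, observed, tol=0.05))
```

The brute-force nearest-neighbour lookup is quadratic in the number of points; for real scan data a spatial index such as a k-d tree would normally replace it.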

Why it matters

The method enables scalable, automated conversion of raw scan data into structured expert demonstrations, which supports cultural heritage archiving and future learning-based robotic scanning.

Abstract

High-quality 3D reconstruction of unknown small objects with complex surface details is important in applications such as digital preservation and cultural heritage archiving. In practice, such scanning procedures rely heavily on skilled human experts, but the high cost of expert training and the large number of objects requiring digitization make this process difficult to scale. This motivates the need to construct expert demonstration datasets as a foundation for future automated view planning. However, available scan data often contain only frame-level geometry without per-frame sensor poses. To address this issue, we propose a hierarchical grid-based method for extracting sensor poses from frame-based scan data. The proposed method progressively refines candidate poses through coarse-to-fine grid search and selects poses that effectively observe the target surface. Experimental results show an average coverage of 0.85, demonstrating the practicality of the proposed approach for expert demonstration dataset construction.

Index terms

Data Sets for Robot Learning; Reactive and Sensor-Based Planning; Learning from Demonstration