From One Image to Precision Pose: Seed-Diverse Diffusion Models and Model-Selection-Driven Hybrid Servoing in Limited Viewpoints
Daigo Terazono, Takashi Nammoto, Ryota Kato, Naoya Chiba, Shingo Kagami, Koichi Hashimoto
Abstract
In robotic visual servoing, when prior measurement or imaging of the target is impractical, control must rely on a single image taken from the initial viewpoint. Diffusion models can generate 3D shapes from a single image; however, the regions hidden from that viewpoint remain uncertain, and using a generated shape directly in model-dependent control can degrade alignment accuracy. This study proposes a hybrid visual servoing method that explicitly accounts for this uncertainty. From a single image, multiple 3D shape candidates are sampled using a pre-trained generative model, and control proceeds in stages while these candidates are retained. First, coarse positioning is performed using PBVS (Position-Based Visual Servoing) with the multiple shape candidates. Next, the system compares images rendered from each candidate with the observed image and selects the best model based on geometric error and visual similarity. Finally, IBVS (Image-Based Visual Servoing) uses the selected model to correct the small remaining alignment errors with high precision. The proposed method achieves high-precision approach and alignment from the minimal input of a single image, providing a framework that addresses the shape uncertainty and control error introduced by 3D generation. Experiments show that the convergence success rate improved as the number of shape candidates increased and that high-precision alignment was achieved through the staged integration of PBVS and IBVS.
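The model-selection step can be illustrated with a minimal sketch. It assumes that each candidate has already been rendered and compared with the observed image, yielding one geometric error (lower is better) and one visual similarity in [0, 1] (higher is better) per candidate. The function name `select_best_candidate`, the weighted-sum cost, and the `weight` parameter are hypothetical choices made for illustration; the abstract does not specify how the two criteria are combined.

```python
def select_best_candidate(geometric_errors, visual_similarities, weight=0.5):
    """Return the index of the candidate with the lowest combined cost.

    Assumption: the weighted sum below is an illustrative rule only,
    not the scoring used by the authors.
    """
    if len(geometric_errors) != len(visual_similarities) or not geometric_errors:
        raise ValueError("expected two non-empty sequences of equal length")

    # Min-max scale the geometric errors to [0, 1] so that they are
    # comparable with the similarity scores.
    lo, hi = min(geometric_errors), max(geometric_errors)
    span = hi - lo

    costs = []
    for err, sim in zip(geometric_errors, visual_similarities):
        err_norm = (err - lo) / span if span > 0 else 0.0
        # Similarity is assumed to lie in [0, 1]; 1 - sim is a dissimilarity.
        costs.append(weight * err_norm + (1.0 - weight) * (1.0 - sim))

    # Index of the smallest cost; ties resolve to the first candidate.
    return min(range(len(costs)), key=costs.__getitem__)


# Hypothetical values for three shape candidates.
errors = [4.0, 1.5, 2.0]            # e.g. reprojection error in pixels (assumed)
similarities = [0.60, 0.90, 0.85]   # e.g. image similarity in [0, 1] (assumed)
best = select_best_candidate(errors, similarities)
print(best)  # 1
```

In this sketch the second candidate is chosen because it has both the lowest geometric error and the highest similarity; with conflicting criteria, the outcome would depend on the assumed `weight`.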