ICRA 2026

A Multi-Level Similarity Approach for Single-View Object Grasping: Matching, Planning, and Fine-Tuning

Hao Chen, Takuya Kiyokawa, Zhengtao Hu, Weiwei Wan, Kensuke Harada


AI summary


A training-free similarity matching framework transfers grasping knowledge from a small database to unknown objects, significantly outperforming learning-based benchmarks in robustness and accuracy.

single-view grasping, similarity matching, robotic manipulation, point cloud registration, grasp planning, C-FPFH descriptor

Problem

Single-view grasping of unknown objects suffers from partial observations and sensing noise, while existing learning-based methods require high training costs and struggle with environmental variability. Prior similarity-based approaches also rely on multi-view data or unstable composite scoring.

Approach

The method matches single-view visual features against a known object database across semantic, geometric, and dimensional levels, then plans imitative grasps and applies a stability-aware fine-tuning process.
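The sketch below shows one way to combine matching at the semantic, geometric, and dimensional levels without collapsing them into a single weighted score. It applies the three levels as successive filters and then ranks the models that pass all three. Every detail is an assumption made for illustration:

- the `ModelEntry` schema;
- a plain category-string comparison standing in for semantic matching;
- Euclidean distance on a generic descriptor vector standing in for C-FPFH, whose computation is not described here;
- the extent-ratio dimensional score;
- all thresholds.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelEntry:
    """Hypothetical record for one database model (or the observed object)."""
    name: str
    category: str                         # semantic label (assumed: a plain string)
    descriptor: list[float]               # generic descriptor, a stand-in for C-FPFH
    extents: tuple[float, float, float]   # bounding-box side lengths in metres


def descriptor_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two descriptor vectors (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dimension_similarity(e1, e2) -> float:
    """Mean min/max ratio of sorted box extents; 1.0 means identical sizes."""
    pairs = zip(sorted(e1), sorted(e2))
    return sum(min(a, b) / max(a, b) for a, b in pairs) / 3.0


def match_candidates(query, database, max_geo_dist=0.5, min_dim_sim=0.7, top_k=3):
    """Hypothetical cascade: semantic -> geometric -> dimensional filtering."""
    # Level 1, semantic: keep models sharing the query's category label.
    level1 = [m for m in database if m.category == query.category]
    # Level 2, geometric: keep models whose descriptor is close to the query's.
    level2 = [m for m in level1
              if descriptor_distance(m.descriptor, query.descriptor) <= max_geo_dist]
    # Level 3, dimensional: keep models with similar bounding-box extents.
    level3 = [m for m in level2
              if dimension_similarity(m.extents, query.extents) >= min_dim_sim]
    # Rank survivors by descriptor distance and return the best few names.
    level3.sort(key=lambda m: descriptor_distance(m.descriptor, query.descriptor))
    return [m.name for m in level3[:top_k]]


# Toy database with made-up values.
db = [
    ModelEntry("mug_a", "mug", [0.10, 0.80, 0.30], (0.08, 0.08, 0.10)),
    ModelEntry("mug_b", "mug", [0.90, 0.10, 0.70], (0.09, 0.09, 0.11)),
    ModelEntry("bowl_a", "bowl", [0.10, 0.80, 0.30], (0.15, 0.15, 0.06)),
    ModelEntry("mug_c", "mug", [0.15, 0.75, 0.35], (0.20, 0.20, 0.30)),
]
query = ModelEntry("observed", "mug", [0.12, 0.78, 0.32], (0.085, 0.082, 0.105))
print(match_candidates(query, db))  # prints ['mug_a']
```

Because each level is a filter, a model must pass every check to be proposed. With a single partial view, the observed extents can underestimate the true object size, so a practical dimensional check would need to tolerate that.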

Key results

Multi-level similarity matching framework integrating semantic, geometric, and dimensional features
Novel C-FPFH descriptor enabling accurate geometric matching between partial and complete point clouds
Two-stage stability-aware fine-tuning process for optimizing imitative grasp poses
Real-world experiments demonstrating superior accuracy and robustness over state-of-the-art benchmarks using fewer than 100 database models
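The summary does not specify the two fine-tuning stages or the stability metric. The sketch below is purely an assumed illustration. It treats fine-tuning as a coarse-then-fine greedy local search over a small pose parametrization, scored by a stability function the caller supplies. `local_refine`, `two_stage_finetune`, the step sizes, and `toy_score` are hypothetical names and values.

```python
import itertools


def local_refine(pose, score, step):
    """Greedy hill-climbing: try +/- `step` on each pose coordinate and keep
    any change that strictly raises `score` (a hypothetical stability metric,
    higher is better). Stops when no single-coordinate move helps."""
    best = tuple(pose)
    best_score = score(best)
    improved = True
    while improved:
        improved = False
        for i, delta in itertools.product(range(len(best)), (-step, step)):
            cand = best[:i] + (best[i] + delta,) + best[i + 1:]
            cand_score = score(cand)
            if cand_score > best_score:
                best, best_score = cand, cand_score
                improved = True
    return best, best_score


def two_stage_finetune(pose, score, coarse_step=0.01, fine_step=0.002):
    """Assumed two-stage scheme: a coarse search, then a fine one around its result."""
    coarse_pose, _ = local_refine(pose, score, coarse_step)
    return local_refine(coarse_pose, score, fine_step)


# Toy stand-in for a stability score: peaks at a made-up "ideal" (x, y) offset.
IDEAL = (0.013, -0.004)


def toy_score(p):
    return -sum((a - b) ** 2 for a, b in zip(p, IDEAL))


refined, refined_score = two_stage_finetune((0.0, 0.0), toy_score)
print(refined, refined_score)
```

A greedy search like this only reaches a local optimum near its starting pose. That fits a local fine-tuning step seeded by an imitative grasp. The coarse stage covers a wider neighbourhood cheaply, and the fine stage then refines the result.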

Why it matters

Provides a robust, training-free alternative to deep learning for real-world robotic manipulation, reducing data dependency and improving generalization across varying environments.

Abstract

Grasping unknown objects from a single view has remained a challenging topic in robotics due to the uncertainty of partial observation. Recent advances in large-scale models have led to benchmark solutions such as GraspNet-1Billion. However, such learning-based approaches still face a critical limitation in performance robustness due to their sensitivity to sensing noise and environmental changes. To address this bottleneck in achieving highly generalized grasping, we abandon the traditional learning framework and introduce a new perspective: similarity matching, where similar known objects are used to guide the grasping of unknown target objects. We propose a method that robustly achieves unknown-object grasping from a single viewpoint through three key steps: 1) leverage the visual features of the observed object to perform similarity matching with an existing database containing various object models, identifying potential candidates with high similarity; 2) use the candidate models with pre-existing grasping knowledge to plan imitative grasps for the unknown target object; 3) optimize the grasp quality through a local fine-tuning process. To address the uncertainty caused by partial and noisy observation, we propose a multi-level similarity matching framework that integrates semantic, geometric, and dimensional features for comprehensive evaluation. In particular, we introduce a novel point cloud geometric descriptor, the C-FPFH descriptor, which facilitates accurate similarity assessment between partial point clouds of observed objects and complete point clouds of database models. In addition, we incorporate large language models, introduce the semi-oriented bounding box, and develop a novel point cloud registration approach based on plane detection to enhance matching accuracy under single-view conditions.
Real-world experiments demonstrate that our proposed method significantly outperforms existing benchmarks in grasping a wide variety of unknown objects in both isolated and cluttered scenarios, showcasing exceptional robustness across varying object types and operating environments.
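Step 2 of the abstract transfers pre-existing grasps from a matched model to the unknown object. The abstract does not give the paper's exact formulation, so a common approach is assumed here:

- store each grasp as a 4x4 homogeneous pose in the model's frame;
- left-multiply it by the rigid transform that aligns the model with the observed point cloud (for example, the output of a registration step).

`pose`, `transfer_grasps`, and all numbers below are hypothetical.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def pose(yaw: float, t: tuple[float, float, float]) -> Matrix:
    """Homogeneous transform: rotation about z by `yaw` radians plus translation t."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, t[0]],
            [s, c, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]


def transfer_grasps(T_obs_model: Matrix, grasps_in_model: list[Matrix]) -> list[Matrix]:
    """Map grasp poses stored in the model frame into the observed-object frame
    (assumed convention: T_obs_model maps model coordinates to observed coordinates)."""
    return [matmul(T_obs_model, g) for g in grasps_in_model]


# Hypothetical stored grasp: 2 cm along x and 5 cm above the model origin.
g_model = pose(0.0, (0.02, 0.0, 0.05))
# Hypothetical registration result: rotate 90 degrees about z, then shift.
T = pose(math.pi / 2, (0.4, 0.1, 0.0))
g_obs = transfer_grasps(T, [g_model])[0]
print([round(row[3], 3) for row in g_obs[:3]])  # prints [0.4, 0.12, 0.05]
```

Because the transform is rigid, the grasp's approach direction and offset relative to the object are preserved. Any residual misalignment from registration is what a subsequent fine-tuning step would have to absorb.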

Index terms

Grasping, Dexterous Manipulation, Computer Vision for Automation, Similarity Matching