ICRA 2026

Learning to Grasp by Integrating Human Preferences and Success Feedback

Juyeol Park, Byungjin Ko, Jong-Wan Yoon, Taejoon Park, Homin Park

AI summary

Integrating human preferences with binary success feedback via a novel Weighted Success Reward significantly improves the reliability and real-world transferability of end-to-end robotic grasping policies.
Robotic grasping, RLHF, Reward modeling, Human preference, End-to-end learning, Real-world transfer

Problem

Designing reliable reward functions for end-to-end robotic grasping remains difficult. Handcrafted rewards are prone to reward hacking, while preference-based reward models often align with human intuition but do not guarantee successful physical execution or generalization to new environments.

Approach

The authors propose a three-stage framework that trains a reward model on human preferences and combines it with binary success feedback into a Weighted Success Reward to fine-tune the grasping policy.
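The reward-modeling stage can be illustrated with a pairwise preference loss. The sketch below uses the Bradley-Terry formulation that is common in RLHF; the summary does not state which loss the authors use, so the formulation and the function names (`pairwise_loss`, `mean_preference_loss`) are assumptions for illustration only.

```python
import math


def pairwise_loss(r_preferred: float, r_rejected: float) -> float:
    """Negative log-likelihood that the preferred grasp outranks the rejected one.

    Assumes a Bradley-Terry model: P(preferred) = sigmoid(r_preferred - r_rejected).
    Written as a numerically stable softplus of the negated score gap.
    """
    gap = r_preferred - r_rejected
    return max(-gap, 0.0) + math.log1p(math.exp(-abs(gap)))


def mean_preference_loss(score_pairs) -> float:
    """Average the pairwise loss over (preferred_score, rejected_score) pairs."""
    losses = [pairwise_loss(rp, rr) for rp, rr in score_pairs]
    return sum(losses) / len(losses)


# Hypothetical reward-model scores for three labeled grasp pairs.
pairs = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
print(mean_preference_loss(pairs))
```

The loss shrinks as the reward model scores the human-preferred grasp higher than the rejected one, which is the signal a preference-trained reward model is fit to.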

Key results

  • Introduces the first end-to-end RLHF framework for robotic grasping in cluttered scenes
  • Curates a standardized human preference dataset for grasping with explicit labeling guidelines
  • Achieves higher success and completion rates with fewer collisions in simulation
  • Transfers to real-world hardware with less performance degradation than baseline methods

Why it matters

Provides a practical pathway for aligning robotic manipulation with human intuition while keeping execution robust in the real world, which is useful to researchers and engineers working on safe robot control.

Abstract

End-to-end robotic grasping increasingly relies on reinforcement learning to enable safe and precise execution, yet defining a reward that consistently drives such behavior remains a central challenge. Human-engineered rewards have been widely explored, but they are prone to reward hacking, depend heavily on artificial design choices, and often fail to capture human intuition. Preference-based reward models offer a promising alternative by aligning policies with human feedback, but their application to robotic grasping has remained limited, and preference-aligned actions do not always translate into successful execution. We propose Human Preference and Success-based Grasping (HPSG), a three-stage framework that combines pre-training, reward modeling, and fine-tuning. At its core is the Weighted Success Reward (WSR), which integrates a preference-trained reward model with binary success feedback so that policies learn behaviors that are effective in practice and aligned with human judgment. This design resolves the mismatch between subjective preferences and execution outcomes, thereby improving reliability. Through extensive simulation and real-world experiments, we show that HPSG produces reliable grasping policies, achieving higher success and completion rates, reducing collisions, and transferring to physical settings with smaller performance degradation than baseline methods. Our code is publicly available at: https://github.com/qkrwnduf1997/HPSG
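To make the idea of mixing a preference score with binary success feedback concrete, here is a minimal sketch. The abstract does not give the WSR formula, so the convex combination below, the logistic squashing, and the names `squash`, `combined_reward`, and `weight` are all assumptions for illustration, not the authors' definition.

```python
import math


def squash(score: float) -> float:
    """Map an unbounded reward-model score into (0, 1) with a stable logistic."""
    if score >= 0.0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def combined_reward(pref_score: float, success: bool, weight: float = 0.5) -> float:
    """Hypothetical blend of binary success and a preference score.

    weight is the share given to the success signal (an assumed hyperparameter);
    the remainder goes to the squashed preference score.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * float(success) + (1.0 - weight) * squash(pref_score)


# A strongly preferred but failed grasp versus a weakly preferred successful one.
failed_but_preferred = combined_reward(pref_score=5.0, success=False)
successful_but_plain = combined_reward(pref_score=-5.0, success=True)
print(failed_but_preferred < successful_but_plain)
```

With `weight` at 0.5 or above in this sketch, a successful grasp always scores higher than a failed one, while the preference term still ranks grasps with the same outcome. That is one way to avoid rewarding actions that look good to annotators but fail in execution.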

Index terms

Deep Learning in Grasping and Manipulation; Reinforcement Learning; Deep Learning for Visual Perception
