ICRA 2026

Active-Perceptive Language-Oriented Grasp Policy for Heavily Cluttered Scenes

Yixiang Dai, Siang Chen, Kaiqin Yang, Dingchang Hu, Pengwei Xie, Guosheng Li, Yuan Shen, Guijin Wang

AI summary

APeG significantly improves language-guided grasping success and efficiency in heavily cluttered scenes by dynamically interleaving grasp execution with occlusion-aware viewpoint optimization.

Active Perception, Language-Guided Grasping, Reinforcement Learning, Occlusion Handling, Robotic Manipulation, Next-Best-View

Problem

Language-guided robotic grasping in heavily cluttered environments fails due to severe occlusions and perceptual ambiguity. Existing methods rely on static viewpoints or inefficient pre-scanning paradigms that cannot sequentially uncover buried targets.

Approach

The proposed APeG framework dynamically interleaves grasp attempts with active viewpoint planning based on joint occlusion-semantic information gain, while using a grasp-wise reinforcement learning policy to select robust grasp poses.
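The interleaving idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: this summary does not give the actual gain formulation, so the weighted sum in `joint_gain`, the `semantic_weight` and `grasp_threshold` parameters, and the `Viewpoint` fields are all hypothetical stand-ins for the joint occlusion-semantic information gain described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewpoint:
    """Candidate camera pose with two hypothetical, pre-computed scores in [0, 1]."""
    name: str
    occlusion_gain: float  # assumed: share of currently hidden space this view would reveal
    semantic_gain: float   # assumed: expected gain in evidence for the language-specified target


def joint_gain(view, semantic_weight=0.5):
    """Combine both gains with a weighted sum (the weighting scheme is an assumption)."""
    return (1.0 - semantic_weight) * view.occlusion_gain + semantic_weight * view.semantic_gain


def next_best_view(candidates, semantic_weight=0.5):
    """Return the candidate viewpoint with the highest joint gain."""
    if not candidates:
        raise ValueError("at least one candidate viewpoint is required")
    return max(candidates, key=lambda v: joint_gain(v, semantic_weight))


def choose_action(target_confidence, candidates, grasp_threshold=0.8):
    """Grasp when the target is localized confidently enough, otherwise move the camera.

    The threshold rule is a placeholder for whatever switching logic the paper uses.
    """
    if target_confidence >= grasp_threshold:
        return ("grasp", None)
    return ("move_camera", next_best_view(candidates))


candidates = [
    Viewpoint("top", occlusion_gain=0.2, semantic_gain=0.1),
    Viewpoint("left", occlusion_gain=0.6, semantic_gain=0.3),
    Viewpoint("right", occlusion_gain=0.3, semantic_gain=0.9),
]
action, view = choose_action(target_confidence=0.4, candidates=candidates)
```

With these made-up scores, a low target confidence leads to a camera move toward the `right` viewpoint, while setting `semantic_weight=0.0` would favor `left`, the view that reveals the most hidden space. In a closed loop, `choose_action` would be called again after every grasp attempt or camera move.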

Key results

Proposes APeG, a closed-loop active-perceptive framework for language-oriented grasping
Introduces an occlusion-aware, semantic-guided viewpoint optimization strategy
Develops a grasp-wise reinforcement learning policy for robust grasp selection
Demonstrates significant improvements in task success rate and efficiency over baselines in simulation and real-world tests

Why it matters

Advances practical language-conditioned robotic manipulation by enabling reliable object retrieval in complex, occluded environments where traditional methods fail.

Abstract

Language-guided robotic grasping in cluttered environments presents significant challenges due to severe occlusions and complex scene structures, which often hinder accurate target localization. Existing approaches typically suffer from limited observational capabilities, resulting in suboptimal exploration of the target object. In this paper, we propose a novel Active-Perceptive Language-Oriented Grasp Policy (APeG) for heavily cluttered scenes. APeG develops an active perception scheme in the grasp pipeline via an occlusion-aware, semantic-guided viewpoint optimization strategy, enabling efficient exploration of cluttered scenes. In addition, a grasp-wise Reinforcement Learning (RL) policy is proposed to select robust grasp poses. Extensive real-world experiments validate the effectiveness of APeG, demonstrating significant improvements in both task success rate and operational efficiency over existing baselines, highlighting its potential for practical deployment in language-conditioned robotic manipulation.

Index terms

Perception for Grasping and Manipulation; Deep Learning in Grasping and Manipulation; RGB-D Perception