ICRA 2023

Differentiable Parsing and Visual Grounding of Natural Language Instructions for Object Placement

Zirui Zhao, Wee Sun Lee, David Hsu

Abstract

We present a new method, PARsing And visual GrOuNding (PARAGON), for grounding natural language in object placement tasks. Natural language generally describes objects and spatial relations with compositionality and ambiguity, two major obstacles to effective language grounding. For compositionality, PARAGON parses a language instruction into an object-centric graph representation to ground objects individually. For ambiguity, PARAGON uses a novel particle-based graph neural network to reason about object placements with uncertainty. Essentially, PARAGON integrates a parsing algorithm into a probabilistic, data-driven learning framework. It is fully differentiable and trained end-to-end from data for robustness against complex, ambiguous language input.

Index terms

AI-Enabled Robotics, Human-Robot Collaboration