IROS 2024

Open6DOR: Benchmarking Open-Instruction 6-DoF Object Rearrangement and a VLM-Based Baseline

Yufei Ding, Haoran Geng, Chaoyi Xu, Xiaomeng Fang, Jiazhao Zhang, Songlin Wei, Zhizheng Zhang, He Wang

* Equal contribution.
1 Peking University. 2 Galbot. 3 University of California, Berkeley. 4 Beijing Academy of Artificial Intelligence.
Corresponding author: hewang@pku.edu.cn

Abstract

The integration of large-scale Vision-Language Models (VLMs) with embodied AI can greatly enhance robots' generalizability and their capacity to follow open instructions. However, existing studies on object manipulation do not fully consider 6-DoF requirements, let alone establish a comprehensive benchmark for them. In this paper, we pioneer the construction of a benchmark and an approach for Open-instruction 6-DoF Object Rearrangement (Open6DOR). Specifically, we collect a synthetic dataset of more than 200 objects and carefully design more than 5,400 Open6DOR tasks. These tasks are divided into a Position-track, a Rotation-track, and a 6-DoF-track, which evaluate how well different embodied agents predict the positions and rotations of target objects. We also propose a VLM-based approach for Open6DOR, named Open6DOR-GPT, which equips GPT-4V with 3D awareness and simulation assistance while exploiting its strengths in generalization and instruction following. We compare existing embodied agents with Open6DOR-GPT on the proposed Open6DOR benchmark and find that Open6DOR-GPT achieves state-of-the-art performance. We further demonstrate the impressive performance of Open6DOR-GPT in diverse real-world experiments.
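To make the three evaluation tracks concrete, here is a minimal sketch of how a single rearrangement task and a per-track success check could be represented. This is a hypothetical illustration, not the benchmark's actual data format or metric. The class `RearrangementTask`, its fields, the quaternion rotation representation, and the tolerances `pos_tol` and `rot_tol_deg` are all assumptions. The sketch also simplifies each goal to a single target pose, whereas open instructions may describe looser spatial or orientation constraints.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # unit quaternion (w, x, y, z), assumed convention


@dataclass
class RearrangementTask:
    """Hypothetical task record; not the actual Open6DOR file format."""
    instruction: str
    track: str              # "position", "rotation", or "6dof"
    target_position: Vec3   # metres in a world frame (assumed)
    target_rotation: Quat


def rotation_error_deg(q1: Quat, q2: Quat) -> float:
    """Angle in degrees of the relative rotation between two unit quaternions."""
    # abs() handles the double cover: q and -q encode the same rotation.
    dot = min(1.0, abs(sum(a * b for a, b in zip(q1, q2))))
    return math.degrees(2.0 * math.acos(dot))


def is_success(task: RearrangementTask, pred_position: Vec3, pred_rotation: Quat,
               pos_tol: float = 0.05, rot_tol_deg: float = 15.0) -> bool:
    """Hypothetical success check; the thresholds are illustrative assumptions."""
    pos_ok = math.dist(pred_position, task.target_position) <= pos_tol
    rot_ok = rotation_error_deg(pred_rotation, task.target_rotation) <= rot_tol_deg
    if task.track == "position":
        return pos_ok            # only the predicted position is scored
    if task.track == "rotation":
        return rot_ok            # only the predicted orientation is scored
    if task.track == "6dof":
        return pos_ok and rot_ok  # both must hold for a full 6-DoF success
    raise ValueError(f"unknown track: {task.track}")


# Example: a prediction 2 cm off in position, with the correct orientation.
task = RearrangementTask("Place the mug upright to the left of the bowl",
                         "6dof", (0.3, 0.1, 0.0), (1.0, 0.0, 0.0, 0.0))
print(is_success(task, (0.32, 0.1, 0.0), (1.0, 0.0, 0.0, 0.0)))
```

Separating the position and rotation checks mirrors the idea that an agent can be scored on one track independently of the other, while the 6-DoF track requires both to succeed together.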

Index terms

Deep Learning in Grasping and Manipulation; AI-Enabled Robotics; Manipulation Planning