On Learning Scene-Aware Generative State Abstractions for Task-Level Mobile Manipulation Planning
Julian Förster, Jen Jen Chung, Lionel Ott, Roland Siegwart
Abstract
Task and motion planning (TAMP) is a promising approach for efficient long-horizon manipulation planning, which is a prerequisite for being able to deploy manipulation systems in human-centered environments at scale. TAMP systems often rely on so-called predicates to abstractly describe the world. Today, predicates and their groundings are often hand-engineered. Furthermore, robot action parameterizations required to fulfill desired predicates are typically discovered by sampling naively or using oracles (again hand-engineered). We aim to automate predicate discovery and grounding with a system that learns to classify the state of predicates in a set of scenes while concurrently learning to generate scene configurations that fulfill the desired predicates. Our results show that high classification accuracies and generation success rates can be achieved with architectures based on multi-layer perceptrons (MLPs) and graph neural networks (GNNs) that are trained on bounding box as well as point cloud-based features in a Generative Adversarial Network (GAN)-inspired fashion, decisively outperforming both decision tree and uniform sampler baselines. The integration of our framework into a TAMP system demonstrates its positive impact on solving mobile manipulation tasks. A reference implementation of our method and data are available at https://github.com/ethz-asl/predicate_learning.