Robust Bayesian Scene Reconstruction with Retrieval-Augmented Priors for Precise Grasping and Planning
Herbert Wright, Weiming Zhi, Martin Matak, Matthew Johnson-Roberson, Tucker Hermans
AI summary
Problem
Building accurate 3D representations from noisy, partial single-view RGBD data is essential for robotic manipulation but hindered by occlusions and unknown objects. Current deep learning approaches lack robustness and calibrated uncertainty, while non-learning methods cannot infer unobserved geometry.
Approach
The method uses a foundation model to retrieve relevant shape priors from a mesh database, then combines them with observed depth data via Stein Variational Gradient Descent to infer a posterior distribution over object shapes.
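To make the Stein Variational Gradient Descent (SVGD) step concrete, here is a minimal sketch under strong simplifying assumptions. It is not BRRP's implementation. A single hypothetical scalar `theta` stands in for object shape. The prior (playing the role of the retrieved shape prior) and the depth likelihood are both Gaussian, and the observations are synthetic numbers. SVGD moves a set of particles with a kernel-weighted gradient of the log posterior, which pulls them toward high density, plus a kernel-gradient term that pushes them apart. It uses an RBF kernel with the median bandwidth heuristic. Because this toy posterior is Gaussian, it has a closed form: mean 40/21 ≈ 1.905 and standard deviation sqrt(1/21) ≈ 0.218. The spread of the particles is what supplies the uncertainty estimate.

```python
import math
import random
import statistics

# Hypothetical toy setup: one scalar "shape parameter" instead of a full shape model.
PRIOR_MEAN, PRIOR_STD = 0.0, 1.0          # stand-in for a retrieved shape prior
NOISE_STD = 0.5                           # assumed depth-sensor noise
OBSERVATIONS = [2.0, 2.2, 1.8, 2.1, 1.9]  # synthetic measurements, not real data


def grad_log_posterior(theta):
    """Gradient of log prior + log Gaussian likelihood (constants dropped)."""
    g = -(theta - PRIOR_MEAN) / PRIOR_STD ** 2
    g += sum(y - theta for y in OBSERVATIONS) / NOISE_STD ** 2
    return g


def svgd(particles, steps=1500, step_size=0.01):
    """Run SVGD with an RBF kernel k(a, b) = exp(-(a - b)^2 / h)."""
    n = len(particles)
    x = list(particles)
    for _ in range(steps):
        # Median heuristic: h = median squared pairwise distance / log(n).
        sq = [(a - b) ** 2 for i, a in enumerate(x) for b in x[i + 1:]]
        h = statistics.median(sq) / math.log(n) + 1e-8
        grads = [grad_log_posterior(xj) for xj in x]
        new_x = []
        for xi in x:
            phi = 0.0
            for xj, gj in zip(x, grads):
                k = math.exp(-((xj - xi) ** 2) / h)
                phi += k * gj                    # attraction toward high posterior density
                phi += -2.0 * (xj - xi) / h * k  # d k(xj, xi) / d xj: repulsion between particles
            new_x.append(xi + step_size * phi / n)
        x = new_x
    return x


rng = random.Random(0)
posterior_particles = svgd([rng.gauss(0.0, 1.0) for _ in range(30)])
post_mean = statistics.mean(posterior_particles)
post_std = statistics.stdev(posterior_particles)
print(f"SVGD mean={post_mean:.3f} std={post_std:.3f}")
print(f"exact mean={40 / 21:.3f} std={math.sqrt(1 / 21):.3f}")
```

The repulsive term is what keeps the particles from collapsing onto the posterior mode. The finite particle set therefore represents a distribution rather than a single point estimate.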
Key results
- Accurate multi-object 3D reconstructions from single RGBD views
- Robustness to noisy real-world data and out-of-distribution objects
- Principled uncertainty quantification for occluded geometry
- Improved real-world grasping success in cluttered scenes
Why it matters
Enables robots to safely and accurately manipulate objects in unstructured environments by providing reliable geometric understanding and uncertainty estimates.
Abstract
Constructing 3D representations of object geometry is critical for many robotics tasks, particularly manipulation problems. These representations must be built from potentially noisy partial observations. In this work, we focus on the problem of reconstructing a multi-object scene from a single RGBD image using a fixed camera. Traditional scene representation methods generally cannot infer the geometry of unobserved regions of the objects in the image. Attempts have been made to leverage deep learning to train on a dataset of known objects and representations, and then generalize to new observations. However, these approaches can be brittle to noisy real-world observations and to objects not contained in the dataset, and they do not provide well-calibrated reconstruction confidences. We propose BRRP, a reconstruction method that leverages preexisting mesh datasets to build an informative prior during robust probabilistic reconstruction. We introduce the concept of a retrieval-augmented prior, where we retrieve relevant components of our prior distribution from a database of objects during inference. The resulting prior enables estimation of the geometry of occluded portions of the in-scene objects. Our method produces a distribution over object shape that can be used for reconstruction and measuring uncertainty. We evaluate our method in both simulated scenes and in the real world. We demonstrate that our method is more robust than deep learning-only approaches while being more accurate than a method without an informative prior. Through real-world experiments, we particularly highlight the capability of BRRP to enable successful dexterous manipulation in clutter.
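The retrieval-augmented prior can be illustrated with a minimal sketch under stated assumptions. The text does not specify BRRP's retrieval model or scoring rule, so everything below is hypothetical. Hand-written 3D vectors stand in for foundation-model embeddings of a segmented object (`query`) and of database meshes (`database`). Retrieval takes the top-`k` entries by cosine similarity. A temperature-scaled softmax over those similarities then turns them into mixture weights over the retrieved prior components. The softmax weighting is an assumed choice for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve_prior(query, database, k=2, temperature=0.1):
    """Return [(name, weight)] for the k database entries most similar to query.

    Weights come from a softmax over cosine similarities (an assumption), so
    they sum to 1 and can serve as mixture weights for the retrieved priors.
    """
    scored = sorted(
        ((cosine(query, emb), name) for name, emb in database.items()), reverse=True
    )[:k]
    best = scored[0][0]  # subtract the max similarity for numerical stability
    exps = [(name, math.exp((s - best) / temperature)) for s, name in scored]
    total = sum(w for _, w in exps)
    return [(name, w / total) for name, w in exps]


# Hypothetical embeddings; a real system would compute these with a vision model.
database = {
    "mug": [0.9, 0.1, 0.0],
    "bowl": [0.7, 0.6, 0.1],
    "drill": [0.0, 0.2, 0.9],
    "box": [0.1, 0.9, 0.3],
}
query = [0.8, 0.3, 0.05]
prior_components = retrieve_prior(query, database)
print(prior_components)
```

Retrieving only the `k` closest entries keeps the prior focused on plausible shapes for the observed object. The remaining database entries contribute nothing to the prior for that object.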