PokeFlex: A Real-World Dataset of Volumetric Deformable Objects for Robotics
Jan Obrist, Miguel Angel Zamora Mora, Hehui Zheng, Ronan Hinchet, Firat Ozdemir, Juan Jose Zarate, Robert Kevin Katzschmann, Stelian Coros
AI summary
Problem
Data-driven deformable object manipulation lacks high-quality real-world datasets that capture complex temporal deformations alongside synchronized contact forces and 3D geometry.
Approach
The authors integrated a force-sensing robot arm with a professional multi-view volumetric capture system and synchronized RGB-D cameras to collect paired 3D meshes, images, point clouds, and interaction wrenches from poking and dropping protocols.
Key results
- Curated 21.3k synchronized frames across 18 everyday and 3D-printed deformable objects
- Released open-source CAD files and baseline neural networks for template-based mesh reconstruction
- Demonstrated online template-based mesh reconstruction at 33–185 Hz from multimodal inputs, which the authors describe as state of the art, to the best of their knowledge, for multi-object reconstruction of this kind
- Established comprehensive evaluation metrics and baselines for future deformable object research
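To put the reported 33–185 Hz range in perspective, the short sketch below converts an update rate into the time available per frame. The helper name `frame_budget_ms` is hypothetical and is not part of the PokeFlex code release; the calculation is only the reciprocal of the rate.

```python
def frame_budget_ms(rate_hz: float) -> float:
    """Return the time available per frame, in milliseconds, at a given update rate.

    Hypothetical helper for illustration; not taken from the PokeFlex codebase.
    """
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return 1000.0 / rate_hz


# The two ends of the reconstruction-rate range quoted in the summary.
for rate in (33.0, 185.0):
    print(f"{rate:.0f} Hz -> {frame_budget_ms(rate):.1f} ms per frame")
```

At the low end of the range a model has roughly 30 ms to produce each mesh, and at the high end roughly 5 ms.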
Why it matters
It provides the foundational data and benchmarks needed to advance real-time, data-driven control and manipulation of soft and deformable materials in robotics.
Abstract
Data-driven methods have shown great potential in solving challenging manipulation tasks; however, their application in the domain of deformable objects has been constrained, in part, by the lack of data. To address this lack, we propose PokeFlex, a dataset featuring real-world multimodal data that is paired and annotated. The modalities include 3D textured meshes, point clouds, RGB images, and depth maps. Such data can be leveraged for several downstream tasks, such as online 3D mesh reconstruction, and it can potentially enable underexplored applications such as the real-world deployment of traditional control methods based on mesh simulations. To deal with the challenges posed by real-world 3D mesh reconstruction, we leverage a professional volumetric capture system that allows complete 360° reconstruction. PokeFlex consists of 18 deformable objects with varying stiffness and shapes. Deformations are generated by dropping objects onto a flat surface or by poking the objects with a robot arm. Interaction wrenches and contact locations are also reported for the latter case. As a use case for the dataset, we train models on different data modalities; given the novelty of the multimodal nature of PokeFlex, these models constitute, to the best of our knowledge, the state of the art in multi-object online template-based mesh reconstruction from multimodal data. The full dataset, pretrained models, and code are available on the project website: https://pokeflex-dataset.github.io/
Manuscript received: January 23, 2025; revised May 13, 2025; accepted July 19, 2025. This paper was recommended for publication by Editor A. Faust upon evaluation of the Associate Editor and Reviewers’ comments. This work is supported by the SDSC Grant entitled ‘C22-08: Data-Driven Inference of Mesh-based Representations for Deformable Objects from Unstructured Point Clouds’.
∗Denotes equal contribution.
J. Obrist∗, H. Zheng∗, R. Hinchet, and R. K. Katzschmann are with the Department of Mechanical and Process Engineering, ETH Zurich, Switzerland (rkk@ethz.ch).
M. Zamora∗, J. Zarate, and S. Coros are with the Department of Computer Science, ETH Zurich, Switzerland (scoros@ethz.ch).
F. Ozdemir is with the Swiss Data Science Center, ETH Zurich & EPFL, Zurich, Switzerland.
Digital Object Identifier (DOI): see top of this page.