ICRA 2026

PLOP: Particle Filtering for Learning Object Physics from Robot Interaction Videos

Junyu Nan, Sergey Zakharov, Kris Kitani

AI summary

PLOP accurately learns deformable object dynamics from RGB-D videos by combining particle filtering over adaptive 3D Gaussians with an efficient particle-grid network to handle complex topological changes.

deformable object dynamics, particle filtering, 3D Gaussians, robot interaction, topological changes, implicit particle interaction network

Problem

Learning dynamics of deformable objects from robot interaction videos is challenging due to insufficient visual cues, complex deformations, and topological changes like cutting that break fixed-particle representations.

Approach

The framework applies a particle filter over a dynamic set of 3D Gaussians, using a learned dynamics model to predict motion and a resampling function to adaptively split or merge Gaussians, all accelerated by a mixed particle-grid network called I2N.
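The predict-then-resample loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the learned dynamics model is replaced by constant-velocity motion, the learned resampling function by two hand-written rules (split a Gaussian whose scale is too large, merge the closest pair of centers), and the names `predict`, `split_large`, `merge_closest`, and `filter_step`, along with all thresholds, are hypothetical.

```python
import numpy as np

def predict(means, velocities, dt):
    """Stand-in for the learned dynamics model: constant-velocity motion."""
    return means + dt * velocities

def split_large(means, scales, split_above):
    """Split every Gaussian with scale > split_above into two half-size children."""
    big = scales > split_above
    offset = np.zeros_like(means[big])
    offset[:, 0] = scales[big] / 2.0  # children are offset along the x-axis (arbitrary choice)
    new_means = np.concatenate([means[~big], means[big] - offset, means[big] + offset])
    new_scales = np.concatenate([scales[~big], scales[big] / 2.0, scales[big] / 2.0])
    return new_means, new_scales

def merge_closest(means, scales, merge_below):
    """Merge the single closest pair of centers if they are nearer than merge_below."""
    n = len(means)
    if n < 2:
        return means, scales
    dist = np.linalg.norm(means[:, None] - means[None, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore self-distances
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    if dist[i, j] >= merge_below:
        return means, scales
    keep = np.ones(n, dtype=bool)
    keep[[i, j]] = False
    merged_mean = (means[i] + means[j]) / 2.0
    merged_scale = max(scales[i], scales[j])
    return np.vstack([means[keep], merged_mean]), np.append(scales[keep], merged_scale)

def filter_step(means, scales, velocities, dt=0.1, split_above=0.5, merge_below=0.05):
    """One step: predict motion, then adapt the Gaussian set by splitting and merging."""
    means = predict(means, velocities, dt)
    means, scales = split_large(means, scales, split_above)
    return merge_closest(means, scales, merge_below)

means = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.02, 0.0, 0.0]])
scales = np.array([1.0, 0.1, 0.1])
new_means, new_scales = filter_step(means, scales, np.zeros_like(means))
print(new_means.shape, new_scales)
```

The point of the sketch is only that the number of Gaussians is not fixed between steps, which is what allows the representation to follow a cut. The toy merge rule compares all pairs of centers and is there for readability; it says nothing about how the learned resampling function works.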

Key results

53.15% improvement in 3D reconstruction accuracy on the simulation benchmark
6.84% improvement in 2D reconstruction accuracy on the simulation benchmark
28.41% and 24.45% improvements in 3D and 2D reconstruction metrics, respectively, on the real-world cutting dataset
O(N) computational complexity for dynamics prediction via particle-grid interaction

Why it matters

Enables robots to accurately model and predict complex deformable object physics from visual data, advancing autonomous manipulation and simulation.

Abstract

Learning the dynamics of deformable objects, such as dough or a sponge, from RGB-D videos is challenging due to insufficient visual cues and complex deformations. We introduce PLOP (Particle Filtering for Learning Object Physics), a novel framework to learn the dynamics model of deformable objects using a particle filter over 3D Gaussians. Our method learns a dynamics function to predict the state of the object in the next time step, and a resampling function to split and merge Gaussians to handle complex object deformations such as cutting. We propose I2N (Implicit Particle Interaction Network) as the dynamics function within PLOP, a model leveraging a mixed particle-grid representation inspired by the Material Point Method (MPM). By transferring particle features to grid nodes, solving for grid dynamics, and then projecting solutions back to particles, our approach avoids the need for explicit pairwise interaction reasoning between particles, significantly reducing computational cost when there is a large number of particles. While PLOP is applicable to general robot-object interactions, we evaluate our approach on cutting sequences in both simulation and the real world, which induce challenging topological changes and expose previously occluded surfaces. On these benchmarks, PLOP achieves a 53.15% improvement in 3D reconstruction accuracy and a 6.84% improvement in 2D reconstruction accuracy on the simulation benchmark, as well as 28.41% and 24.45% improvements in 3D and 2D reconstruction metrics, respectively, on the real-world dataset.
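The particle-to-grid, grid-solve, grid-to-particle cycle mentioned in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not I2N itself: particle features are taken to be velocities, each particle is binned to a single grid cell instead of using MPM-style interpolation weights, and the learned grid solver is replaced by a fixed acceleration. The function names, the `accel` parameter, and the grid layout are hypothetical.

```python
import numpy as np

def particle_to_grid(positions, features, grid_res, cell_size):
    """Average particle features over the grid cell each particle falls in."""
    idx = np.clip(np.floor(positions / cell_size).astype(int), 0, grid_res - 1)
    flat = np.ravel_multi_index(tuple(idx.T), (grid_res,) * 3)
    grid_sum = np.zeros((grid_res ** 3, features.shape[1]))
    counts = np.zeros(grid_res ** 3)
    np.add.at(grid_sum, flat, features)  # unbuffered scatter-add, handles repeated cells
    np.add.at(counts, flat, 1.0)
    grid = grid_sum / np.maximum(counts, 1.0)[:, None]
    return grid, flat

def grid_update(grid, dt, accel):
    """Stand-in for the learned grid dynamics: apply one constant acceleration."""
    return grid + dt * np.asarray(accel)

def grid_to_particle(grid, flat):
    """Gather each particle's updated feature from its grid cell."""
    return grid[flat]

def step(positions, velocities, grid_res=2, cell_size=0.5, dt=0.1, accel=(0.0, 0.0, -9.8)):
    grid, flat = particle_to_grid(positions, velocities, grid_res, cell_size)
    grid = grid_update(grid, dt, accel)
    new_velocities = grid_to_particle(grid, flat)
    return positions + dt * new_velocities, new_velocities

positions = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.9, 0.9, 0.9]])
velocities = np.array([[1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
new_positions, new_velocities = step(positions, velocities, accel=(0.0, 0.0, -10.0))
print(new_velocities)
```

In this sketch, particles interact only through the cell they share: the first two particles land in the same cell and leave with the same averaged velocity, and no particle pair is ever compared directly. The scatter and gather passes each touch every particle once, so their cost grows linearly with the particle count, plus a term for the number of grid cells.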

Index terms

Computer Vision for Automation; Visual Learning