ICRA 2026

SAGrid: Scaling Robot Simulation through Automatic Affordance Annotation on In-The-Wild 3D Assets

Cem Gokmen, Yalcin Tur, Aditesh Kumar, Auddithio Nag, Li Fei-Fei

AI summary

Automating simulation affordance annotation on in-the-wild 3D assets enables scalable robot simulation and significantly improves policy generalization to unseen objects.

Robot simulation; 3D asset annotation; Simulation affordances; Automated labeling; Robot learning; 3D vision

Problem

Scaling robot simulation is bottlenecked by a shortage of simulation-ready 3D assets. In-the-wild models lack the specialized annotations that simulators need for complex phenomena such as fluids and heat, so these annotations must be added by hand, which is costly and limits asset diversity.

Approach

SAGrid uses pretrained 2D visual and 3D geometric features to predict, over a voxelization of the mesh, a dense field giving the distance to the nearest simulation affordance. It requires as few as 10 labeled training objects per affordance type.
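To make the prediction target concrete, here is a minimal Python/NumPy sketch of a dense "distance to nearest affordance" field over a voxel grid, and of one way a single location could be read back from it. It is an illustration under stated assumptions, not the authors' implementation: the cubic grid over a mesh normalized to [-0.5, 0.5]^3, the 16-voxel resolution, the helper names (`voxel_centers`, `distance_field`, `decode_location`), and argmin decoding are all hypothetical. In SAGrid the field is predicted from pretrained features (DINOv2, TRELLIS); here it is computed directly from annotated points, as one might do when building training targets.

```python
import numpy as np


def voxel_centers(resolution: int, lo: float = -0.5, hi: float = 0.5) -> np.ndarray:
    """Centers of a cubic voxel grid, shape (R, R, R, 3).

    Assumption: the mesh has been normalized to fit inside the cube [lo, hi]^3.
    """
    step = (hi - lo) / resolution
    axis = lo + step * (np.arange(resolution) + 0.5)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([gx, gy, gz], axis=-1)


def distance_field(centers: np.ndarray, affordance_points: np.ndarray) -> np.ndarray:
    """Euclidean distance from every voxel center to the nearest annotated point.

    affordance_points has shape (K, 3); the result has shape (R, R, R).
    """
    flat = centers.reshape(-1, 3)                         # (V, 3)
    diff = flat[:, None, :] - affordance_points[None, :]  # (V, K, 3)
    dists = np.linalg.norm(diff, axis=-1).min(axis=1)     # nearest point per voxel
    return dists.reshape(centers.shape[:-1])


def decode_location(centers: np.ndarray, field: np.ndarray) -> np.ndarray:
    """Hypothetical decoding: return the center of the voxel with the smallest distance."""
    idx = np.unravel_index(np.argmin(field), field.shape)
    return centers[idx]


# Example: one annotated heat-emitter point on a normalized mesh (made-up coordinates).
centers = voxel_centers(16)
target = distance_field(centers, np.array([[0.1, 0.2, -0.3]]))
estimate = decode_location(centers, target)  # nearest voxel center to the annotation
```

Decoding with a single argmin assumes one affordance per object; an object with several instances (for example, multiple heat emitters) would need a search for local minima instead, and a practical pipeline might also restrict the field to occupied or near-surface voxels.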

Key results

Achieves a 4.8 cm mean localization error, outperforming VLM-based and embedding-based baselines
Successfully annotates and integrates in-the-wild Objaverse-XL assets into the BEHAVIOR-1K simulator
Expanding training assets with automated annotations significantly improves robot policy generalization to unseen objects
Operates effectively in a low-data regime with only 10 training objects per affordance type
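The localization metric is not defined on this page; one common reading is the average Euclidean distance between each predicted affordance position and the nearest annotated one. The sketch below implements that reading under assumptions (coordinates in metres, one prediction per object, possibly several annotated positions per object); the function name `localization_error_cm` is hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence

import numpy as np


def localization_error_cm(
    predicted: Sequence[np.ndarray], ground_truth: Sequence[np.ndarray]
) -> float:
    """Mean distance, in centimetres, from each predicted affordance location to
    the nearest annotated location on the same object.

    Assumptions: inputs are in metres; predicted[i] has shape (3,) and
    ground_truth[i] has shape (3,) or (K, 3) for the i-th object.
    """
    errors = []
    for pred, gt in zip(predicted, ground_truth):
        gt = np.atleast_2d(gt)  # treat a single annotation as K = 1
        errors.append(np.linalg.norm(gt - np.asarray(pred), axis=1).min())
    return 100.0 * float(np.mean(errors))  # metres -> centimetres
```

For instance, two objects whose predictions miss by 5 cm and 6 cm would score a mean error of about 5.5 cm under this definition.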

Why it matters

It removes the manual annotation bottleneck, allowing researchers and developers to scale up diverse, simulation-ready environments for training robust, real-world robot policies.

Abstract

Robot simulation is a highly efficient approach for scaling data collection for robot learning, but scaling for most household tasks remains bottlenecked by a shortage of simulation-ready 3D assets. While modern robot simulators can model complex phenomena like temperature and fluids, most in-the-wild 3D models lack “simulation affordances” (specialized annotations such as fluid source and heat emitter positions) that are required for these features. As a result, costly manual annotation is required, severely limiting asset scale and variety. We introduce Simulation Affordance Grids (SAGrid), a method that automates the annotation of simulation affordances on in-the-wild 3D meshes. SAGrid leverages pretrained representations (DINOv2, TRELLIS) to predict a dense 3D distance field to the nearest affordance. Our approach operates effectively in a low-data regime, requiring as few as 10 training objects per affordance type to accurately locate these features. We validate our method by processing Objaverse-XL models and integrating them into the BEHAVIOR-1K simulator. Training robot policies on this automatically expanded asset suite significantly improves generalization to unseen objects in complex tasks, demonstrating that automated affordance annotation is crucial for scaling robot learning.

Index terms

Simulation and Animation; Deep Learning in Grasping and Manipulation; Data Sets for Robot Learning