ICRA 2026

NovaFlow: Zero-Shot Manipulation Via Actionable Flow from Generated Videos

Hongyu Li, Lingfeng Sun, Yafei Hu, Duy Ta, Jennifer Barry, George Konidaris, Jiahui Fu

AI summary

NovaFlow enables zero-shot robot manipulation of rigid, articulated, and deformable objects by distilling commonsense motion from generated videos into a 3D actionable flow, eliminating the need for task-specific demonstrations or training.

Zero-shot manipulation
Video generation
3D object flow
Embodied AI
Robot generalization
Demonstration-free learning

Problem

Existing robot manipulation methods rely heavily on task-specific demonstrations or embodiment-matched data, creating a data bottleneck that limits zero-shot generalization across different objects and robot platforms.

Approach

The framework generates a task-solving video using a pretrained video model, distills it into a 3D actionable object flow using off-the-shelf perception modules, and translates this flow into robot trajectories via grasp proposals and trajectory optimization without requiring robot-specific data.
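The summary does not say how the object flow is turned into poses. For rigid objects, one standard option is a least-squares rigid fit (the Kabsch algorithm) between the tracked points at the first frame and at each later frame. The sketch below is a minimal illustration under that assumption, not NovaFlow's implementation: the function names and the (T, N, 3) array layout for the flow are hypothetical.

```python
import numpy as np


def rigid_transform_from_flow(points_start, points_end):
    """Least-squares rigid fit: points_end ~= points_start @ R.T + t.

    Kabsch algorithm. Both inputs have shape (N, 3), rows in correspondence.
    Assumed here as one way to get a pose from flow; not taken from the paper.
    """
    p = np.asarray(points_start, dtype=float)
    q = np.asarray(points_end, dtype=float)
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    h = (p - p_mean).T @ (q - q_mean)
    u, _, vt = np.linalg.svd(h)
    # Force det(R) = +1 so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = q_mean - rotation @ p_mean
    return rotation, translation


def relative_poses(flow):
    """Relative pose of the object from frame 0 to every later frame.

    flow: array of shape (T, N, 3), the same N object points tracked
    over T frames (hypothetical layout for the 3D object flow).
    Returns a list of T - 1 (rotation, translation) pairs.
    """
    flow = np.asarray(flow, dtype=float)
    return [rigid_transform_from_flow(flow[0], frame) for frame in flow[1:]]
```

Calling `relative_poses(flow)` on a tracked point trajectory yields one (rotation, translation) pair per later frame. A downstream planner could use these as object pose targets once a grasp is fixed. Real flow from generated video is noisy, so a robust variant (for example, RANSAC around this fit) would likely be needed in practice.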

Key results

Effective zero-shot execution on rigid, articulated, and deformable object tasks without demonstrations
Successful cross-embodiment transfer between a table-top Franka arm and a Spot quadrupedal mobile robot
Novel actionable 3D object flow representation decoupling task understanding from low-level control
State-of-the-art performance on real-world manipulation tasks compared to demonstration-free baselines

Why it matters

It provides a scalable, data-efficient pathway to generalist robots by leveraging internet-scale video models, bypassing the costly data collection bottleneck of traditional end-to-end learning.

Abstract

Enabling robots to execute novel manipulation tasks zero-shot is a central goal in robotics. Most existing methods assume in-distribution tasks or rely on fine-tuning with embodiment-matched data, limiting transfer across platforms. We present NovaFlow, an autonomous manipulation framework that converts a task description into an actionable plan for a target robot without any demonstrations. Given a task description, NovaFlow synthesizes a video using a video generation model and distills it into 3D actionable object flow using off-the-shelf perception modules. From the object flow, it computes relative poses for rigid objects and realizes them as robot actions via grasp proposals and trajectory optimization. For deformable objects, this flow serves as a tracking objective for model-based planning with a particle-based dynamics model. By decoupling task understanding from low-level control, NovaFlow naturally transfers across embodiments. We validate on rigid, articulated, and deformable object manipulation tasks using a table-top Franka arm and a Spot quadrupedal mobile robot, and achieve effective zero-shot execution without demonstrations or embodiment-specific training.

Project website: https://novaflow.lhy.xyz/

Affiliations: 1 Robotics and AI Institute (RAI); 2 Brown University.

∗ Hongyu Li and Lingfeng Sun contribute equally. Email: hli230@cs.brown.edu and lingfengsun@berkeley.edu.

Hongyu Li and George Konidaris were supported by the Office of Naval Research (ONR) under REPRISM MURI N000142412603 and ONR grant N00014-22-1-2592. Partial funding was also provided by the RAI.

Index terms

Deep Learning Methods
Deep Learning in Grasping and Manipulation
Perception for Grasping and Manipulation