Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects
Jiawei Wang, Dingyou Wang, Jiaming Hu, Qixuan Zhang, Lan Xu, Jingyi Yu
AI summary
Problem
Creating accurate articulated 3D models for high-DoF objects is labor-intensive and typically relies on motion sequences or strong priors, making automated reconstruction from static inputs difficult.
Approach
The framework generates segmented meshes from images or text, uses Monte Carlo Tree Search with structural rewards to infer the kinematic tree, and optimizes joint parameters via a distance-weighted, contact-aware SDF objective.
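The tree-search step can be illustrated with a toy example. The sketch below is not the authors' implementation: it runs a generic UCB-based Monte Carlo Tree Search over parent assignments for a handful of parts. The names `structural_reward` and `mcts_tree`, the contact-based reward, and the assumption that parts are ordered so each parent precedes its children are all hypothetical choices made for illustration.

```python
import math
import random


def structural_reward(parents, contacts):
    """Hypothetical reward: fraction of parent-child edges that touch physically.

    parents[k] is the parent index chosen for part k + 1 (part 0 is the root).
    contacts is a set of (i, j) pairs with i < j.
    """
    hits = sum((p, k + 1) in contacts for k, p in enumerate(parents))
    return hits / len(parents)


def legal_actions(state):
    # The next part (index len(state) + 1) may attach to any earlier part.
    return list(range(len(state) + 1))


class Node:
    def __init__(self, state, parent=None):
        self.state = state      # tuple of parent indices chosen so far
        self.parent = parent
        self.children = {}      # action -> Node
        self.visits = 0
        self.value = 0.0        # sum of rewards backed up through this node


def mcts_tree(n_parts, contacts, iterations=500, c_ucb=1.4, seed=0):
    rng = random.Random(seed)
    depth = n_parts - 1         # number of parent choices in a complete tree
    root = Node(())
    best_state, best_reward = None, -1.0
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded and non-terminal.
        while (len(node.state) < depth
               and len(node.children) == len(legal_actions(node.state))):
            node = max(
                node.children.values(),
                key=lambda ch: ch.value / ch.visits
                + c_ucb * math.sqrt(math.log(node.visits) / ch.visits),
            )
        # Expansion: add one untried child unless the node is terminal.
        if len(node.state) < depth:
            untried = [a for a in legal_actions(node.state)
                       if a not in node.children]
            action = rng.choice(untried)
            child = Node(node.state + (action,), parent=node)
            node.children[action] = child
            node = child
        # Rollout: complete the assignment with random parent choices.
        state = node.state
        while len(state) < depth:
            state += (rng.choice(legal_actions(state)),)
        reward = structural_reward(state, contacts)
        if reward > best_reward:
            best_state, best_reward = state, reward
        # Backpropagation: update statistics along the selected path.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best_state, best_reward


# Toy object: base (0), link (1), and two fingers (2, 3) that both touch the link.
parents, reward = mcts_tree(4, {(0, 1), (1, 2), (1, 3)})
# parents == (0, 1, 1): the link hangs off the base and both fingers off the link.
```

In this toy the search space has only six complete trees, so exhaustive enumeration would do equally well. The search pattern becomes relevant as the number of parts, and with it the number of candidate trees, grows.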
Key results
- Open-vocabulary framework generating articulated objects from RGB images or text without motion data or training.
- MCTS-based kinematic tree inference that resolves multi-branch ambiguities using structural priors.
- DW-CAVL algorithm for accurate joint parameter estimation from static geometry via SDF-driven optimization.
- State-of-the-art accuracy in joint axis orientation and tree topology recovery on everyday objects and high-DoF robots.
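To make the idea of estimating a joint from static geometry with a signed distance field (SDF) more concrete, here is a minimal 2D sketch. It is not DW-CAVL itself. It assumes a box-shaped parent, a bar-shaped child, a revolute joint, and a made-up cost that asks near-contact child points to keep their signed distance to the parent while the child rotates, with weights that decay with initial distance. The names `box_sdf`, `rotate_about`, `pivot_cost`, and the parameter `sigma` are hypothetical.

```python
import math


def box_sdf(p, half=(1.0, 1.0)):
    """Signed distance from 2D point p to an axis-aligned box centred at the origin."""
    qx, qy = abs(p[0]) - half[0], abs(p[1]) - half[1]
    outside = math.hypot(max(qx, 0.0), max(qy, 0.0))
    inside = min(max(qx, qy), 0.0)
    return outside + inside


def rotate_about(p, pivot, angle):
    """Rotate point p counter-clockwise by angle (radians) around pivot."""
    dx, dy = p[0] - pivot[0], p[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * dx - s * dy, pivot[1] + s * dx + c * dy)


def pivot_cost(pivot, child_pts, angles, sigma=0.5):
    """Illustrative cost for a candidate revolute pivot (lower is better).

    Each child point is weighted by exp(-d0 / sigma), where d0 is its initial
    distance to the parent, so points near the contact region dominate.
    The cost penalises changes in signed distance under rotation, plus any
    penetration into the parent.
    """
    cost = 0.0
    for p in child_pts:
        d0 = box_sdf(p)
        w = math.exp(-max(d0, 0.0) / sigma)
        for a in angles:
            d = box_sdf(rotate_about(p, pivot, a))
            cost += w * (d - d0) ** 2 + max(0.0, -d) ** 2
    return cost / (len(child_pts) * len(angles))


# Child bar attached at the box corner (1, 1) and extending along +x.
child_pts = [(1.0 + 0.5 * k, 1.0) for k in range(5)]
# Sample opening angles in (0, pi/2].
angles = [k * (math.pi / 2) / 8 for k in range(1, 9)]
# Candidate pivots: the true hinge at the corner and two points further along the bar.
candidates = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
best_pivot = min(candidates, key=lambda c: pivot_cost(c, child_pts, angles))
```

With the true hinge, the contact point stays on the box surface throughout the motion, so the cost is essentially zero. The other candidates drag that point away from the surface and score worse, which is why `best_pivot` ends up at the corner.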
Why it matters
Enables automated, physics-aware robot self-modeling and environment interaction without costly motion capture or manual annotation.
Abstract
A deep understanding of kinematic structures is essential for robot motion and interaction with the environment. Such understanding is captured through articulated objects, which underpin physical simulation, motion planning, and policy learning. However, creating these models, particularly for objects with high degrees of freedom (DoF), remains a significant challenge. Existing methods typically rely on motion sequences or strong assumptions from hand-curated datasets. In this paper, we introduce Kinematify, an automated framework that synthesizes articulated objects from arbitrary RGB images or textual descriptions. Our method addresses two core challenges: (i) inferring kinematic topologies for high-DoF objects and (ii) estimating joint parameters from static 3D geometry. To achieve this, we combine Monte Carlo Tree Search (MCTS) for structural inference with geometry-driven optimization for joint reasoning, producing physically consistent and functionally valid models. We evaluate Kinematify on diverse inputs from both synthetic environments and real-world settings, demonstrating improvements in registration and kinematic topology accuracy over prior work.
https://sites.google.com/deemos.com/kinematify
1 Deemos Corporation, Wilmington, DE 19801, USA. Emails: {joel.wang, dingyou, zhangqx}@deemos.com.
2 ShanghaiTech University, Shanghai, China. Emails: {wangdy2024, zhangqx1, yujingyi, xulan1}@shanghaitech.edu.cn.
3 Contextual Robotics Institute, UC San Diego, La Jolla, CA 92093, USA. Emails: {jiw179, jih189}@ucsd.edu.
† Project lead: Qixuan Zhang (zhangqx@deemos.com).
* Corresponding authors: Jingyi Yu (yujingyi@shanghaitech.edu.cn), Lan Xu (xulan1@shanghaitech.edu.cn).