EmbodiedCoder: Parameterized Embodied Mobile Manipulation Via Modern Coding Model
Zefu Lin, Rongxu Cui, Chen Hanning, Xiangyu Wang, Junjia Xu, Xiaojuan Jin, Chen Wenbo, Hui Zhou, Lue Fan, Wenling Li, Zhaoxiang Zhang
AI summary
Problem
Current robot manipulation methods struggle to scale to diverse environments due to heavy reliance on annotated datasets, limited interpretability, and fixed libraries of predefined primitives that fail on novel objects or contact-rich tasks.
Approach
EmbodiedCoder uses vision-language models to extract semantic scene data, then prompts a coding model to generate executable code for parameterizing object geometry and synthesizing constraint-aware trajectories, which the robot executes via sampled waypoints.
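As a rough illustration of what such generated code might look like, the sketch below parameterizes a hinged door and samples waypoints along the arc its handle follows. It is not taken from the paper: the `HingedDoor` class, its fields, and the `arc_waypoints` helper are hypothetical names chosen for this example, and the geometry is simplified to the 2D floor plane.

```python
import math
from dataclasses import dataclass


@dataclass
class HingedDoor:
    """Hypothetical parameterization: a door as a hinge point plus a handle radius."""
    hinge_x: float        # hinge position in the floor plane, metres
    hinge_y: float
    handle_radius: float  # distance from hinge to handle, metres
    closed_angle: float   # door angle when closed, radians


def arc_waypoints(door, open_angle, n_segments):
    """Sample n_segments + 1 handle positions along the opening arc."""
    points = []
    for i in range(n_segments + 1):
        # Interpolate the door angle linearly from closed to fully opened.
        theta = door.closed_angle + open_angle * i / n_segments
        x = door.hinge_x + door.handle_radius * math.cos(theta)
        y = door.hinge_y + door.handle_radius * math.sin(theta)
        points.append((x, y))
    return points


# Assumed example values: hinge at the origin, handle 0.8 m away, open by 90 degrees.
door = HingedDoor(hinge_x=0.0, hinge_y=0.0, handle_radius=0.8, closed_angle=0.0)
waypoints = arc_waypoints(door, math.pi / 2, 9)
```

Every sampled waypoint stays at the handle radius from the hinge, so the geometric parameters alone determine the end-effector path. No learned policy is involved in this sketch.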
Key results
- Training-free code-driven framework for open-world mobile manipulation
- Geometric parameterization of task-relevant objects into functional primitives
- Constraint-aware trajectory synthesis respecting physical and environmental limits
- Robust real-world performance and generalization to novel objects without fine-tuning
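To make "constraint-aware" concrete, here is a minimal sketch of one possible check: flagging the first waypoint that exceeds the arm's reach or dips below a minimum height. The function names, the spherical reach model, and the numeric limits are assumptions made for this example, not the constraints used in the paper.

```python
import math


def violates_limits(waypoint, base, max_reach, min_height):
    """Return True if a 3D waypoint is out of reach or below the height limit."""
    x, y, z = waypoint
    bx, by, bz = base
    distance = math.sqrt((x - bx) ** 2 + (y - by) ** 2 + (z - bz) ** 2)
    return distance > max_reach or z < min_height


def first_violation(waypoints, base, max_reach, min_height):
    """Return the index of the first violating waypoint, or None if all pass."""
    for index, waypoint in enumerate(waypoints):
        if violates_limits(waypoint, base, max_reach, min_height):
            return index
    return None


# Assumed example: arm base at 0.5 m height, 0.9 m reach, 0.1 m minimum height.
path = [(0.3, 0.0, 0.5), (0.6, 0.0, 0.5), (1.2, 0.0, 0.5)]
bad_index = first_violation(path, base=(0.0, 0.0, 0.5), max_reach=0.9, min_height=0.1)
```

In this example the third waypoint lies 1.2 m from the base, beyond the assumed 0.9 m reach, so a planner using a check like this could move the mobile base or resample the trajectory before executing it.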
Why it matters
Offers an interpretable, data-efficient paradigm for bridging high-level reasoning and low-level control, advancing versatile robot intelligence in unstructured environments.
Abstract
Recent advances in robot control methods, from end-to-end vision-language-action frameworks to modular systems with predefined primitives, have improved robots’ ability to follow natural language instructions. Nonetheless, many approaches still struggle to scale to diverse environments, as they often rely on large annotated datasets and offer limited interpretability. In this work, we introduce EmbodiedCoder, a training-free framework for open-world mobile robot manipulation that leverages coding models to directly generate executable robot trajectories. By grounding high-level instructions in code, EmbodiedCoder enables flexible object geometry parameterization and manipulation trajectory synthesis without additional data collection or fine-tuning. This coding-based paradigm provides a transparent and generalizable way to connect perception with manipulation. Experiments on real mobile robots show that EmbodiedCoder achieves robust performance across diverse long-term tasks and generalizes effectively to novel objects and environments. Our results demonstrate an interpretable approach for bridging high-level reasoning and low-level control, moving beyond fixed primitives toward versatile robot intelligence. See the project page at https://embodiedcoder.github.io/EmbodiedCoder/.