ICRA 2026

ManiMorph: Object Representations in Robot Manipulators Morphology for Improving Multi-Task Manipulation Performance

Ali Abdalla, Michael Przystupa, Xinrui Zu, Kevin Sebastian Luck, Glen Berseth


AI summary

Treating held objects as dynamic nodes in a robot's morphology graph, combined with FiLM task modulation, significantly boosts multi-task manipulation performance and zero-shot generalization.
morphology-aware learning, robotic manipulation, object representation, FiLM modulation, multi-task learning, graph transformers

Problem

Morphology-aware manipulation frameworks currently ignore how object interactions dynamically alter a robot's kinematic chain, limiting policy robustness and multi-task generalization.
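The problem can be made concrete with a small sketch. It assumes a morphology graph in which nodes are links and edges are joints, and in which a grasp attaches the held object as a child of the end-effector node. The class and method names (`MorphologyGraph`, `add_link`, `attach`, `detach`, `chain_to`) and the link names are hypothetical illustrations, not ManiMorph's code.

```python
from dataclasses import dataclass, field


@dataclass
class MorphologyGraph:
    """Kinematic tree stored as child -> parent pointers (hypothetical sketch)."""

    root: str
    parent: dict[str, str] = field(default_factory=dict)

    def add_link(self, child: str, parent: str) -> None:
        # A joint connects an existing node (or the root) to a new child node.
        if parent != self.root and parent not in self.parent:
            raise KeyError(f"unknown parent node: {parent}")
        self.parent[child] = parent

    def attach(self, obj: str, end_effector: str) -> None:
        # Grasp: the held object becomes a new leaf under the end-effector.
        self.add_link(obj, end_effector)

    def detach(self, obj: str) -> None:
        # Release: the object node is removed from the graph.
        self.parent.pop(obj, None)

    def chain_to(self, node: str) -> list[str]:
        # Follow parent pointers from `node` up to the root, then reverse.
        chain = [node]
        while chain[-1] != self.root:
            chain.append(self.parent[chain[-1]])
        return chain[::-1]


arm = MorphologyGraph(root="base")
for child, parent in [("shoulder", "base"), ("elbow", "shoulder"),
                      ("wrist", "elbow"), ("gripper", "wrist")]:
    arm.add_link(child, parent)

print(arm.chain_to("gripper"))  # ['base', 'shoulder', 'elbow', 'wrist', 'gripper']
arm.attach("cube", "gripper")
print(arm.chain_to("cube"))     # ['base', 'shoulder', 'elbow', 'wrist', 'gripper', 'cube']
arm.detach("cube")
```

Attaching the object lengthens the base-to-tip chain by one node. A policy that only ever sees the fixed limb graph cannot observe this object-induced change.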

Approach

ManiMorph unifies robot limbs and target objects into a single graph processed by a Transformer, using FiLM layers to condition the network on task-specific requirements without architectural changes.
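The "single graph processed by a Transformer" can be sketched with a few simplifying assumptions: one token per graph node (each limb plus each object), zero-padding of raw node features to a shared width, and single-head scaled dot-product attention with identity query/key/value projections. The summary does not specify ManiMorph's tokenization or layer sizes. The helper names (`pad`, `build_tokens`, `self_attention`) and the example feature values are hypothetical.

```python
import math


def pad(features, dim):
    # Zero-pad one node's raw feature vector to the shared token width.
    if len(features) > dim:
        raise ValueError("feature vector longer than token width")
    return features + [0.0] * (dim - len(features))


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    # Single-head scaled dot-product attention with identity Q/K/V
    # projections (an assumption; a real Transformer learns these).
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def build_tokens(limb_obs, object_obs, dim):
    # Node-centric input: one token per limb, then one token per object.
    return [pad(f, dim) for f in limb_obs] + [pad(f, dim) for f in object_obs]


limbs = [[0.1, 0.2], [0.0, -0.3], [0.4, 0.1]]  # e.g. joint angle and velocity per limb
cube = [[0.05, 0.3, 0.02, 0.5]]                # e.g. object position (x, y, z) and mass
without_obj = self_attention(build_tokens(limbs, [], dim=4))
with_obj = self_attention(build_tokens(limbs, cube, dim=4))
print(len(without_obj), len(with_obj))  # 3 4
```

Because attention operates on a set of tokens, the same code accepts graphs with or without an object node, and every limb token's output now depends on the object's features. A practical model would typically add learned projections and multiple heads.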

Key results

  • Object-as-node representation outperforms baselines on Lift and Door tasks
  • FiLM task adapter enables robust multi-robot multi-task learning across control spaces
  • Achieves zero-shot generalization to unseen object geometries and physical properties
  • Surpasses alternative frameworks in cumulative reward and sustained contact control

Why it matters

Provides a scalable, morphology-aware foundation for robots to handle diverse objects and tasks without retraining, advancing general-purpose manipulation.

Abstract

Robot manipulation tasks involve direct interactions with objects, which can be viewed as dynamic changes to the robot’s kinematic chain. Morphology-aware learning frameworks, in which robot embodiment is explicitly modeled, do not account for these object-induced changes in their architectures. We address this gap by proposing ManiMorph, a multi-task, morphology-aware manipulation-learning framework in which object features are integrated into the robot’s morphological graph. We demonstrate that this node-centric representation, combined with a Feature-wise Linear Modulation (FiLM) task component, enhances the performance of the morphology-aware frameworks for robotic manipulation and generalizes effectively to new object variations.
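FiLM conditions a network by scaling and shifting each feature channel with coefficients predicted from a conditioning input, here a task vector t: h' = γ(t) ⊙ h + β(t). The following is a minimal pure-Python sketch. It assumes a one-hot task vector and single linear maps for γ and β. The class name, dimensions, and initialisation are hypothetical, not taken from the paper.

```python
import random


def linear(weights, bias, x):
    # y = W x + b, with W given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class FiLM:
    """Maps a task vector to per-channel (gamma, beta) and applies them to node features."""

    def __init__(self, task_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        # Hypothetical random initialisation; in training these weights are learned.
        self.w_gamma = [[rng.gauss(0, 0.1) for _ in range(task_dim)] for _ in range(feat_dim)]
        self.b_gamma = [1.0] * feat_dim  # bias of 1 so gamma starts near 1 (near-identity scaling)
        self.w_beta = [[rng.gauss(0, 0.1) for _ in range(task_dim)] for _ in range(feat_dim)]
        self.b_beta = [0.0] * feat_dim

    def __call__(self, node_features, task):
        gamma = linear(self.w_gamma, self.b_gamma, task)
        beta = linear(self.w_beta, self.b_beta, task)
        # The same (gamma, beta) pair is applied to every node in the graph.
        return [[g * h + b for g, h, b in zip(gamma, feats, beta)] for feats in node_features]


film = FiLM(task_dim=2, feat_dim=3)
nodes = [[1.0, 0.5, -0.2], [0.3, 0.0, 0.8]]  # e.g. a limb node and an object node
lift = film(nodes, [1.0, 0.0])   # hypothetical one-hot for a Lift task
door = film(nodes, [0.0, 1.0])   # hypothetical one-hot for a Door task
```

Only γ and β depend on the task, so supporting another task changes the conditioning input rather than the network architecture.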

Index terms

Reinforcement Learning, Deep Learning in Grasping and Manipulation, Deep Learning Methods
