ICRA 2026

MonoDuo: Using One Robot Arm to Learn Bimanual Policies

Sandeep Bajamahal, Lawrence Yunliang Chen, Toru Lin, Zehan Ma, Jitendra Malik, Ken Goldberg

AI summary

MonoDuo trains bimanual policies that deploy zero-shot on unseen robot configurations, using only single-arm robot demonstrations paired with human collaboration.

bimanual manipulation, cross-embodiment learning, single-arm robot, synthetic data augmentation, zero-shot transfer, imitation learning

Problem

Learning bimanual robot policies is bottlenecked by the scarcity of bimanual robots and datasets, despite single-arm robots being widely available.

Approach

The framework teleoperates a single-arm robot to collaborate with a human on bimanual tasks, then uses cross-painting, segmentation, and inpainting to synthesize bimanual robot demonstrations for policy training.
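
The final compositing step of such a pipeline can be illustrated with a minimal sketch. It assumes that the segmentation masks, the inpainted background, and the rendered target-robot image are already available. The function name `cross_paint`, the argument layout, and the layering order (robot over inpainted background over original scene) are hypothetical choices made for illustration, not the authors' implementation.

```python
# Hypothetical sketch: composite one synthetic frame from per-pixel masks.
# Images are row-major lists of (r, g, b) tuples; masks are lists of bools.
# All inputs are assumed to be precomputed by upstream models.

def cross_paint(frame, inpainted, human_mask, robot_render, robot_mask):
    """Return a frame in which the human arm is replaced by a rendered robot.

    frame        -- original camera image
    inpainted    -- same scene with the human region filled in
    human_mask   -- True where the human arm or hand was segmented
    robot_render -- rendering of the target robot arm
    robot_mask   -- True where the rendered robot covers the pixel
    """
    out = []
    for y, row in enumerate(frame):
        out_row = []
        for x, pixel in enumerate(row):
            if robot_mask[y][x]:
                out_row.append(robot_render[y][x])  # robot drawn on top
            elif human_mask[y][x]:
                out_row.append(inpainted[y][x])     # human removed
            else:
                out_row.append(pixel)               # scene left untouched
        out.append(out_row)
    return out


# Toy 1x3 image: the human covers the last two pixels, the robot the last one.
frame = [[(10, 10, 10), (20, 20, 20), (30, 30, 30)]]
inpainted = [[(0, 0, 0), (1, 1, 1), (2, 2, 2)]]
human_mask = [[False, True, True]]
robot_render = [[(9, 9, 9), (9, 9, 9), (9, 9, 9)]]
robot_mask = [[False, False, True]]

synthetic = cross_paint(frame, inpainted, human_mask, robot_render, robot_mask)
print(synthetic)
```

In the toy example the first pixel keeps its original value, the second comes from the inpainted background, and the third comes from the robot rendering. A real pipeline would operate on full RGB-D frames and would also need to handle depth and occlusion, which this sketch omits.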

Key results

Achieves 35–70% zero-shot success on five complex bimanual tasks across unseen robot configurations.
Boosts few-shot fine-tuning success rates by 65–70% over training from scratch, with only 25 target robot demonstrations.
Reduces teleoperation data collection time by ~79% compared to full bimanual teleoperation.
Introduces a structured cross-painting pipeline that bridges human-robot and robot-robot morphological gaps.

Why it matters

It democratizes bimanual robot learning by enabling coordinated two-arm policy training using widely available single-arm hardware and human collaboration.

Abstract

Bimanual coordination is essential for many real-world manipulation tasks, yet learning bimanual robot policies is limited by the scarcity of bimanual robots and datasets. Single-arm robots, however, are widely available in research labs. Can we leverage them to train bimanual robot policies? We present MonoDuo, a framework for learning bimanual manipulation policies using single-arm robot demonstrations paired with human collaboration. MonoDuo collects data by teleoperating a single-arm robot to perform one side of a bimanual task while a human performs the other, then swapping roles to cover both sides. RGB-D observations from a wrist-mounted camera and a fixed camera are augmented into synthetic demonstrations for target bimanual robots using state-of-the-art hand pose estimation, image and point cloud segmentation, and inpainting. These synthetic demonstrations, grounded in real robot kinematics, are used to train bimanual policies. We evaluate MonoDuo on five tasks: box lifting, backpack packing, cloth folding, jacket zipping, and plate handover. Compared to approaches relying solely on human bimanual videos, MonoDuo enables zero-shot deployment on unseen bimanual robot configurations, achieving success rates up to 70%. With only 25 target robot demonstrations, few-shot finetuning further boosts success rates by 65–70% over training from scratch, demonstrating MonoDuo’s effectiveness in efficiently transferring knowledge from single-arm robot data to bimanual robot policies. Project page: https://bimanual-monoduo.github.io

Index terms

Bimanual Manipulation, Transfer Learning, Human-Robot Collaboration