TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System
Yanjie Ze, Siheng Zhao, Weizhuo Wang, Angjoo Kanazawa, Yan Duan, Pieter Abbeel, Guanya Shi, Jiajun Wu, Karen Liu
AI summary
Problem
Existing humanoid teleoperation systems either lack full whole-body control or depend on expensive, non-portable motion capture setups, limiting scalable data collection for humanoid robots.
Approach
TWIST2 uses a low-cost PICO4U VR headset and ankle trackers for mocap-free whole-body motion capture, paired with a custom 2-DoF active neck for egocentric vision, enabling holistic human-to-robot retargeting and a hierarchical visuomotor policy framework.
Key results
- Achieves full whole-body control with a portable ~$1000 VR setup and a $250 add-on neck
- Captures ~100 successful demonstrations in 15-20 minutes with a near-100% success rate
- Trains a hierarchical visuomotor policy for autonomous full-body control using egocentric vision
- Demonstrates long-horizon dexterous manipulation and dynamic legged tasks like towel folding and kicking
Why it matters
It provides a reproducible, low-cost framework for scalable humanoid data collection and autonomous whole-body control, accelerating progress in humanoid robotics.
Abstract
Large-scale data has driven breakthroughs in robotics, from language models to vision-language-action models in bimanual manipulation. However, humanoid robotics lacks equally effective data collection frameworks. Existing humanoid teleoperation systems either use decoupled control or depend on expensive motion capture setups. We introduce TWIST2, a portable, mocap-free humanoid teleoperation and data collection system that preserves full whole-body control while advancing scalability. Our system leverages PICO4U VR for obtaining real-time whole-body human motions, with a custom 2-DoF robot neck (cost around $250) for egocentric vision, enabling holistic human-to-humanoid control. We demonstrate long-horizon dexterous and mobile humanoid skills, and we can collect 100 demonstrations in 15 minutes with an almost 100% success rate. Building on this pipeline, we propose a hierarchical visuomotor policy framework that autonomously controls the full humanoid body based on egocentric vision. Our visuomotor policy successfully demonstrates whole-body dexterous manipulation and dynamic kicking tasks. The entire system is fully reproducible and open-sourced at https://yanjieze.com/TWIST2. Our collected dataset is also open-sourced at https://twist-data.github.io.
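To put the quoted figures in perspective, the short sketch below converts them into per-demonstration time, hourly throughput, and total add-on hardware cost. It is illustrative arithmetic only, not part of the TWIST2 codebase: the function and constant names are hypothetical, and the inputs are the approximate numbers stated above (about 100 demonstrations in 15 to 20 minutes, a roughly $1000 VR setup, and a roughly $250 neck).

```python
# Back-of-the-envelope arithmetic on the figures quoted in the summary and abstract.
# All names here are hypothetical; the inputs are the approximate published numbers.

VR_SETUP_USD = 1000   # approximate cost of the portable VR setup
NECK_USD = 250        # approximate cost of the 2-DoF add-on neck


def seconds_per_demo(num_demos: int, minutes: float) -> float:
    """Average wall-clock seconds spent per demonstration."""
    return minutes * 60.0 / num_demos


def demos_per_hour(num_demos: int, minutes: float) -> float:
    """Demonstrations collected per hour at the same sustained rate."""
    return num_demos * 60.0 / minutes


def total_hardware_usd() -> int:
    """Combined cost of the VR setup and the neck (robot itself excluded)."""
    return VR_SETUP_USD + NECK_USD


if __name__ == "__main__":
    for minutes in (15, 20):
        print(
            f"{minutes} min: {seconds_per_demo(100, minutes):.0f} s per demo, "
            f"{demos_per_hour(100, minutes):.0f} demos per hour"
        )
    print(f"Teleoperation hardware: about ${total_hardware_usd()}")
```

Under these assumptions, each demonstration takes roughly 9 to 12 seconds on average, which suggests the throughput figure refers to short task episodes rather than the long-horizon tasks.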