AVR: Active Vision-Driven Precise Robot Manipulation with Viewpoint and Focal Length Optimization
Yushan Liu, Shilong Mu, Xintao Chao, Zizhen Li, Yao Mu, Tianxing Chen, Shoujie Li, Chuqiao Lyu, Xiao-Ping Zhang, Wenbo Ding
AI summary
Problem
Fixed or suboptimal camera views in robotic teleoperation impair fine-grained perception and cause occlusions, creating a bottleneck for imitation learning on complex, precision-demanding tasks.
Approach
AVR integrates a motorized zoom camera and head-tracked gimbal into a bimanual teleoperation system, enabling real-time viewpoint and focal length adjustments during data collection and autonomous modulation during policy deployment.
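The summary does not say how head pose is turned into gimbal commands. Below is a minimal sketch of one common approach. It assumes the HMD reports its orientation as a unit quaternion in a right-handed, z-up frame and that the 2-DoF gimbal accepts pan/tilt angles in degrees. The sketch extracts yaw and pitch with a ZYX Euler decomposition, discards roll (two axes cannot reproduce it), and clamps the result to mechanical limits. `hmd_quat_to_pan_tilt`, `GimbalLimits`, and the limit values are hypothetical illustrations, not the AVR implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GimbalLimits:
    # Hypothetical mechanical limits in degrees; real values depend on the gimbal.
    pan_min: float = -90.0
    pan_max: float = 90.0
    tilt_min: float = -45.0
    tilt_max: float = 45.0


def _clamp(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, v))


def hmd_quat_to_pan_tilt(w: float, x: float, y: float, z: float,
                         limits: GimbalLimits = GimbalLimits()) -> tuple[float, float]:
    """Map an HMD orientation quaternion (w, x, y, z) to gimbal (pan, tilt) in degrees.

    Assumes a right-handed z-up frame and a ZYX (yaw-pitch-roll) decomposition.
    Roll is dropped because a 2-DoF gimbal cannot reproduce it.
    """
    # Normalize to guard against small drift in the tracker output.
    n = math.sqrt(w * w + x * x + y * y + z * z)
    if n == 0.0:
        raise ValueError("zero-length quaternion")
    w, x, y, z = w / n, x / n, y / n, z / n
    # Yaw about z drives pan; pitch about y drives tilt.
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    # Clamp the asin argument so rounding error cannot push it outside [-1, 1].
    pitch = math.asin(_clamp(2.0 * (w * y - z * x), -1.0, 1.0))
    pan = _clamp(math.degrees(yaw), limits.pan_min, limits.pan_max)
    tilt = _clamp(math.degrees(pitch), limits.tilt_min, limits.tilt_max)
    return pan, tilt
```

For example, a 30° head turn about the vertical axis, `hmd_quat_to_pan_tilt(math.cos(math.radians(15)), 0.0, 0.0, math.sin(math.radians(15)))`, yields a pan close to 30 and a tilt of 0. Whether positive tilt means looking up or down depends on the tracker's frame convention, so the sign may need flipping for real hardware. A deployed system would likely also add smoothing or rate limiting before sending commands to the motors.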
Key results
- 5–17% task success gains in simulation
- Over 25% real-world success improvement over static-view baselines
- Reduced teleoperation completion time and fewer failed trials
- Robust performance under occlusion, clutter, lighting changes, and unseen environments
Why it matters
Enables higher-fidelity data collection and more reliable policy learning, advancing practical dexterous robotic manipulation in complex real-world settings.
Abstract
Robotic manipulation in complex scenes demands precise perception of task-relevant details, yet fixed or suboptimal viewpoints often impair fine-grained perception and induce occlusions, constraining imitation-learned policies. We present AVR (Active Vision-driven Robotics), a bimanual teleoperation and learning framework that unifies head-tracked viewpoint control (HMD-to-2-DoF gimbal) with motorized optical zoom to keep targets centered at an appropriate scale during data collection and deployment. In simulation, an AVR plugin augments RoboTwin demonstrations by emulating active vision (ROI-conditioned viewpoint change, aspect-ratio-preserving crops with explicit zoom ratios, and super-resolution), yielding 5–17% gains in task success across diverse manipulations. On our real-world platform, AVR improves success on most tasks, with over 25% gains compared to the static-view baseline, and extended studies further demonstrate robustness under occlusion, clutter, and lighting disturbances, as well as generalization to unseen environments and objects. These results pave the way for future robotic precision manipulation methods in the pursuit of human-level dexterity and precision.
*These authors contributed equally to this work. †Corresponding authors.
1 Tsinghua University; 2 National University of Singapore; 3 Shanghai Jiao Tong University; 4 The University of Hong Kong; 5 Nanyang Technological University; 6 Xspark AI, Shenzhen, China
Project page: https://AVR-robot.github.io
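The abstract names "aspect-ratio-preserving crops with explicit zoom ratios" without giving the procedure. One plausible reading is sketched below. The crop is the full frame scaled down by a single zoom factor on both axes, centered on the ROI and shifted to stay inside the image. When no zoom is given, the sketch picks the largest zoom that still fits the ROI plus a margin, capped at a maximum. The function names, the `margin` and `max_zoom` parameters, and the fitting rule are all hypothetical. Nearest-neighbor upsampling stands in for the super-resolution step the abstract mentions; it is not a super-resolution model.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Box:
    x0: int
    y0: int
    w: int
    h: int


def roi_crop_box(img_w, img_h, roi, zoom=None, max_zoom=4.0, margin=0.2):
    """Return (crop box, zoom ratio) for a crop centred on roi = (x0, y0, w, h).

    If zoom is None, choose the largest zoom (capped at max_zoom) whose crop
    still contains the ROI enlarged by a relative margin.
    """
    rx, ry, rw, rh = roi
    if rw <= 0 or rh <= 0:
        raise ValueError("ROI must have positive width and height")
    if zoom is None:
        fit = min(img_w / (rw * (1 + margin)), img_h / (rh * (1 + margin)))
        zoom = min(max_zoom, fit)
    zoom = max(1.0, zoom)
    # The same zoom on both axes keeps the original aspect ratio.
    cw, ch = round(img_w / zoom), round(img_h / zoom)
    cx, cy = rx + rw / 2, ry + rh / 2
    # Centre on the ROI, then shift so the crop stays inside the image.
    x0 = int(min(max(round(cx - cw / 2), 0), img_w - cw))
    y0 = int(min(max(round(cy - ch / 2), 0), img_h - ch))
    return Box(x0, y0, cw, ch), zoom


def zoom_view(img, roi, **kw):
    """Crop around the ROI and upsample back to the input resolution.

    Nearest-neighbour index mapping is a placeholder for super-resolution.
    """
    h, w = img.shape[:2]
    box, zoom = roi_crop_box(w, h, roi, **kw)
    crop = img[box.y0:box.y0 + box.h, box.x0:box.x0 + box.w]
    rows = np.arange(h) * box.h // h  # output row -> source row in the crop
    cols = np.arange(w) * box.w // w  # output col -> source col in the crop
    return crop[rows][:, cols], zoom
```

Recording the zoom ratio alongside each view, as the sketch returns it, is what would let a policy condition on the effective scale rather than infer it from pixels alone.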