ICRA 2026

AVR: Active Vision-Driven Precise Robot Manipulation with Viewpoint and Focal Length Optimization

Yushan Liu, Shilong Mu, Xintao Chao, Zizhen Li, Yao Mu, Tianxing Chen, Shoujie Li, Chuqiao Lyu, Xiao-Ping Zhang, Wenbo Ding


AI summary


Dynamically controlling camera viewpoint and optical zoom during teleoperation and policy deployment significantly boosts success rates and robustness in precise robotic manipulation.

Active Vision, Robotic Manipulation, Teleoperation, Imitation Learning, Optical Zoom, Bimanual Control

Problem

Fixed or suboptimal camera views in robotic teleoperation impair fine-grained perception and cause occlusions, creating a bottleneck for imitation learning on complex, precision-demanding tasks.

Approach

AVR integrates a motorized zoom camera and head-tracked gimbal into a bimanual teleoperation system, enabling real-time viewpoint and focal length adjustments during data collection and autonomous modulation during policy deployment.
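The head-tracked gimbal maps headset orientation to a 2-DoF pan/tilt command. The sketch below illustrates one way to do that under stated assumptions and is not the authors' implementation. It assumes the HMD reports a unit quaternion (w, x, y, z) in a z-up frame and uses the ZYX (yaw-pitch-roll) Euler convention. Roll is dropped because a 2-DoF gimbal cannot reproduce it. The mechanical limits in the hypothetical `GimbalLimits` class are placeholders. Real headsets (e.g. OpenXR devices, which are y-up) use other axis conventions, so the angle extraction would need adjusting.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class GimbalLimits:
    # Hypothetical mechanical limits in degrees; real values depend on the gimbal hardware.
    yaw_min: float = -90.0
    yaw_max: float = 90.0
    pitch_min: float = -45.0
    pitch_max: float = 45.0


def quat_to_yaw_pitch(w: float, x: float, y: float, z: float) -> Tuple[float, float]:
    """Yaw and pitch in degrees from a unit quaternion (ZYX convention, z-up frame assumed)."""
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    # Clamp the asin argument so numerical noise near +/-90 deg cannot raise a domain error.
    s = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(s)
    return math.degrees(yaw), math.degrees(pitch)


def hmd_to_gimbal(
    q: Tuple[float, float, float, float], limits: Optional[GimbalLimits] = None
) -> Tuple[float, float]:
    """Map an HMD orientation quaternion to a clamped (yaw, pitch) gimbal command; roll is discarded."""
    limits = limits or GimbalLimits()
    yaw, pitch = quat_to_yaw_pitch(*q)
    # Saturate at the assumed joint limits instead of wrapping around.
    yaw = min(max(yaw, limits.yaw_min), limits.yaw_max)
    pitch = min(max(pitch, limits.pitch_min), limits.pitch_max)
    return yaw, pitch


if __name__ == "__main__":
    half = math.radians(30.0) / 2.0
    # Head turned 30 degrees about the vertical axis.
    print(hmd_to_gimbal((math.cos(half), 0.0, 0.0, math.sin(half))))
```

Clamping (rather than wrapping) keeps the command inside the gimbal's range when the operator turns further than the hardware allows. A deployed system would likely also smooth or rate-limit the command to avoid jitter; that step is omitted here.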

Key results

5–17% task success gains in simulation
Real-world success improved on most tasks, with gains of over 25% compared to the static-view baseline
Reduced teleoperation completion time and fewer failed trials
Robust performance under occlusion, clutter, lighting changes, and unseen environments

Why it matters

Enables higher-fidelity data collection and more reliable policy learning, advancing practical dexterous robotic manipulation in complex real-world settings.

Abstract

Robotic manipulation in complex scenes demands precise perception of task-relevant details, yet fixed or suboptimal viewpoints often impair fine-grained perception and induce occlusions, constraining imitation-learned policies. We present AVR (Active Vision-driven Robotics), a bimanual teleoperation and learning framework that unifies head-tracked viewpoint control (HMD-to-2-DoF gimbal) with motorized optical zoom to keep targets centered at an appropriate scale during data collection and deployment. In simulation, an AVR plugin augments RoboTwin demonstrations by emulating active vision (ROI-conditioned viewpoint change, aspect-ratio-preserving crops with explicit zoom ratios, and super-resolution), yielding 5–17% gains in task success across diverse manipulations. On our real-world platform, AVR improves success on most tasks, with over 25% gains compared to the static-view baseline, and extended studies further demonstrate robustness under occlusion, clutter, and lighting disturbances, as well as generalization to unseen environments and objects. These results pave the way for future robotic precision manipulation methods in the pursuit of human-level dexterity and precision.

*These authors contributed equally to this work. †Corresponding authors.

Affiliations: 1 Tsinghua University; 2 National University of Singapore; 3 Shanghai Jiao Tong University; 4 The University of Hong Kong; 5 Nanyang Technological University; 6 Xspark AI, Shenzhen, China.

Project page: https://AVR-robot.github.io.
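The simulation plugin emulates optical zoom with aspect-ratio-preserving crops at explicit zoom ratios. What follows is a minimal geometric sketch of that idea, not the AVR plugin's code. `Box`, `zoom_for_roi`, and `crop_for_zoom` are hypothetical helpers. The rule that the region of interest (ROI) should fill about `target_fill` of the frame, and the `max_zoom` cap, are assumptions. The later upscaling of the crop back to the policy's input resolution (super-resolution in the paper) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned pixel box; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int


def zoom_for_roi(roi: Box, img_w: int, img_h: int,
                 target_fill: float = 0.5, max_zoom: float = 4.0) -> float:
    """Hypothetical rule: zoom so the ROI's larger relative side fills about target_fill of the frame."""
    extent = max((roi.x1 - roi.x0) / img_w, (roi.y1 - roi.y0) / img_h)
    if extent <= 0:
        raise ValueError("ROI must have positive size")
    # Never zoom out past the full frame (1x) or beyond the assumed cap.
    return max(1.0, min(target_fill / extent, max_zoom))


def crop_for_zoom(cx: float, cy: float, zoom: float, img_w: int, img_h: int) -> Box:
    """Crop of size (img_w/zoom, img_h/zoom) centered on (cx, cy), shifted to stay inside the image.

    Both sides shrink by the same factor, so the crop keeps the frame's aspect ratio
    up to integer rounding.
    """
    if zoom < 1.0:
        raise ValueError("zoom ratio must be >= 1")
    cw, ch = round(img_w / zoom), round(img_h / zoom)
    # Center on the ROI, then clamp so the crop never leaves the image.
    x0 = min(max(round(cx - cw / 2), 0), img_w - cw)
    y0 = min(max(round(cy - ch / 2), 0), img_h - ch)
    return Box(x0, y0, x0 + cw, y0 + ch)


if __name__ == "__main__":
    roi = Box(300, 220, 364, 268)  # 64x48 ROI in a 640x480 frame
    z = zoom_for_roi(roi, 640, 480)
    print(z, crop_for_zoom((roi.x0 + roi.x1) / 2, (roi.y0 + roi.y1) / 2, z, 640, 480))
```

Shifting the crop instead of padding it keeps every output pixel real image content, at the cost of the ROI drifting off-center near the image border. Recording the zoom ratio alongside the crop gives the policy the "explicit zoom ratio" the abstract mentions.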

Index terms

Imitation Learning; Telerobotics and Teleoperation; Learning from Demonstration