ICRA 2026

Observer-Actor: Active Vision Imitation Learning with Sparse-View Gaussian Splatting

Yilong Wang, Cheng Qian, Ruomeng Fan, Edward Johns


AI summary


Dynamically repositioning a camera to an optimal viewpoint at test time using sparse-view 3D Gaussian Splatting (3DGS) significantly boosts imitation learning success rates, particularly in occluded scenarios.

Active vision, Imitation learning, 3D Gaussian Splatting, Robotic manipulation, Occlusion handling, Sparse-view reconstruction

Problem

Current imitation learning relies on static or wrist-mounted cameras that struggle with occlusions and limited viewpoints, while existing active vision methods require fixed roles or extensive human demonstrations.

Approach

The ObAct framework dynamically assigns observer and actor roles at test time. The observer arm captures three images, builds a 3D Gaussian Splatting (3DGS) model from them, searches that model for a low-occlusion viewpoint, and moves there before the actor executes the task.
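The viewpoint search can be pictured as scoring candidate camera poses in a reconstructed scene and keeping the least-occluded one. The sketch below is a minimal, hypothetical stand-in, not ObAct's method: instead of rendering a 3DGS model, it represents the scene as target points (for example, on the object and the gripper) plus spherical occluders, places candidate cameras on a hemisphere around the workspace, and scores each by the fraction of targets with an unobstructed line of sight. All function names, the hemisphere sampling, and the visibility score are assumptions for illustration; only camera position is modeled, with the camera assumed to look at the workspace center.

```python
import math

# Hypothetical stand-in for a reconstructed scene: target points plus
# spherical occluders. A real system would render the 3DGS model instead.


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def segment_hits_sphere(p0, p1, center, radius):
    """True if the segment p0 -> p1 passes within `radius` of `center`."""
    d = _sub(p1, p0)
    t = _dot(_sub(center, p0), d) / _dot(d, d)
    t = min(1.0, max(0.0, t))  # clamp the closest point onto the segment
    closest = tuple(p + t * q for p, q in zip(p0, d))
    return math.dist(closest, center) < radius


def visibility_score(camera, targets, occluders):
    """Fraction of target points with an unobstructed line of sight to `camera`."""
    visible = sum(
        not any(segment_hits_sphere(camera, p, c, r) for c, r in occluders)
        for p in targets
    )
    return visible / len(targets)


def candidate_cameras(center, radius, n_azimuth=12, elevations=(0.3, 0.6, 0.9)):
    """Candidate camera positions on a hemisphere around `center` (angles in radians)."""
    cams = []
    for el in elevations:
        for k in range(n_azimuth):
            az = 2 * math.pi * k / n_azimuth
            cams.append((
                center[0] + radius * math.cos(el) * math.cos(az),
                center[1] + radius * math.cos(el) * math.sin(az),
                center[2] + radius * math.sin(el),
            ))
    return cams


def best_viewpoint(targets, occluders, center, radius):
    """Exhaustively score every candidate and return the first least-occluded one."""
    cams = candidate_cameras(center, radius)
    scores = [visibility_score(c, targets, occluders) for c in cams]
    i = max(range(len(cams)), key=scores.__getitem__)
    return cams[i], scores[i]
```

With one target at the origin and an occluder between it and the first candidate camera, that camera scores 0.0 while `best_viewpoint` returns a different, fully visible pose. Exhaustive search keeps the sketch simple; a gradient-based or sampling-based optimizer over a rendered objective would be a natural substitute.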

Key results

Introduces the ObAct decoupled observer–actor framework
First application of sparse-view 3DGS for test-time active vision optimization
Extends trajectory transfer and behavior cloning to dynamic view-conditioned settings
Achieves up to a 233% relative improvement in success rate over static-camera setups under occlusion (trajectory transfer)
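Because a success rate cannot exceed 100%, the reported percentages can only be relative improvements over the static-camera baseline. Assuming that reading, with $s_{\mathrm{ObAct}}$ and $s_{\mathrm{static}}$ as hypothetical labels for the two success rates (expressed as fractions), the headline figure translates as follows:

```latex
\begin{aligned}
\Delta &= \frac{s_{\mathrm{ObAct}} - s_{\mathrm{static}}}{s_{\mathrm{static}}}
  \quad\Longrightarrow\quad
  s_{\mathrm{ObAct}} = (1 + \Delta)\, s_{\mathrm{static}} \\
\Delta = 2.33 &\;\Rightarrow\; s_{\mathrm{ObAct}} = 3.33\, s_{\mathrm{static}}
  \;\Rightarrow\; s_{\mathrm{static}} \le \tfrac{1}{3.33} \approx 0.30
  \quad (\text{since } s_{\mathrm{ObAct}} \le 1)
\end{aligned}
```

So a 233% improvement means ObAct's success rate is about 3.33 times the static baseline's, which in turn implies the static baseline succeeded at most about 30% of the time in that setting. By the same formula, a 145% improvement corresponds to a factor of 2.45.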

Why it matters

Enables more robust and data-efficient robotic manipulation policies by dynamically optimizing camera views, benefiting researchers and practitioners in active vision and imitation learning.

Abstract

We propose Observer-Actor (ObAct), a novel framework for active vision imitation learning in which the observer moves to obtain optimal visual observations for the actor. We study ObAct on a dual-arm robotic system equipped with wrist-mounted cameras. At test time, ObAct dynamically assigns observer and actor roles: the observer arm constructs a 3D Gaussian Splatting (3DGS) representation from three images, virtually explores it to find an optimal camera pose, and then moves to this pose; the actor arm then executes a policy using the observer’s observations. This formulation enhances the clarity and visibility of both the object and the gripper in the policy’s observations. As a result, we enable the training of ambidextrous policies on observations that remain closer to the occlusion-free training distribution, leading to more robust policies. We study this formulation with two existing imitation learning methods, trajectory transfer and behavior cloning, and experiments show that ObAct significantly outperforms static-camera setups: trajectory transfer improves by 145% without occlusion and 233% with occlusion, while behavior cloning improves by 75% and 143%, respectively. Videos are available at https://obact.github.io.
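The order of the test-time steps in the abstract (assign roles, capture three images, build 3DGS, explore virtually, move, then act) can be sketched as a control loop. This is a hypothetical skeleton, not the authors' code: the summary does not say how roles are chosen, so the "closer arm acts" rule below is an assumed heuristic, and the perception, search, and policy components are passed in as placeholder callables.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Arm:
    name: str
    base: tuple  # hypothetical (x, y, z) position of the arm's base
    log: list = field(default_factory=list)  # records motion commands


def assign_roles(left, right, object_pos):
    """Hypothetical heuristic: the arm whose base is closer to the object acts."""
    if math.dist(left.base, object_pos) <= math.dist(right.base, object_pos):
        return left, right  # (actor, observer)
    return right, left


def obact_episode(left, right, object_pos, capture, build_scene, search_view,
                  policy, n_views=3):
    """One test-time episode following the step order described in the abstract."""
    actor, observer = assign_roles(left, right, object_pos)
    images = [capture(observer, i) for i in range(n_views)]  # sparse views
    scene = build_scene(images)       # placeholder for fitting a 3DGS model
    view = search_view(scene)         # placeholder for virtual exploration
    observer.log.append(("move_to", view))  # observer moves to the chosen pose
    actions = policy(actor, observer)  # actor acts on the observer's camera feed
    return actor.name, observer.name, actions


# Usage with stub components standing in for perception and control.
left = Arm("left", (-0.4, 0.0, 0.0))
right = Arm("right", (0.4, 0.0, 0.0))
actor, observer, actions = obact_episode(
    left, right, (0.3, 0.2, 0.0),
    capture=lambda arm, i: f"{arm.name}-img{i}",
    build_scene=lambda imgs: {"n_images": len(imgs)},
    search_view=lambda scene: ("pose", scene["n_images"]),
    policy=lambda a, o: [f"{a.name} acts using {o.name} camera"],
)
```

Injecting the components as callables keeps the role logic separate from perception and control, so either trajectory transfer or behavior cloning could be plugged in as `policy` without changing the loop.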

Index terms

Imitation Learning; Dual Arm Manipulation; Perception for Grasping and Manipulation