ICRA 2026

Sym-Servo: Disambiguate Symmetric Object Pose by End-To-End Optimal Visual Servo

Shuxin Li, Anzhe Chen, Haojian Lu, Rong Xiong, Yue Wang

AI summary

Reframing symmetric object control as an end-to-end visual servo task resolves pose ambiguity and enables stable, model-free robotic manipulation.

visual servoing, symmetric objects, robotic manipulation, diffusion models, reinforcement learning, model-free control

Problem

Existing 6D pose estimation methods for symmetric objects either output flickering single poses that destabilize control or require CAD models and fail to generalize. This creates a gap in achieving robust, model-free control for symmetric objects in open-world scenarios.
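The flicker problem can be illustrated with a small numeric sketch. It is not taken from the paper: the object, the 2-fold symmetry, the angles, and the function names are all assumptions chosen for illustration. A hypothetical estimator alternates between two equivalent poses of the same physical object, so a naive angular error flips sign from frame to frame, while an error that folds in the known symmetry stays constant. Note that the symmetry-aware version needs the symmetry order in advance, which is exactly the kind of model knowledge a model-free approach aims to avoid.

```python
import math


def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def naive_error(estimate, goal):
    """Angular error to the goal, ignoring object symmetry."""
    return wrap(goal - estimate)


def symmetry_aware_error(estimate, goal, n_fold):
    """Smallest angular error over all n_fold equivalent goal poses."""
    period = 2 * math.pi / n_fold
    return (goal - estimate + period / 2) % period - period / 2


# Assumed setup: a planar object with 2-fold rotational symmetry.
true_angle = math.radians(100)
goal = math.radians(0)
n_fold = 2

# Hypothetical estimator output that flips between equivalent poses.
estimates = [true_angle + k * math.pi for k in (0, 1, 0, 1)]

naive = [naive_error(e, goal) for e in estimates]
aware = [symmetry_aware_error(e, goal, n_fold) for e in estimates]

for n, a in zip(naive, aware):
    print(f"naive: {math.degrees(n):7.1f} deg   symmetry-aware: {math.degrees(a):6.1f} deg")
```

A proportional controller driven by the naive error would reverse direction on every frame, even though the object has not moved.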

Approach

The authors formulate symmetric object manipulation as an end-to-end visual servo task that directly maps current and desired images to velocity commands. A deterministic policy is jointly trained with a diffusion-based generator to capture symmetry-aware features, then refined via reinforcement learning and self-imitation learning for stability and efficiency.
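The interface described above, a policy that takes the current and desired images and returns a velocity command, can be sketched as a closed loop. This is only a structural sketch under stated assumptions: `servo_loop`, `ToyScene`, and `toy_policy` are hypothetical names, the "image" is a one-element list, and the hand-written proportional rule stands in for the learned policy, whose architecture and training are not reproduced here.

```python
from typing import Callable, List, Sequence

Image = Sequence[float]   # placeholder for a flattened image
Velocity = List[float]    # [vx, vy, vz, wx, wy, wz]
Policy = Callable[[Image, Image], Velocity]


def servo_loop(policy: Policy,
               capture: Callable[[], Image],
               apply_velocity: Callable[[Velocity], None],
               desired: Image,
               max_steps: int = 100,
               tol: float = 1e-3) -> int:
    """Servo until the commanded velocity is near zero; return steps used."""
    for step in range(max_steps):
        current = capture()
        velocity = policy(current, desired)
        if max(abs(v) for v in velocity) < tol:
            return step
        apply_velocity(velocity)
    return max_steps


class ToyScene:
    """Hypothetical 1-D scene whose 'image' is just its position."""

    def __init__(self, x: float) -> None:
        self.x = x

    def capture(self) -> Image:
        return [self.x]

    def apply_velocity(self, velocity: Velocity, dt: float = 1.0) -> None:
        self.x += velocity[0] * dt


def toy_policy(current: Image, desired: Image, gain: float = 0.5) -> Velocity:
    """Stand-in for a learned policy: proportional rule on the image difference."""
    return [gain * (desired[0] - current[0]), 0.0, 0.0, 0.0, 0.0, 0.0]


scene = ToyScene(x=1.0)
steps = servo_loop(toy_policy, scene.capture, scene.apply_velocity, desired=[0.0])
print(f"converged in {steps} steps, final position {scene.x:.4f}")
```

The point of the design is that no intermediate pose estimate appears anywhere in the loop, so there is no pose output that could flicker between equivalent solutions.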

Key results

Formulates symmetric object manipulation as an end-to-end visual servo problem
Introduces joint learning of a deterministic policy and diffusion-based generator for symmetry-aware features
Validates stable, model-free control and strong generalization in simulation and real-world experiments
Outperforms baseline pose estimation and visual servo methods in IoU and success rates

Why it matters

Enables robots to reliably manipulate symmetric objects in model-free, open-world settings without CAD dependencies or perception-induced control oscillations.

Abstract

Controlling symmetric objects is an indispensable but challenging task in robotic manipulation. Mainstream perception-action frameworks rely on accurate 6D pose estimation to guide the controller. However, the majority of existing 6D pose estimation methods for symmetric objects are designed to output a single pose, which can flicker between multiple equivalent solutions across consecutive frames, leading to instability in the control loop. While some approaches can output multiple hypotheses to represent the ambiguity, the above methods generally cannot achieve model-free operation and strong generalization simultaneously. In this paper, we reformulate the problem from a multi-solution task in pose space into an end-to-end visual servo task that admits a unique optimal solution. We propose a visual servo framework, Sym-Servo. Sym-Servo uses a joint learning mechanism in which a deterministic policy is trained with a diffusion-based generator to encourage the shared vision encoder to learn a symmetry-aware representation; the policy is then refined via reinforcement and self-imitation learning to produce an efficient and stable final policy. We validate Sym-Servo in simulation and real-world experiments, demonstrating its efficiency and generalization in controlling symmetric objects in a model-free manner.

Index terms

Service Robotics, Domestic Robotics, Assembly