SII 2026

Evaluating the Out-Of-Distribution Generalization of Robot Diffusion Policies under the DINOv2 Visual Encoder

Angel Montejo, Iñigo Iturrate

Abstract

The generalizability of visuomotor policy models is crucial for their real-world usefulness in settings such as industrial environments, and it is heavily influenced by the choice of visual encoder. In this paper, we integrate the DINOv2 foundation visual encoder with Diffusion Policy by designing a spatially-aware projection head that allows the policy to shape its visual representation while benefiting from DINOv2’s robust embeddings. We evaluate this approach under drastic out-of-distribution conditions. Because success rate can be uninformative in these conditions, where failure rates are high, we present three evaluation criteria for goal-driven policies that remain informative despite task failure. Our results show that our approach outperforms the baseline under color alterations and camera displacements. We also observe promising emergent task-relevant feature tracking when using the DINOv2 visual encoder for policy learning.

Index terms

Robotics, Machine Learning