Evaluating the Out-Of-Distribution Generalization of Robot Diffusion Policies under the DINOv2 Visual Encoder
Angel Montejo, Iñigo Iturrate
Abstract
The generalizability of visuomotor policy models is crucial for their real-world usefulness in settings such as industrial environments, and it is heavily influenced by the choice of visual encoder. In this paper, we integrate the DINOv2 foundation visual encoder with Diffusion Policy by designing a spatially-aware projection head that allows the policy to shape its visual representation while benefiting from DINOv2’s robust embeddings. We evaluate this approach under drastic out-of-distribution conditions. Because success rate can be uninformative in these conditions, where failure rates are high, we present three evaluation criteria for goal-driven policies that remain informative despite task failure. Our results show that our approach outperforms the baseline under color alterations and camera displacements. We also observe promising emergent tracking of task-relevant features when using the DINOv2 visual encoder for policy learning.