ICRA 2026

CLOVER: Context-Aware Long-Term Object Viewpoint and Environment Invariant Representation Learning

Amanda Adkins, Dongmyeong Lee, Joydeep Biswas

AI summary

CLOVER enables robust, long-term object re-identification in dynamic outdoor environments by leveraging contextual image patches and supervised contrastive learning without requiring foreground segmentation.

Object Re-identification, Representation Learning, Robot Perception, Context-Aware Vision, Long-term Mapping, CODa Re-ID

Problem

Existing object re-identification methods struggle with viewpoint changes and with lighting and weather variations, and they often require foreground segmentation or focus only on specific classes or indoor scenes. Real-world datasets and scalable methods for general static object re-identification across diverse environmental conditions are lacking.

Approach

The authors introduce CLOVER, a representation learning method that applies a supervised contrastive loss to context-aware image patches on top of a frozen foundation model backbone, so that the learned features are invariant to viewpoint and environment changes. They also introduce MapCLOVER, which scalably summarizes object descriptors and matches new observations against those summaries.
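To make the training objective concrete, here is a minimal NumPy sketch of a generic supervised contrastive loss, in which observations of the same object instance are pulled together and all others are pushed apart. It is an illustration under assumptions, not the authors' implementation: the function name, the `temperature` value, and the use of instance IDs as labels are hypothetical, and the embeddings stand in for features produced by a trainable head on a frozen backbone.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch (illustrative).

    embeddings: (n, d) array of descriptors, one per object observation.
    labels: length-n sequence of object instance IDs (assumed labeling).
    temperature: softmax temperature (hypothetical default).
    """
    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize
    labels = np.asarray(labels)
    n = len(labels)

    # Scaled cosine similarity; exclude each sample's similarity to itself.
    sim = z @ z.T / temperature
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over all other samples in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Positives: other observations that share the anchor's instance ID.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors without any positive are skipped
    if not valid.any():
        return 0.0

    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())
```

The loss is small when descriptors of the same instance are close in cosine similarity and descriptors of different instances are far apart, which is the property a re-identification descriptor needs.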

Key results

CODa Re-ID dataset with 1,037,814 observations of 557 objects across 8 classes under diverse lighting conditions and viewpoints
CLOVER outperforms existing methods in static object re-identification under varying lighting and viewpoints
MapCLOVER enables scalable long-term object mapping and robust matching via descriptor clustering
Strong generalization to unseen object instances and entirely new semantic classes
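The summarize-then-match idea behind MapCLOVER can be sketched as follows. This is a hedged illustration only: the summary does not specify the clustering algorithm or the matching rule, so the sketch assumes spherical k-means centroids per object and nearest-centroid cosine matching with a rejection threshold. The names `summarize_descriptors`, `match_observation`, `k`, and `min_similarity` are hypothetical.

```python
import numpy as np


def _normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def summarize_descriptors(descriptors, k=2, iters=20, seed=0):
    """Summarize one object's descriptors as up to k unit-length centroids.

    Assumption: spherical k-means; the actual summarization may differ.
    """
    x = _normalize(np.asarray(descriptors, dtype=float))
    k = min(k, len(x))
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its most similar centroid.
        assign = np.argmax(x @ centroids.T, axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = _normalize(members.mean(axis=0))
    return centroids


def match_observation(descriptor, object_map, min_similarity=0.5):
    """Match a new descriptor against summarized objects.

    object_map: dict of object ID -> (k, d) centroid array.
    Returns (object_id, similarity); object_id is None when the best
    similarity is below min_similarity (a hypothetical threshold).
    """
    q = _normalize(np.asarray(descriptor, dtype=float))
    best_id, best_sim = None, -1.0
    for obj_id, centroids in object_map.items():
        sim = float(np.max(centroids @ q))
        if sim > best_sim:
            best_id, best_sim = obj_id, sim
    if best_sim < min_similarity:
        return None, best_sim
    return best_id, best_sim
```

Storing a few centroids per object keeps the map size independent of the number of observations, which is one plausible reading of "scalable" here. The threshold lets an observation be treated as a new object rather than forced onto an existing one.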

Why it matters

It provides a scalable, robust solution for long-term object-level environmental understanding, crucial for reliable mobile service robots operating in dynamic real-world settings.

Abstract

Mobile service robots can benefit from object-level understanding of their environments, including the ability to distinguish object instances and re-identify previously seen instances. Object re-identification is challenging across different viewpoints and in scenes with significant appearance variation arising from weather or lighting changes. Existing works on object re-identification either focus on specific classes or require foreground segmentation. Further, these methods, along with object re-identification datasets, have limited consideration of challenges such as outdoor scenes and illumination changes. To address this problem, we introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects across 8 classes under diverse lighting conditions and viewpoints. Further, we propose CLOVER, a representation learning method for object observations that can distinguish between static object instances without requiring foreground segmentation. We also introduce MapCLOVER, a method for scalably summarizing CLOVER descriptors for use in object maps and matching new observations to summarized descriptors. Our results show that CLOVER achieves superior performance in static object re-identification under varying lighting conditions and viewpoint changes and can generalize to unseen instances and classes.

Index terms

Deep Learning for Visual Perception, Data Sets for Robotic Vision, Semantic Scene Understanding