CLOVER: Context-Aware Long-Term Object Viewpoint and Environment Invariant Representation Learning
Amanda Adkins, Dongmyeong Lee, Joydeep Biswas
AI summary
Problem
Existing object re-identification methods struggle with viewpoint changes and with lighting and weather variations. They also often require foreground segmentation, or handle only specific classes or indoor scenes. Real-world datasets and scalable methods for general static object re-identification across diverse environmental conditions are lacking.
Approach
The authors introduce CLOVER, a representation learning method that uses context-aware image patches and supervised contrastive loss on a frozen foundation model backbone to learn viewpoint- and environment-invariant features, alongside MapCLOVER for scalable summarization and matching of object descriptors.
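To make the training objective concrete, here is a minimal NumPy sketch of a supervised contrastive loss in its commonly used form, where observations of the same object instance are positives and all other observations in the batch are negatives. It is an illustration under stated assumptions, not the authors' implementation: the function name, the temperature value, and the use of precomputed embeddings (standing in for features from a frozen backbone plus a learned head) are all hypothetical.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (illustrative sketch).

    embeddings: (n, d) array, one descriptor per object observation.
    labels:     (n,) array of object-instance IDs.
    temperature is an assumed value, not taken from the paper.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    n = z.shape[0]
    if n < 2:
        return 0.0

    # L2-normalize so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    # Exclude each sample's similarity to itself from the denominator.
    self_mask = np.eye(n, dtype=bool)
    sim_no_self = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-sum-exp over all other samples in the batch.
    row_max = sim_no_self.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(
        np.exp(sim_no_self - row_max).sum(axis=1, keepdims=True)
    )
    log_prob = sim - log_denom

    # Positives: other observations that share the anchor's instance ID.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors with no positive are skipped
    if not valid.any():
        return 0.0

    mean_log_prob_pos = (log_prob * pos_mask).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_log_prob_pos.mean())


# Toy batch: two observations near each of two directions.
toy_embeddings = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
loss_consistent = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In the toy batch, labels that agree with the geometry of the embeddings give a lower loss than labels that cut across it. That gradient signal is what pulls observations of the same instance together across viewpoints and lighting conditions.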
Key results
- CODa Re-ID dataset with over 1 million observations of 557 objects across 8 classes under diverse conditions
- CLOVER outperforms existing methods in static object re-identification under varying lighting and viewpoints
- MapCLOVER enables scalable long-term object mapping and robust matching via descriptor clustering
- Strong generalization to unseen object instances and entirely new semantic classes
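The descriptor clustering idea behind MapCLOVER can be sketched as follows. This is a hypothetical illustration, not the paper's algorithm: it assumes each mapped object is summarized by a few cluster centroids computed with a simple spherical k-means, and that a new observation is matched to the object whose nearest centroid has the highest cosine similarity. The function names, the choice of k, and the object IDs are all invented for the example.

```python
import numpy as np


def summarize_descriptors(descriptors, k=2, iters=20, seed=0):
    """Summarize one object's descriptors as at most k unit-norm centroids.

    Uses a basic spherical k-means (an assumed stand-in for whatever
    clustering the actual method uses).
    """
    x = np.asarray(descriptors, dtype=np.float64)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    k = min(k, len(x))
    # Fancy indexing returns a copy, so updating centroids leaves x intact.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its most similar centroid.
        assign = np.argmax(x @ centroids.T, axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) == 0:
                continue  # keep the old centroid for an empty cluster
            mean = members.mean(axis=0)
            norm = np.linalg.norm(mean)
            if norm > 0:
                centroids[j] = mean / norm
    return centroids


def match_observation(descriptor, object_map):
    """Return (object_id, score) for the best-matching summarized object."""
    q = np.asarray(descriptor, dtype=np.float64)
    q = q / np.linalg.norm(q)
    best_id, best_score = None, -np.inf
    for obj_id, centroids in object_map.items():
        score = float(np.max(centroids @ q))
        if score > best_score:
            best_id, best_score = obj_id, score
    return best_id, best_score


# Toy map with two objects (IDs are made up for illustration).
object_map = {
    "tree_1": summarize_descriptors([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.0, 0.2]]),
    "pole_7": summarize_descriptors([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.0, 0.8, 0.2]]),
}
matched_id, matched_score = match_observation([0.95, 0.05, 0.0], object_map)
```

Storing a handful of centroids per object, rather than every observation, keeps the map size bounded as the number of observations grows. A real system would also need a similarity threshold below which an observation is treated as a new object; that is omitted here.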
Why it matters
CLOVER and MapCLOVER provide a scalable, robust approach to long-term object-level understanding of environments, which mobile service robots need in order to operate reliably as real-world conditions change.
Abstract
Mobile service robots can benefit from object-level understanding of their environments, including the ability to distinguish object instances and re-identify previously seen instances. Object re-identification is challenging across different viewpoints and in scenes with significant appearance variation arising from weather or lighting changes. Existing works on object re-identification either focus on specific classes or require foreground segmentation. Further, these methods, along with object re-identification datasets, have limited consideration of challenges such as outdoor scenes and illumination changes. To address this problem, we introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects across 8 classes under diverse lighting conditions and viewpoints. Further, we propose CLOVER, a representation learning method for object observations that can distinguish between static object instances without requiring foreground segmentation. We also introduce MapCLOVER, a method for scalably summarizing CLOVER descriptors for use in object maps and matching new observations to summarized descriptors. Our results show that CLOVER achieves superior performance in static object re-identification under varying lighting conditions and viewpoint changes and can generalize to unseen instances and classes.