ICRA 2026

CMAR-search: Commonsense and Memory Augmented Reasoning for Object Search in Dynamic Interactive Environments

Kaiyao Liao, Qingfeng Li, Xinlei Zhang, Chen Chen, Qing Sun, Jianwei Niu

AI summary

CMAR-search significantly outperforms state-of-the-art baselines in success rate and search efficiency for dynamic interactive object search by leveraging commonsense and memory to dynamically adapt to environmental changes.
Dynamic object search, Commonsense reasoning, Functional scene graph, Hierarchical planning, Embodied AI, Memory-augmented reasoning

Problem

Current scene representations lack functional area understanding and adaptability to dynamic changes, forcing robots into inefficient exhaustive searches in large-scale, interactive environments.

Approach

The framework constructs a Functional 3D Scene Graph using commonsense and memory to enable hierarchical planning at both area and container levels, while continuously integrating real-time perception and historical memory to adapt to object relocations.
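The two-level planning described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's implementation: the class names `FunctionalArea` and `Container`, the `score` fields, and the `plan_search` function are all hypothetical, and the scores stand in for whatever relevance estimate commonsense and memory would produce.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    # Hypothetical score: how plausible it is that the target is inside.
    score: float


@dataclass
class FunctionalArea:
    name: str
    # Hypothetical score: how relevant this area's function is to the target.
    score: float
    containers: list[Container] = field(default_factory=list)


def plan_search(areas: list[FunctionalArea]) -> list[tuple[str, str]]:
    """Order the search: most relevant area first, then its best containers."""
    plan = []
    for area in sorted(areas, key=lambda a: a.score, reverse=True):
        for c in sorted(area.containers, key=lambda c: c.score, reverse=True):
            plan.append((area.name, c.name))
    return plan


# Toy graph with made-up scores, purely for illustration.
graph = [
    FunctionalArea("leisure area", 0.2, [Container("tv cabinet", 0.5)]),
    FunctionalArea(
        "storage area",
        0.7,
        [Container("drawer", 0.4), Container("cabinet", 0.9)],
    ),
]
plan = plan_search(graph)
for area_name, container_name in plan:
    print(area_name, "->", container_name)
```

Ranking areas before containers is what avoids an exhaustive sweep: containers in low-relevance areas are only reached after every container in a more relevant area has been inspected.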

Key results

  • Introduces the CMAR-search framework, built on Commonsense and Memory Augmented Reasoning (CMAR)
  • Constructs a Functional 3D Scene Graph (F3DSG) that supports hierarchical planning
  • Achieves 0.957 container classification accuracy and 0.911 room segmentation purity
  • Significantly improves success rate and search efficiency over state-of-the-art baselines in simulation and real-world tests
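This page does not say how room segmentation purity is computed. Assuming it follows the standard clustering definition of purity (each predicted segment is credited with its most common ground-truth label, and the credited points are divided by the total), a minimal sketch looks like this. The function name `purity` and the toy labels are illustrative only.

```python
from collections import Counter, defaultdict


def purity(predicted, truth):
    """Standard clustering purity: share of points that carry the majority
    ground-truth label of their predicted segment."""
    groups = defaultdict(list)
    for p, t in zip(predicted, truth):
        groups[p].append(t)
    majority = sum(
        Counter(labels).most_common(1)[0][1] for labels in groups.values()
    )
    return majority / len(truth)


# Toy data: two predicted segments over six points.
pred = ["s0", "s0", "s0", "s1", "s1", "s1"]
true = ["kitchen", "kitchen", "living", "living", "living", "living"]
score = purity(pred, true)
print(round(score, 3))
```

In the toy data, segment `s0` contributes 2 correctly grouped points and `s1` contributes 3, so 5 of the 6 points count toward purity.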

Why it matters

Enables robots to efficiently and robustly locate objects in complex, changing human environments, advancing practical embodied AI for real-world deployment.

Abstract

Dynamic interactive object search in large-scale human environments presents substantial challenges for existing methods. Current scene representations like 3D Scene Graphs (3DSG) only provide coarse-grained spatial segmentation and cannot identify functional areas such as storage or leisure areas. Without functional area understanding, existing methods are constrained to exhaustive sequential exploration at large scales, resulting in inefficient search behaviors, particularly in open-layout environments with numerous interactive objects such as drawers and cabinets. Moreover, these methods lack adaptability to environmental dynamics such as object relocations. To address these limitations, this paper proposes CMAR-search, a novel framework built upon Commonsense and Memory Augmented Reasoning (CMAR). Our approach leverages commonsense about area functionalities and aggregates environmental memory to construct a Functional 3D Scene Graph (F3DSG), which organizes the environment into functional areas with their associated containers. Through this structured representation, CMAR enables hierarchical action planning at both macro-area and micro-container levels, empowering the system to efficiently identify and inspect semantically relevant areas for effective object search. Notably, CMAR continuously integrates real-time perception, accumulated memory, and commonsense to dynamically relocalize objects in changing environments. Extensive experiments in simulation and real-world settings demonstrate that CMAR-search significantly surpasses state-of-the-art baselines in both success rate and search efficiency for object search in dynamic interactive environments.
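One simple way to picture how perception, memory, and commonsense could be combined when an object has moved is a normalised belief over candidate containers. The sketch below is an assumption made for illustration, not the method from the paper: the functions `initial_belief` and `observe_empty`, the weighting rule (commonsense prior multiplied by one plus the number of remembered sightings), and all numbers are hypothetical.

```python
def initial_belief(prior, sightings):
    """Combine a commonsense prior with remembered sightings (assumed rule:
    weight = prior * (1 + number of past sightings)), then normalise."""
    weights = {loc: p * (1 + sightings.get(loc, 0)) for loc, p in prior.items()}
    total = sum(weights.values())
    return {loc: w / total for loc, w in weights.items()}


def observe_empty(belief, inspected):
    """Perception found nothing in `inspected`: zero it and renormalise."""
    rest = {loc: (0.0 if loc == inspected else p) for loc, p in belief.items()}
    total = sum(rest.values())
    if total == 0:
        return rest  # every candidate has been ruled out
    return {loc: p / total for loc, p in rest.items()}


# Made-up numbers: commonsense favours the drawer, memory favours the cabinet.
prior = {"drawer": 0.5, "cabinet": 0.3, "shelf": 0.2}
sightings = {"cabinet": 2}

belief = initial_belief(prior, sightings)
first_target = max(belief, key=belief.get)

# The object was relocated, so the remembered container turns out to be empty.
belief = observe_empty(belief, first_target)
next_target = max(belief, key=belief.get)
print(first_target, "->", next_target)
```

Here memory sends the search to the cabinet first; once perception reports it empty, the remaining belief falls back on the commonsense prior and the drawer becomes the next target.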

Index terms

Semantic Scene Understanding, Task and Motion Planning, Mobile Manipulation
