Research Analyzer
ICRA 2026

MemClaw-RAG: Memory-Driven Navigation and Adaptive Locomotion for Wheeled-Legged Robots in Dynamic Environments

Mingyi Li, Shubo Zhang, Chunle Gao, Kaixin Yang, Ying Li

AI summary

MemClaw-RAG enables wheeled-legged robots to navigate complex, dynamic indoor environments with high success rates by combining structured spatial memory with adaptive terrain-aware locomotion.
Object-Goal Navigation, Memory Graph Retrieval, Wheeled-Legged Robots, Adaptive Locomotion, Embodied AI, Dynamic Environments

Problem

Current object-goal navigation systems lack structured long-term memory and struggle with task interruptions, spatial drift in GPS-denied settings, and complex terrain like stairs.

Approach

The framework integrates a Memory Graph Retrieval module for spatial-semantic grounding, a dual-process SelfClaw cognitive system for task scheduling and memory retention, and a reinforcement learning-based policy that dynamically switches between wheeled and legged locomotion modes.
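To make "spatial-semantic grounding" concrete, here is a toy sketch of a graph memory that stores labelled observations with positions and returns the nearest remembered match for a goal label. This summary does not describe the actual data structures of the Memory Graph Retrieval module, so everything below (the `MemoryNode` and `MemoryGraph` classes, the `retrieve` method, the exact-label matching, and the example coordinates) is a hypothetical illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    node_id: int
    label: str        # semantic label, e.g. "mug" (hypothetical schema)
    position: tuple   # (x, y, z) in an assumed map frame, metres
    neighbors: set = field(default_factory=set)  # ids of related nodes


class MemoryGraph:
    """Toy spatial-semantic memory: labelled nodes plus undirected relations."""

    def __init__(self):
        self.nodes = {}

    def add(self, label, position):
        # Node ids are assigned sequentially in insertion order.
        node_id = len(self.nodes)
        self.nodes[node_id] = MemoryNode(node_id, label, tuple(position))
        return node_id

    def relate(self, a, b):
        # Record an undirected association, e.g. "mug is in kitchen".
        self.nodes[a].neighbors.add(b)
        self.nodes[b].neighbors.add(a)

    def retrieve(self, goal_label, robot_position, k=1):
        """Return up to k remembered nodes with the goal label, nearest first."""
        matches = [n for n in self.nodes.values() if n.label == goal_label]
        matches.sort(key=lambda n: math.dist(n.position, robot_position))
        return matches[:k]


graph = MemoryGraph()
kitchen = graph.add("kitchen", (4.0, 1.0, 0.0))
mug_a = graph.add("mug", (4.5, 1.2, 0.0))
mug_b = graph.add("mug", (9.0, 7.0, 3.0))  # a second mug, one floor up
graph.relate(kitchen, mug_a)

# Query: which remembered mug is closest, and what is it associated with?
nearest = graph.retrieve("mug", robot_position=(5.0, 1.0, 0.0))[0]
context = [graph.nodes[i].label for i in nearest.neighbors]
```

In this sketch the query returns the ground-floor mug, and its neighbor list supplies the associated context ("kitchen"). A real system would presumably match on learned multimodal embeddings rather than exact label strings.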

Key results

  • Achieves 0.81 success rate and 0.51 SPL on Gibson and HM3D benchmarks
  • Reaches 0.76 success rate and 0.48 SPL on multi-layer MP3D environments
  • Averages 55 ms per-step inference latency on an NVIDIA Jetson Orin
  • Demonstrates stable real-world navigation and stair climbing on a Unitree wheeled-legged robot
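
For readers unfamiliar with the metrics above: SR is the fraction of episodes that end in success, and SPL (Success weighted by Path Length) additionally scales each successful episode by the ratio of the shortest-path length to the length of the path the agent actually took. The sketch below computes both under the standard definition. The `Episode` class and the example numbers are illustrative assumptions, not data from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    success: bool         # did the agent reach the goal object?
    shortest_path: float  # shortest-path length from start to goal (m)
    agent_path: float     # length of the path the agent actually took (m)


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that succeeded."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Mean over episodes of success * shortest / max(taken, shortest)."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


# Illustrative episodes only (not results from the paper).
episodes = [
    Episode(success=True, shortest_path=5.0, agent_path=10.0),  # contributes 0.5
    Episode(success=True, shortest_path=4.0, agent_path=4.0),   # contributes 1.0
    Episode(success=False, shortest_path=6.0, agent_path=9.0),  # contributes 0.0
    Episode(success=True, shortest_path=3.0, agent_path=6.0),   # contributes 0.5
]
sr_value = success_rate(episodes)
spl_value = spl(episodes)
```

By construction SPL can never exceed SR, and the two are equal only when every successful episode follows a shortest path. A gap such as 0.81 SR versus 0.51 SPL therefore indicates that successful episodes took paths noticeably longer than optimal.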

Why it matters

Provides a robust, deployable navigation framework for service and domestic robots operating in unstructured, multi-level indoor spaces.

Abstract

Object-Goal Navigation in dynamic environments remains challenging as existing approaches rely primarily on reactive mapping that lacks the capacity to retain historical experience or establish structured memory associations. To address this, we introduce MemClaw-RAG, an embodied multimodal framework. MemClaw-RAG features three key innovations: (1) a Memory Graph Retrieval (MGR) module that integrates multimodal knowledge graphs for structured semantic association; (2) a SelfClaw cognitive module that orchestrates skill task scheduling and enhances historical memory retention; and (3) a Hybrid Adaptive Locomotion Policy (HALP) based on deep reinforcement learning that synergizes wheel-driven efficiency with legged dexterity. On Habitat benchmarks, MemClaw-RAG achieves an SR of 0.81 and an SPL of 0.51 on the Gibson and HM3D datasets. Notably, in the more challenging multi-layer environments of MP3D, our method achieves an SR of 0.76 and an SPL of 0.48, outperforming several representative memory-based and end-to-end approaches. Real-world deployment on a Unitree wheeled-legged robot confirms an average per-step inference latency of 55 ms on a Jetson Orin, demonstrating stable navigation behavior during real-world deployment in dynamic environments.

Index terms

Vision-Based Navigation, Semantic Scene Understanding, Wheeled Robots

Related papers