ICRA 2026

MemClaw-RAG: Memory-Driven Navigation and Adaptive Locomotion for Wheeled-Legged Robots in Dynamic Environments

Mingyi Li, Shubo Zhang, Chunle Gao, Kaixin Yang, Ying Li

AI summary

MemClaw-RAG enables wheeled-legged robots to navigate complex, dynamic indoor environments with high success rates by combining structured spatial memory with adaptive terrain-aware locomotion.

Object-Goal Navigation, Memory Graph Retrieval, Wheeled-Legged Robots, Adaptive Locomotion, Embodied AI, Dynamic Environments

Problem

Current object-goal navigation systems lack structured long-term memory and struggle with task interruptions, spatial drift in GPS-denied settings, and complex terrain such as stairs.

Approach

The framework integrates a Memory Graph Retrieval module for spatial-semantic grounding, a dual-process SelfClaw cognitive system for task scheduling and memory retention, and a reinforcement learning-based policy that dynamically switches between wheeled and legged locomotion modes.
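To make the idea of spatial-semantic retrieval concrete, here is a minimal toy sketch of a memory graph: nodes store an object label, a position, and an embedding; a query embedding is matched by cosine similarity, and the best node's graph neighbors are returned as context. This is not the authors' MGR module. The class names, the hand-written 3-dimensional embeddings, and the `retrieve` interface are all assumptions made for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """Hypothetical memory entry: one observed object or landmark."""
    label: str
    position: tuple[float, float, float]  # (x, y, floor index), assumed layout
    embedding: list[float]                # toy semantic embedding
    neighbors: set[str] = field(default_factory=set)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class MemoryGraph:
    """Toy graph of remembered objects with undirected association edges."""

    def __init__(self) -> None:
        self.nodes: dict[str, MemoryNode] = {}

    def add(self, node: MemoryNode) -> None:
        self.nodes[node.label] = node

    def link(self, a: str, b: str) -> None:
        # Record an undirected association, e.g. "mug is on the kitchen table".
        self.nodes[a].neighbors.add(b)
        self.nodes[b].neighbors.add(a)

    def retrieve(self, query: list[float]) -> tuple[MemoryNode, list[MemoryNode]]:
        # Pick the node most similar to the query, then return its
        # one-hop neighbors (sorted by label) as spatial context.
        best = max(self.nodes.values(), key=lambda n: cosine(query, n.embedding))
        context = [self.nodes[name] for name in sorted(best.neighbors)]
        return best, context


graph = MemoryGraph()
graph.add(MemoryNode("mug", (1.0, 2.0, 0.0), [0.9, 0.1, 0.0]))
graph.add(MemoryNode("kitchen_table", (1.2, 2.1, 0.0), [0.6, 0.4, 0.0]))
graph.add(MemoryNode("bed", (7.0, 3.0, 1.0), [0.0, 0.1, 0.9]))
graph.link("mug", "kitchen_table")

query = [1.0, 0.0, 0.0]  # stand-in for an embedded goal such as "find the mug"
best, context = graph.retrieve(query)
print(best.label, best.position, [n.label for n in context])
```

In a real system the embeddings would presumably come from a vision-language encoder and the graph would be updated online; the sketch only shows how a structured memory can return both a goal candidate and its associated landmarks.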

Key results

Achieves a success rate (SR) of 0.81 and an SPL of 0.51 on the Gibson and HM3D datasets in Habitat
Reaches an SR of 0.76 and an SPL of 0.48 in multi-layer MP3D environments
Maintains an average per-step inference latency of 55 ms on an NVIDIA Jetson Orin
Demonstrates stable real-world navigation and stair climbing on a Unitree wheeled-legged robot
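For readers unfamiliar with the two navigation metrics reported here, the sketch below computes success rate (SR) and Success weighted by Path Length (SPL) using the standard definition, in which each successful episode contributes the ratio of the shortest-path length to the longer of the agent's path and the shortest path. The `Episode` class and the four sample episodes are made up for illustration and are not data from the paper; the sketch also assumes every shortest-path length is positive.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """Hypothetical record of one navigation episode."""
    success: bool         # did the agent stop at the goal object?
    shortest_path: float  # shortest-path length from start to goal (m), > 0
    agent_path: float     # length of the path the agent actually took (m)


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    total = 0.0
    for e in episodes:
        if e.success:
            # Failed episodes contribute 0; successful ones are
            # discounted when the agent took a detour.
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


# Made-up episodes, for illustration only.
episodes = [
    Episode(success=True, shortest_path=5.0, agent_path=5.0),   # optimal path
    Episode(success=True, shortest_path=4.0, agent_path=8.0),   # 2x detour
    Episode(success=False, shortest_path=6.0, agent_path=12.0),  # failure
    Episode(success=True, shortest_path=3.0, agent_path=4.0),
]

sr_value = success_rate(episodes)
spl_value = spl(episodes)
print(f"SR = {sr_value:.4f}, SPL = {spl_value:.4f}")
```

Because SPL multiplies success by path efficiency, it can never exceed SR; a gap between the two, as in the reported 0.81 SR and 0.51 SPL, indicates that successful episodes often follow paths longer than the shortest one.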

Why it matters

Provides a robust, deployable navigation framework for service and domestic robots operating in unstructured, multi-level indoor spaces.

Abstract

Object-Goal Navigation in dynamic environments remains challenging as existing approaches rely primarily on reactive mapping that lacks the capacity to retain historical experience or establish structured memory associations. To address this, we introduce MemClaw-RAG, an embodied multimodal framework. MemClaw-RAG features three key innovations: (1) a Memory Graph Retrieval (MGR) module that integrates multimodal knowledge graphs for structured semantic association; (2) a SelfClaw cognitive module that orchestrates skill task scheduling and enhances historical memory retention; and (3) a Hybrid Adaptive Locomotion Policy (HALP) based on deep reinforcement learning that synergizes wheel-driven efficiency with legged dexterity. On Habitat benchmarks, MemClaw-RAG achieves an SR of 0.81 and an SPL of 0.51 on the Gibson and HM3D datasets. Notably, in the more challenging multi-layer environments of MP3D, our method achieves an SR of 0.76 and an SPL of 0.48, outperforming several representative memory-based and end-to-end approaches. Real-world deployment on a Unitree wheeled-legged robot confirms an average per-step inference latency of 55 ms on a Jetson Orin and demonstrates stable navigation behavior in dynamic environments.

Index terms

Vision-Based Navigation, Semantic Scene Understanding, Wheeled Robots