MemClaw-RAG: Memory-Driven Navigation and Adaptive Locomotion for Wheeled-Legged Robots in Dynamic Environments
Mingyi Li, Shubo Zhang, Chunle Gao, Kaixin Yang, Ying Li
AI summary
Problem
Current object-goal navigation systems lack structured long-term memory and struggle with task interruptions, spatial drift in GPS-denied settings, and complex terrain such as stairs.
Approach
The framework integrates a Memory Graph Retrieval module for spatial-semantic grounding, a dual-process SelfClaw cognitive system for task scheduling and memory retention, and a reinforcement learning-based policy that dynamically switches between wheeled and legged locomotion modes.
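The summary does not say how the Memory Graph Retrieval module stores or ranks memories. The code below is a purely illustrative sketch of spatial-semantic lookup under these assumptions:

- Nodes are observed objects, each with a label, a 2D position and a floor index.
- Edges link objects that were seen near each other.
- Retrieval for an object goal scores direct label hits highest.
- Semantically related objects and their neighbours get weaker credit.
- Ties go to nodes on the agent's current floor, then to the closest node.

`MemoryNode`, `MemoryGraph`, `retrieve`, the scoring weights and the `related` label set are all hypothetical. The actual MGR module works over multimodal knowledge graphs, which this toy does not model.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """Hypothetical memory entry: one observed object instance."""
    node_id: int
    label: str                      # semantic class, e.g. "pillow"
    position: tuple[float, float]   # (x, y) in the map frame, metres
    floor: int                      # storey index, for multi-level scenes


@dataclass
class MemoryGraph:
    """Toy spatial-semantic graph; an edge means two objects were observed near each other."""
    nodes: dict[int, MemoryNode] = field(default_factory=dict)
    edges: dict[int, set[int]] = field(default_factory=dict)

    def add(self, node: MemoryNode, near: tuple[int, ...] = ()) -> None:
        self.nodes[node.node_id] = node
        self.edges.setdefault(node.node_id, set())
        for other in near:  # store the "observed near" relation in both directions
            self.edges[node.node_id].add(other)
            self.edges.setdefault(other, set()).add(node.node_id)

    def score(self, node: MemoryNode, goal: str, related: frozenset[str]) -> float:
        # A direct label hit dominates; a related label counts less.
        s = 2.0 if node.label == goal else (1.0 if node.label in related else 0.0)
        # Each neighbour that is the goal or a related object adds weaker evidence.
        for other in self.edges[node.node_id]:
            if other in self.nodes:
                lbl = self.nodes[other].label
                if lbl == goal or lbl in related:
                    s += 0.5
        return s

    def retrieve(self, goal: str, agent_xy: tuple[float, float], agent_floor: int,
                 related: frozenset[str] = frozenset(), k: int = 3) -> list[MemoryNode]:
        hits = [(self.score(n, goal, related), n) for n in self.nodes.values()]
        hits = [(s, n) for s, n in hits if s > 0]
        # Highest score first, then nodes on the agent's floor, then the closest ones.
        hits.sort(key=lambda sn: (-sn[0], sn[1].floor != agent_floor,
                                  math.dist(sn[1].position, agent_xy)))
        return [n for _, n in hits[:k]]


g = MemoryGraph()
g.add(MemoryNode(0, "bed", (5.0, 1.0), floor=1))
g.add(MemoryNode(1, "pillow", (5.2, 1.1), floor=1), near=(0,))
g.add(MemoryNode(2, "sofa", (1.0, 0.0), floor=0))
g.add(MemoryNode(3, "nightstand", (6.0, 1.0), floor=1), near=(0,))
top = g.retrieve("pillow", agent_xy=(0.0, 0.0), agent_floor=0, related=frozenset({"bed"}))
print([n.label for n in top])  # → ['pillow', 'bed', 'nightstand']
```

In this toy, the nightstand is retrieved only because it sits next to a bed. That is the kind of structured association a flat, reactive map would not record.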
Key results
- Achieves a 0.81 success rate (SR) and 0.51 SPL (Success weighted by Path Length) on the Gibson and HM3D benchmarks
- Reaches a 0.76 success rate and 0.48 SPL in multi-layer MP3D environments
- Runs at an average per-step inference latency of 55 ms on an NVIDIA Jetson Orin
- Demonstrates stable real-world navigation and stair climbing on a Unitree wheeled-legged robot
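For reference, the two navigation metrics quoted above are usually defined as follows:

- SR is the fraction of the N episodes that succeed.
- SPL averages S_i · l_i / max(p_i, l_i) over the episodes.

Here S_i is 1 for a successful episode and 0 otherwise, l_i is the shortest-path distance from start to goal, and p_i is the length of the path the agent actually travelled.

The sketch below assumes this standard definition. The summary does not state which evaluation code the authors used. The `Episode` record and the episode values are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Episode:
    """One evaluation episode (illustrative record, not a Habitat API type)."""
    success: bool          # S_i: did the episode count as a success?
    shortest_path: float   # l_i: shortest-path distance from start to goal, metres
    agent_path: float      # p_i: length of the path the agent actually travelled, metres


def success_rate(episodes: list[Episode]) -> float:
    """SR: fraction of successful episodes."""
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: list[Episode]) -> float:
    """SPL: mean of S_i * l_i / max(p_i, l_i); failed episodes contribute 0."""
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


# Made-up episodes: three successes with varying path efficiency and one failure.
episodes = [
    Episode(True, 4.0, 5.0),   # efficiency 4/5 = 0.8
    Episode(True, 3.0, 6.0),   # efficiency 3/6 = 0.5
    Episode(False, 6.0, 9.0),  # failure contributes 0
    Episode(True, 2.0, 2.0),   # optimal path, efficiency 1.0
]
print(success_rate(episodes), spl(episodes))
```

Here SR is 3/4 = 0.75, and SPL works out to (0.8 + 0.5 + 0 + 1.0) / 4 = 0.575. Each SPL term is at most S_i, so SPL can never exceed SR. A gap such as 0.81 SR against 0.51 SPL therefore reflects paths noticeably longer than the shortest route.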
Why it matters
Provides a robust, deployable navigation framework for service and domestic robots operating in unstructured, multi-level indoor spaces.
Abstract
Object-Goal Navigation in dynamic environments remains challenging because existing approaches rely primarily on reactive mapping, which cannot retain historical experience or establish structured memory associations. To address this, we introduce MemClaw-RAG, an embodied multimodal framework with three key innovations: (1) a Memory Graph Retrieval (MGR) module that integrates multimodal knowledge graphs for structured semantic association; (2) a SelfClaw cognitive module that orchestrates the scheduling of skills and tasks and enhances the retention of historical memory; and (3) a Hybrid Adaptive Locomotion Policy (HALP), based on deep reinforcement learning, that combines the efficiency of wheeled locomotion with the dexterity of legged locomotion. On Habitat benchmarks, MemClaw-RAG achieves a success rate (SR) of 0.81 and a Success weighted by Path Length (SPL) of 0.51 on the Gibson and HM3D datasets. In the more challenging multi-layer environments of MP3D, it achieves an SR of 0.76 and an SPL of 0.48, outperforming several representative memory-based and end-to-end approaches. Real-world deployment on a Unitree wheeled-legged robot shows an average per-step inference latency of 55 ms on a Jetson Orin and stable navigation behavior in dynamic environments.
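The abstract does not describe how SelfClaw schedules tasks. One generic way to handle the task interruptions mentioned in the summary is a preemptive priority scheduler. In that design, an urgent request suspends the active task, and the suspended task later resumes with the progress it had accumulated.

The sketch below shows only that generic technique. `Task`, `TaskScheduler`, `submit`, `complete_active` and the `progress` field are hypothetical names, not the SelfClaw interface.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical navigation task; `progress` stands in for retained memory."""
    goal: str
    priority: int                                       # larger = more urgent
    progress: list[str] = field(default_factory=list)   # e.g. rooms already searched


class TaskScheduler:
    """Toy preemptive scheduler: urgent tasks suspend the active one, which resumes later."""

    def __init__(self) -> None:
        self.active: Task | None = None
        # Min-heap on (-priority, arrival order): most urgent first, FIFO among equals.
        self._pending: list[tuple[int, int, Task]] = []
        self._order = itertools.count()

    def _push(self, task: Task) -> None:
        heapq.heappush(self._pending, (-task.priority, next(self._order), task))

    def submit(self, task: Task) -> None:
        if self.active is None:
            self.active = task
        elif task.priority > self.active.priority:
            self._push(self.active)  # suspend: the Task object keeps its progress
            self.active = task
        else:
            self._push(task)         # not urgent enough to preempt; wait in the queue

    def complete_active(self) -> Task | None:
        """Finish the active task and resume the most urgent pending one, if any."""
        self.active = heapq.heappop(self._pending)[2] if self._pending else None
        return self.active


sched = TaskScheduler()
fetch = Task("find the mug", priority=1)
sched.submit(fetch)
fetch.progress.append("kitchen")                   # searched one room before the interruption
sched.submit(Task("answer the door", priority=5))  # more urgent: preempts the fetch task
sched.submit(Task("water the plant", priority=0))  # less urgent: waits in the queue
print(sched.active.goal)             # → answer the door
print(sched.complete_active().goal)  # → find the mug
print(fetch.progress)                # → ['kitchen']
```

Because a suspended task is stored as the same object, an interrupted fetch task resumes with `progress` still listing the rooms it already searched. It does not restart from scratch.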