ICRA 2026

START: Traversing Sparse Footholds with Terrain Reconstruction

Ruiqi Yu, Qianshi Wang, Hongyi Li, Jun Zheng, Zhicheng Wang, Jun Wu, Qiuguo Zhu


AI summary


START enables quadruped robots to safely traverse highly sparse footholds by explicitly reconstructing local terrain heightmaps from egocentric vision within a single-stage learning framework.

Legged robots, Sparse footholds, Terrain reconstruction, Reinforcement learning, Egocentric vision, Sim-to-real transfer

Problem

Legged robots struggle to traverse unstructured terrain with sparse footholds. Model-based controllers generalize poorly. Existing learning-based methods rely either on noisy heightmaps or on implicit terrain features that miss critical geometric cues, which leads to rigid gaits and poor adaptability.

Approach

START integrates a memory-augmented Terrain Reconstruction Network with a locomotion policy in a single-stage pipeline, fusing egocentric depth images and proprioception to explicitly reconstruct local heightmaps for precise foot placement and adaptive control.
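The Approach paragraph describes a data flow rather than a concrete architecture, so the sketch below only illustrates that flow under stated assumptions. Depth and proprioception are fused, a recurrent memory carries information across time steps, and the memory is decoded into an explicit local heightmap that the policy reads. A GRU stands in for the unspecified memory mechanism, and a single linear map stands in for any image encoder. All sizes are made up, and the weights are random rather than trained. `TerrainReconstructor` and `policy` are hypothetical names, not the authors' code.

```python
import numpy as np

# Hypothetical sizes; the summary does not give the paper's actual dimensions.
DEPTH_H, DEPTH_W = 16, 16   # downsampled egocentric depth image (assumed)
PROPRIO_DIM = 12            # proprioceptive vector length (assumed)
HIDDEN_DIM = 32             # recurrent memory size (assumed)
MAP_H, MAP_W = 11, 11       # local heightmap grid around the robot (assumed)
ACTION_DIM = 12             # one target per joint of a 12-DoF quadruped (assumed)

rng = np.random.default_rng(0)


def init(shape):
    """Small random weights; a real system would learn these."""
    return rng.normal(0.0, 0.1, size=shape)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TerrainReconstructor:
    """Toy memory-augmented reconstructor: (depth, proprio) -> heightmap grid."""

    def __init__(self):
        in_dim = DEPTH_H * DEPTH_W + PROPRIO_DIM
        # GRU gate weights act on the concatenation [input, hidden].
        self.W_z = init((HIDDEN_DIM, in_dim + HIDDEN_DIM))
        self.W_r = init((HIDDEN_DIM, in_dim + HIDDEN_DIM))
        self.W_h = init((HIDDEN_DIM, in_dim + HIDDEN_DIM))
        self.W_out = init((MAP_H * MAP_W, HIDDEN_DIM))  # linear heightmap decoder
        self.h = np.zeros(HIDDEN_DIM)                   # memory state

    def step(self, depth, proprio):
        x = np.concatenate([depth.ravel(), proprio])     # fuse vision + proprioception
        xh = np.concatenate([x, self.h])
        z = sigmoid(self.W_z @ xh)                       # update gate
        r = sigmoid(self.W_r @ xh)                       # reset gate
        h_tilde = np.tanh(self.W_h @ np.concatenate([x, r * self.h]))
        self.h = (1.0 - z) * self.h + z * h_tilde        # memory keeps past views
        return (self.W_out @ self.h).reshape(MAP_H, MAP_W)


def policy(heightmap, proprio, W_pi):
    """Toy linear policy that reads the explicit heightmap plus proprioception."""
    obs = np.concatenate([heightmap.ravel(), proprio])
    return np.tanh(W_pi @ obs)  # bounded actions in [-1, 1]


recon = TerrainReconstructor()
W_pi = init((ACTION_DIM, MAP_H * MAP_W + PROPRIO_DIM))
for t in range(5):
    depth = rng.uniform(0.0, 2.0, size=(DEPTH_H, DEPTH_W))  # fake depth frame
    proprio = rng.normal(size=PROPRIO_DIM)                   # fake joint readings
    hmap = recon.step(depth, proprio)
    action = policy(hmap, proprio, W_pi)
print(hmap.shape, action.shape)  # prints (11, 11) (12,)
```

The point of the sketch is the interface. The policy consumes an explicit heightmap grid, which in simulation can be supervised against ground-truth terrain heights, instead of an opaque latent vector.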

Key results

Zero-shot real-world transfer across diverse indoor and outdoor sparse terrains
Superior adaptability and precise foothold placement over implicit feature baselines
Accelerated training and reduced exploration cost via Adaptive Sampling
Robust agile locomotion on stepping stones, balance beams, and gaps using only onboard vision
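The summary does not explain how START's Adaptive Sampling works. As a generic illustration only, the sketch below shows one common curriculum idea: sample terrain difficulty levels by how informative they currently are, so training time is not spent on levels the policy always solves or always fails. This is a swapped-in technique and is not the paper's method. The weighting `p * (1 - p) + eps`, the function names, and the success rates are all hypothetical.

```python
import random


def sampling_weights(success_rates, eps=0.05):
    """Weight each terrain level by p * (1 - p) + eps.

    Levels with success rate near 0 or 1 get little weight, levels near 0.5
    get the most, and eps keeps every level reachable.
    """
    return [p * (1.0 - p) + eps for p in success_rates]


def sample_level(success_rates, rng):
    """Draw one terrain level index according to the adaptive weights."""
    weights = sampling_weights(success_rates)
    return rng.choices(range(len(success_rates)), weights=weights, k=1)[0]


# Hypothetical per-level success rates, from easy (dense stones) to hard (sparse stones).
success = [0.98, 0.90, 0.55, 0.20, 0.02]
rng = random.Random(0)
counts = [0] * len(success)
for _ in range(10_000):
    counts[sample_level(success, rng)] += 1
print(counts)  # level 2 (p = 0.55) is drawn most often
```

In a real training loop, the success rates would be re-estimated from recent rollouts, so the sampling distribution shifts toward harder terrain as the policy improves.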

Why it matters

It provides a scalable, low-cost perception-control pipeline that enables legged robots to safely navigate highly unstructured environments for real-world deployment.

Abstract

Traversing terrain with sparse footholds, as legged animals do, is a promising yet challenging task for quadruped robots, as it requires precise environmental perception and agile control to secure safe foot placement while maintaining dynamic stability. Model-based hierarchical controllers excel in laboratory settings, but suffer from limited generalization and overly conservative behaviors. End-to-end learning-based approaches unlock greater flexibility and adaptability, but existing state-of-the-art methods either rely on heightmaps that introduce noise and complex, costly pipelines, or implicitly infer terrain features from egocentric depth images, often missing critical geometric cues and leading to inefficient learning and rigid gaits. To overcome these limitations, we propose START, a single-stage learning framework that enables agile, stable locomotion on highly sparse and randomized footholds. START uses only low-cost onboard vision and proprioception to accurately reconstruct the local terrain heightmap, providing an explicit intermediate representation that conveys the essential features of sparse foothold regions. This supports comprehensive environmental understanding and precise terrain assessment, reducing exploration cost and accelerating skill acquisition. Experimental results demonstrate that START achieves zero-shot transfer across diverse real-world scenarios, showcasing superior adaptability, precise foothold placement, and robust locomotion.
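The abstract describes a single-stage framework with an explicit heightmap as an intermediate representation. One plausible way to train the two parts jointly, stated here purely as an assumption because the summary gives no loss, is to add a supervised reconstruction term to the reinforcement learning objective. In the expression below, $\theta$ and $\phi$ denote hypothetical policy and reconstruction-network parameters, $\hat{h}_{ij}$ is the predicted height at grid cell $(i, j)$ of an $H \times W$ local map, $h_{ij}$ is the ground-truth height available in simulation, and $\lambda$ is a weighting coefficient.

```latex
\mathcal{L}(\theta, \phi)
  = \mathcal{L}_{\text{RL}}(\theta, \phi) + \lambda \, \mathcal{L}_{\text{rec}}(\phi),
\qquad
\mathcal{L}_{\text{rec}}(\phi)
  = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W}
    \bigl( \hat{h}_{ij}(\phi) - h_{ij} \bigr)^{2}
```

Optimizing both terms in one stage, rather than first training a privileged teacher and then distilling a student, is what "single-stage" would mean under this reading.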

Index terms

Legged Robots; Reinforcement Learning; Deep Learning for Visual Perception