ICRA 2026

SURF-Loco: Mastering Complex Industrial Terrains with 3D Surfel-Based Reinforcement Learning for Legged Robots

Bailin He, Xiting Zhao, Qiao Sun, Xiaoyi Hu, Haojie Liu, Jiangwei Zhong, Wenqiang Zhang

AI summary

SURF-Loco enables robust, zero-shot sim-to-real legged locomotion on complex industrial terrains by learning directly from a compressed 3D surfel representation.

Legged locomotion, 3D surfels, Mixture-of-Experts, Reinforcement learning, Sim-to-real transfer, Industrial robotics

Problem

Conventional 2D/2.5D perception systems fail to capture intricate 3D geometry like overhangs and grated floors, causing legged robots to struggle with safe navigation in cluttered industrial environments.
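
To make the limitation concrete, the following minimal sketch shows how a 2.5D elevation map, which stores a single height per grid cell, loses an overhang. The function name `elevation_map`, the 0.5 m cell size, and the three sample points are illustrative assumptions, not taken from the paper.

```python
def elevation_map(points, cell=0.5):
    """Collapse 3D points into a 2.5D grid that keeps one height (max z) per cell."""
    grid = {}
    for x, y, z in points:
        key = (int(x // cell), int(y // cell))
        grid[key] = max(z, grid.get(key, float("-inf")))
    return grid


# Hypothetical scene: a floor point with a pipe hanging 1.2 m above it,
# plus a second floor point in the neighbouring cell.
points = [(0.1, 0.1, 0.0), (0.1, 0.1, 1.2), (0.7, 0.1, 0.0)]
grid = elevation_map(points)

# The cell under the pipe now reports 1.2 m, so the walkable floor
# beneath the overhang is no longer represented.
print(grid)
```

In this toy scene the cell beneath the pipe looks like a 1.2 m step, even though the floor under it is free. A full 3D representation keeps both surfaces.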

Approach

The framework encodes a pre-scanned 3D surfel map into a compact latent context using a VAE, which guides a Mixture-of-Experts reinforcement learning policy to dynamically select terrain-specific control strategies.
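
The gating idea can be sketched as follows. This is a minimal illustration, assuming a linear gating layer and a soft (softmax-weighted) blend of expert outputs; the summary does not say whether SURF-Loco blends experts or picks one, and every name here (`gate`, `moe_action`, the toy experts and weights) is hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate(latent, gate_weights):
    """Assumed linear gate: one logit per expert, computed from the latent context."""
    logits = [sum(w * z for w, z in zip(row, latent)) for row in gate_weights]
    return softmax(logits)


def moe_action(latent, proprio, gate_weights, experts):
    """Blend the experts' actions using the gate's weights."""
    weights = gate(latent, gate_weights)
    actions = [expert(proprio) for expert in experts]
    dim = len(actions[0])
    return [sum(w * a[i] for w, a in zip(weights, actions)) for i in range(dim)]


# Two toy experts that ignore their input and return fixed 2-D actions.
experts = [lambda p: [1.0, 0.0], lambda p: [0.0, 1.0]]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical, untrained weights

neutral = moe_action([0.0, 0.0], None, gate_weights, experts)
biased = gate([5.0, 0.0], gate_weights)
```

With a zero latent vector both experts receive equal weight. A latent vector that points strongly along the first dimension shifts almost all weight to the first expert, which is the sense in which the geometric context selects a strategy.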

Key results

First surfel-based perception and control framework for legged locomotion
VAE-compressed geometric context enables dynamic expert selection
Robust zero-shot sim-to-real transfer on a hexapod robot
Successful traversal of multi-level industrial obstacles like grated floors

Why it matters

Provides a scalable perception-action pipeline for autonomous legged robots operating in unstructured industrial settings where traditional platforms and 2.5D vision fail.

Abstract

Legged robots offer significant potential for navigating complex industrial terrains, but their capabilities are often constrained by perception systems struggling to interpret intricate 3D geometry. Conventional 2D/2.5D representations like depth or elevation maps fail to capture complex 3D geometry, leading to unsafe locomotion. This paper presents SURF-Loco, a novel framework that enables robust legged locomotion by learning directly from a 3D surfel-based model. Our approach uses surfels to create an omnidirectional representation that explicitly encodes the geometric properties necessary for stable locomotion. We integrate this structured 3D representation into an end-to-end Mixture-of-Experts (MoE) reinforcement learning policy. A variational autoencoder (VAE) distills the complex 3D surroundings into a compact latent context. This geometric context enables a gating network to dynamically select expert sub-policies for agile, context-aware actions. We validate our method on the Lenovo Daystar IS hexapod robot, achieving robust zero-shot sim-to-real transfer on a variety of challenging industrial obstacles.
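
For readers unfamiliar with surfels, the sketch below uses the common definition of a surfel as a small oriented disc (centre, unit normal, radius). The abstract does not list the fields SURF-Loco stores, so the `Surfel` class and the `slope_deg` helper are assumptions for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Surfel:
    position: tuple  # (x, y, z) centre of the disc, in metres
    normal: tuple    # surface normal (need not be pre-normalised here)
    radius: float    # disc radius, in metres


def slope_deg(surfel):
    """Angle between the surfel normal and the world up axis (+z), in degrees."""
    nx, ny, nz = surfel.normal
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    cos_up = max(-1.0, min(1.0, nz / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_up))


# Hypothetical surfels: a floor patch, a wall patch, and the underside of an overhang.
floor = Surfel((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.05)
wall = Surfel((1.0, 0.0, 0.5), (1.0, 0.0, 0.0), 0.05)
ceiling = Surfel((0.0, 0.0, 1.2), (0.0, 0.0, -1.0), 0.05)

for name, s in [("floor", floor), ("wall", wall), ("ceiling", ceiling)]:
    print(name, round(slope_deg(s), 1))
```

Because each surfel carries its own normal, a floor, a wall, and the underside of an overhang at the same horizontal location remain distinguishable, which a single height per cell cannot express.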

Index terms

Legged Robots, Reinforcement Learning, Sensor-based Control