TADPO: Reinforcement Learning Goes Off-Road
Zhouchonghao Wu, Raymond Song, Vedant Mundheda, Luis E. Navarro-Serment, Christof Schoenborn, Jeff Schneider
AI summary
Problem
Standard reinforcement learning struggles with off-road autonomous driving because the task demands long-horizon planning, provides only low-signal rewards, involves complex terrain dynamics, and makes exploration inefficient.
Approach
TADPO extends Proximal Policy Optimization (PPO) to concurrently learn from fixed expert demonstrations and on-policy student rollouts, using a clipped teacher-student distillation loss to guide exploration while maintaining independent value estimation.
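To make the idea of combining a PPO objective with a clipped teacher term more concrete, here is a minimal NumPy sketch. It is not the paper's objective: this page does not state the exact TADPO loss, so the form of `teacher_distill_loss`, the per-sample `weights`, the mixing coefficient `beta`, and all function names are assumptions chosen for illustration. Only `ppo_clip_loss` follows the standard PPO clipped surrogate. The value loss is left out, in line with the summary's note that value estimation is kept independent.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Standard PPO clipped surrogate loss on on-policy student rollouts."""
    logp_new, logp_old, advantages = map(np.asarray, (logp_new, logp_old, advantages))
    ratio = np.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))


def teacher_distill_loss(logp_new, logp_old, weights, eps=0.2):
    """ASSUMED clipped distillation term (hypothetical, not from the paper).

    logp_new / logp_old are the student's log-probabilities of the
    teacher's actions under the current and previous student policy.
    weights are non-negative per-sample weights (hypothetical).
    The ratio is capped at 1 + eps so a single update cannot push the
    student arbitrarily far toward the teacher's actions.
    """
    logp_new, logp_old, weights = map(np.asarray, (logp_new, logp_old, weights))
    ratio = np.exp(logp_new - logp_old)
    capped = np.minimum(ratio, 1.0 + eps)
    return -np.mean(capped * weights)


def combined_policy_loss(student_batch, teacher_batch, beta=0.5, eps=0.2):
    """PPO loss on student rollouts plus beta times the assumed teacher term."""
    on_policy = ppo_clip_loss(
        student_batch["logp_new"], student_batch["logp_old"],
        student_batch["advantages"], eps,
    )
    distill = teacher_distill_loss(
        teacher_batch["logp_new"], teacher_batch["logp_old"],
        teacher_batch["weights"], eps,
    )
    return on_policy + beta * distill


# Toy batches with made-up numbers, only to show the call pattern.
student = {
    "logp_new": np.array([0.0, 0.0]),
    "logp_old": np.array([0.0, 0.0]),
    "advantages": np.array([1.0, -1.0]),
}
teacher = {
    "logp_new": np.array([1.0]),
    "logp_old": np.array([0.0]),
    "weights": np.array([1.0]),
}
loss = combined_policy_loss(student, teacher)
```

In this sketch the two terms are computed on separate batches, mirroring the split between fixed expert demonstrations and on-policy student rollouts described above. In practice the log-probabilities would come from a differentiable policy network rather than fixed arrays.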
Key results
- Novel TADPO algorithm extending PPO with teacher action distillation
- Vision-based end-to-end RL system navigating extreme slopes and obstacle-rich terrain in simulation
- Zero-shot sim-to-real transfer to a full-scale off-road vehicle, which the authors believe is the first deployment of RL-based policies on such a platform
- High-speed, long-horizon autonomous navigation without fine-tuning or dense mapping
Why it matters
Enables reliable autonomous navigation in unstructured, unmapped environments where traditional mapping and modeling fail, advancing practical off-road robotics and exploration.
Abstract
Off-road autonomous driving poses significant challenges such as navigating unmapped, variable terrain with uncertain and diverse dynamics. Addressing these challenges requires effective long-horizon planning and adaptable control. Reinforcement Learning (RL) offers a promising solution by learning control policies directly from interaction. However, because off-road driving is a long-horizon task with low-signal rewards, standard RL methods are challenging to apply in this setting. We introduce TADPO, a novel policy gradient formulation that extends Proximal Policy Optimization (PPO), leveraging off-policy trajectories for teacher guidance and on-policy trajectories for student exploration. Building on this, we develop a vision-based, end-to-end RL system for high-speed off-road driving, capable of navigating extreme slopes and obstacle-rich terrain. We demonstrate our performance in simulation and, importantly, zero-shot sim-to-real transfer on a full-scale off-road vehicle. To our knowledge, this work represents the first deployment of RL-based policies on a full-scale off-road platform. Source code is available at this link and video at this link.