ICRA 2026

Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulations for Time-Efficient Fine-Resolution Policy Learning

Yuki Kadokawa, Hirotaka Tahara, Takamitsu Matsubara

AI summary

PRPD cuts reinforcement learning sampling time to less than one-seventh while preserving high real-world success rates by progressively bridging the gap between coarse- and fine-resolution simulations.

Progressive-Resolution Policy Distillation, Reinforcement Learning, Simulation-to-Real Transfer, Particle-Based Simulation, Autonomous Excavation, Policy Transfer

Problem

Fine-resolution particle simulations accurately model real-world excavation but demand prohibitive computation time for reinforcement learning, whereas coarse-resolution simulations are fast but suffer from domain gaps that prevent real-world transfer.

Approach

The framework progressively transfers policies from fast, coarse-resolution simulations to fine-resolution ones through intermediate stages, using conservative policy updates to stabilize learning and bridge simulation gaps.
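
The coarse-to-fine idea described above can be illustrated with a small toy sketch. This is not the authors' implementation: the three-action bandit, the `GAP` bias term, the linear cost model, and the KL-regularized update are all assumptions chosen for illustration, since the update rule used by PRPD is not specified here. The sketch only shows the general pattern of carrying a policy across resolution stages while limiting how far each update moves it.

```python
import numpy as np

# Hypothetical toy setup: a 3-action bandit stands in for the excavation task.
TRUE_REWARDS = np.array([0.2, 1.0, 0.5])   # rewards in the limit of infinitely fine resolution
GAP = np.array([0.6, -0.3, 0.0])           # assumed bias introduced by coarse simulation


def stage_rewards(resolution):
    """Rewards observed at a given resolution; the bias shrinks as 1/resolution."""
    return TRUE_REWARDS + GAP / resolution


def conservative_update(policy, rewards, eta):
    """One KL-regularized step: maximize E_pi[r] - (1/eta) * KL(pi || policy).

    The maximizer has the closed form pi_new proportional to policy * exp(eta * r).
    """
    logits = np.log(policy) + eta * rewards
    logits -= logits.max()          # stabilize the exponentials
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()


def progressive_training(resolutions, steps_per_stage, eta=0.5):
    """Train from coarse to fine, carrying the policy across stages."""
    policy = np.full(TRUE_REWARDS.size, 1.0 / TRUE_REWARDS.size)
    cost = 0
    for resolution, steps in zip(resolutions, steps_per_stage):
        rewards = stage_rewards(resolution)
        for _ in range(steps):
            policy = conservative_update(policy, rewards, eta)
            cost += resolution      # assumed: step cost grows linearly with resolution
    return policy, cost


policy, cost = progressive_training(resolutions=[1, 2, 4], steps_per_stage=[20, 10, 5])
print("final policy:", np.round(policy, 3), "simulated cost:", cost)
```

In this toy, the coarsest stage ranks the wrong action highest, and the finer stages shift the carried-over policy toward the action that is best at fine resolution. A smaller `eta` makes each step more conservative, at the price of needing more steps per stage.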

Key results

More than 7-fold reduction in sampling time compared with policy learning in a fixed fine-resolution simulation
~90% rock excavation success rate across nine real-world environments
Novel conservative policy distillation scheme for stable cross-resolution transfer
Successful sim-to-real policy transfer without expensive real-world data collection

Why it matters

It enables scalable, time-efficient training of autonomous excavation systems, significantly reducing reliance on costly real-world sampling and high-resolution simulation compute.

Abstract

In earthwork and construction, excavators often encounter large rocks mixed with various soil conditions, requiring skilled operators. This paper presents a framework for achieving autonomous excavation using reinforcement learning (RL) through a rock excavation simulator. In the simulation, resolution can be defined by the particle size/number in the whole soil space. Fine-resolution simulations closely mimic real-world behavior but demand significant calculation time and challenging sample collection, while coarse-resolution simulations enable faster sample collection but deviate from real-world behavior. To combine the advantages of both resolutions, we explore using policies developed in coarse-resolution simulations for pre-training in fine-resolution simulations. To this end, we propose a novel policy learning framework called Progressive-Resolution Policy Distillation (PRPD), which progressively transfers policies through some middle-resolution simulations with conservative policy transfer to avoid domain gaps that could lead to policy transfer failure. Validation in a rock excavation simulator and nine real-world rock environments demonstrated that PRPD reduced sampling time to less than 1/7 while maintaining task success rates comparable to those achieved through policy learning in a fine-resolution simulation.

Note to Practitioners: This paper is motivated by the issue of computation time in excavation simulation using soil particles. The behavior of real soil is highly complex, and approximating it at high resolution requires enormous computational costs. Therefore, existing soil simulators have focused on improving simulation accuracy while maintaining reduced computation time. This paper takes a different approach by focusing on the learning of control policies in excavation simulators and proposes a framework for reducing calculation time in such use cases.

In this framework, a control policy is first learned in a low-resolution simulation, significantly reducing computation time. The learned policy is then transferred to a high-resolution simulation for retraining, thereby achieving an overall reduction in simulation time. Furthermore, to enable robust policy transfer across different resolutions, this paper discusses a stable policy distillation scheme and insights into resolution design. This approach enables the development of autonomous excavation systems without relying on expensive real-world data collection, improving the scalability and adaptability of autonomous excavation. Simulation experiments suggest that this framework significantly reduces training time compared to conventional policy learning approaches. However, real-world validation has so far been limited to simple excavation robots. Future research will explore applications to excavators and other machinery more suitable for real-world operations. Although this paper focuses on autonomous excavation, the proposed approach can also be extended to environments where increased simulation resolution critically impacts computation time, such as liquid and soft object manipulation.

This work was supported by JST Moonshot Research and Development, Grant Number JPMJMS2032. (Corresponding author: Yuki Kadokawa.)
1 Nara Institute of Science and Technology, Nara 630-0192, Japan.
2 Kobe City College of Technology, Hyogo 651-2194, Japan.
kadokawa.yuki@naist.ac.jp, h-tahara@kobe-kosen.ac.jp, takam-m@is.naist.jp

Fig. 1. Overview of proposed framework: Fine-resolution simulations yield high policy performance but require long learning times, while coarse-resolution simulations allow for quick learning but perform poorly in sim-to-real transfer. Our framework starts with coarse-resolution simulations for quick learning and progressively transfers policies to fine-resolution simulations. Progressive resolution shift with conservative policy transfer is applied to avoid large domain gaps that could lead to policy transfer failure. This approach balances learning time with real-world performance.

Index terms

Mining Robotics, Robotics and Automation in Construction, Reinforcement Learning