Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulations for Time-Efficient Fine-Resolution Policy Learning
Yuki Kadokawa, Hirotaka Tahara, Takamitsu Matsubara
AI summary
Problem
Fine-resolution particle simulations accurately model real-world excavation but demand prohibitive computation time for reinforcement learning, whereas coarse-resolution simulations are fast but suffer from domain gaps that prevent real-world transfer.
Approach
The framework progressively transfers policies from fast, coarse-resolution simulations to fine-resolution ones through intermediate stages, using conservative policy updates to stabilize learning and bridge simulation gaps.
Key results
- Sampling time reduced to less than 1/7 of that needed for policy learning in a fine-resolution simulation, with comparable task success rates
- ~90% rock excavation success rate across nine real-world environments
- Novel conservative policy distillation scheme for stable cross-resolution transfer
- Successful sim-to-real policy transfer without expensive real-world data collection
Why it matters
It enables scalable, time-efficient training of autonomous excavation systems, significantly reducing reliance on costly real-world sampling and high-resolution simulation compute.
Abstract
In earthwork and construction, excavators often encounter large rocks mixed with various soil conditions, requiring skilled operators. This paper presents a framework for achieving autonomous excavation using reinforcement learning (RL) through a rock excavation simulator. In the simulation, resolution is defined by the particle size and number across the whole soil space. Fine-resolution simulations closely mimic real-world behavior but demand significant computation time, which makes sample collection challenging, while coarse-resolution simulations enable faster sample collection but deviate from real-world behavior. To combine the advantages of both resolutions, we explore using policies developed in coarse-resolution simulations for pre-training in fine-resolution simulations. To this end, we propose a novel policy learning framework called Progressive-Resolution Policy Distillation (PRPD), which progressively transfers policies through several middle-resolution simulations, using conservative policy transfer to avoid domain gaps that could lead to policy transfer failure. Validation in a rock excavation simulator and nine real-world rock environments demonstrated that PRPD reduced sampling time to less than 1/7 while maintaining task success rates comparable to those achieved through policy learning in a fine-resolution simulation.
Note to Practitioners
This paper is motivated by the issue of computation time in excavation simulation using soil particles. The behavior of real soil is highly complex, and approximating it at high resolution incurs enormous computational cost. Existing soil simulators have therefore focused on improving simulation accuracy while keeping computation time low. This paper takes a different approach: it focuses on the learning of control policies in excavation simulators and proposes a framework for reducing computation time in such use cases.
In this framework, a control policy is first learned in a low-resolution simulation, significantly reducing computation time. The learned policy is then transferred to a high-resolution simulation for retraining, thereby achieving an overall reduction in simulation time. Furthermore, to enable robust policy transfer across different resolutions, this paper discusses a stable policy distillation scheme and insights into resolution design. This approach enables the development of autonomous excavation systems without relying on expensive real-world data collection, improving the scalability and adaptability of autonomous excavation. Simulation experiments suggest that this framework significantly reduces training time compared to conventional policy learning approaches. However, real-world validation has so far been limited to simple excavation robots. Future research will explore applications to excavators and other machinery more suitable for real-world operations. Although this paper focuses on autonomous excavation, the proposed approach can also be extended to environments where increased simulation resolution critically impacts computation time, such as liquid and soft object manipulation.
This work was supported by JST Moonshot Research and Development, Grant Number JPMJMS2032. (Corresponding author: Yuki Kadokawa.)
1 Nara Institute of Science and Technology, Nara 630-0192, Japan.
2 Kobe City College of Technology, Hyogo 651-2194, Japan.
kadokawa.yuki@naist.ac.jp, h-tahara@kobe-kosen.ac.jp, takam-m@is.naist.jp
[Figure 1 panels: performance and time at coarse, some middle, and fine simulation resolutions, and the gap to the real-world environment; sampling at a fixed resolution (previous) versus policy transfer (ours).]
Fig. 1. Overview of proposed framework: Fine-resolution simulations yield high policy performance but require long learning times, while coarse-resolution simulations allow for quick learning but perform poorly in sim-to-real transfer. Our framework starts with coarse-resolution simulations for quick learning and progressively transfers policies to fine-resolution simulations. Progressive resolution shift with conservative policy transfer is applied to avoid large domain gaps that could lead to policy transfer failure. This approach balances learning time with real-world performance.
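As a rough illustration of the staged, coarse-to-fine transfer with a conservative update, the toy sketch below trains a scalar "policy" through a sequence of resolutions. Everything in it is an assumption made for illustration: the scalar parameter `theta`, the per-resolution optimum in `targets`, the quadratic loss, and the proximity penalty weighted by `beta` are hypothetical stand-ins, not the paper's RL algorithm or its distillation scheme.

```python
from typing import List, Sequence


def conservative_update(theta: float, teacher: float, target: float,
                        lr: float = 0.1, beta: float = 0.5) -> float:
    """One gradient step on (theta - target)^2 + beta * (theta - teacher)^2.

    The second term is a hypothetical proximity penalty that keeps the
    updated policy close to the policy inherited from the previous stage.
    """
    grad = 2.0 * (theta - target) + 2.0 * beta * (theta - teacher)
    return theta - lr * grad


def train_stage(theta_init: float, target: float,
                steps: int, beta: float) -> float:
    """Train at one resolution, starting from the previous-stage policy."""
    teacher = theta_init  # frozen copy of the transferred policy
    theta = theta_init
    for _ in range(steps):
        theta = conservative_update(theta, teacher, target, beta=beta)
    return theta


def progressive_transfer(targets: Sequence[float], steps_per_stage: int = 50,
                         beta: float = 0.5, theta0: float = 0.0) -> List[float]:
    """Run the stages in order (coarse -> middle -> fine).

    `targets[i]` is the toy optimum at resolution i; the shift between
    consecutive entries plays the role of the domain gap between resolutions.
    Returns the policy parameter reached at the end of each stage.
    """
    theta = theta0
    history = []
    for target in targets:
        theta = train_stage(theta, target, steps_per_stage, beta)
        history.append(theta)
    return history


if __name__ == "__main__":
    # Hypothetical optima for coarse, middle, and fine resolutions.
    for stage, value in enumerate(progressive_transfer([1.0, 1.5, 2.0])):
        print(f"stage {stage}: theta = {value:.4f}")
```

In this toy setting each stage moves the parameter toward that stage's optimum, while the penalty prevents a large jump away from the previous-stage policy. A larger `beta` makes each transfer more conservative, at the cost of stopping further from the current stage's optimum.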