ICRA 2026

Enhancing Classical Motion Planners Using RL with Safety Guarantees

Elias Goldsztejn, Ronen Brafman

AI summary

Regularizing reinforcement learning with a classical planner yields faster, safer navigation while provably guaranteeing the policy stays within a user-defined safety region.

Reinforcement Learning, Motion Planning, Safety Guarantees, Trust Region, Classical Planners, Robotics

Problem

Classical motion planners are safe but require manual tuning and perform poorly in complex scenarios, whereas pure reinforcement learning lacks safety guarantees and interpretability.

Approach

The method trains a reinforcement learning policy using a classical planner as an online expert to regularize action selection, combined with a capped weight and gradient penalty to enforce a provable trust region.
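As a rough illustration of the regularization idea described above (not the authors' implementation), the sketch below adds a penalty for deviating from the planner's action to a generic RL loss and clips the penalty weight at a maximum. The function name `regularized_loss`, the squared-distance penalty, and the `weight_cap` parameter are assumptions made for this example. The gradient penalty is left out because this summary does not specify its form.

```python
def regularized_loss(rl_loss, policy_action, planner_action, weight, weight_cap=1.0):
    """Generic RL loss plus a capped penalty for deviating from the planner.

    All names and the penalty form are hypothetical; the paper's exact
    objective is not given in this summary.
    """
    # Squared Euclidean distance between the two actions (assumed penalty form).
    deviation = sum((p - q) ** 2 for p, q in zip(policy_action, planner_action))
    # One possible reading of "capped weight": never exceed weight_cap.
    capped_weight = min(weight, weight_cap)
    return rl_loss + capped_weight * deviation


if __name__ == "__main__":
    # Actions are (linear velocity, angular velocity) pairs, as a local
    # planner such as DWA would output.
    loss = regularized_loss(
        rl_loss=1.0,
        policy_action=(0.6, 0.2),
        planner_action=(0.5, 0.0),
        weight=5.0,
        weight_cap=2.0,
    )
    print(loss)
```

When the policy and the planner agree, the penalty vanishes and only the RL loss remains, so the planner shapes the policy only where the two disagree.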

Key results

Reduces traversal time by 8% vs DWA and 43% vs TEB
Lowers proximity risk by 24% vs DWA and 17% vs TEB
Provably constrains policy deviation within a user-defined trust region
Matches or surpasses learning-based baselines in simulation and real-world tests
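One way to read the trust-region result above is that, in every state, the policy's action stays within a user-chosen distance of the planner's action. The sketch below checks that condition empirically over a batch of action pairs. It is a monitoring helper under stated assumptions: the Euclidean distance, the radius name `epsilon`, and both function names are hypothetical, and the paper's guarantee is a proof rather than a runtime check.

```python
import math


def max_deviation(policy_actions, planner_actions):
    """Largest Euclidean distance between paired policy and planner actions."""
    return max(math.dist(p, q) for p, q in zip(policy_actions, planner_actions))


def within_trust_region(policy_actions, planner_actions, epsilon):
    """True if every policy action lies within epsilon of the planner's action."""
    return max_deviation(policy_actions, planner_actions) <= epsilon


if __name__ == "__main__":
    # Hypothetical (linear velocity, angular velocity) pairs for two states.
    policy = [(0.5, 0.0), (0.75, 1.0)]
    planner = [(0.5, 0.0), (0.0, 0.0)]
    print(within_trust_region(policy, planner, epsilon=2.0))
```

A check like this could be run on logged rollouts to see how close a trained policy stays to the planner in practice.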

Why it matters

Enables safe, high-performance RL deployment in robotics by inheriting classical planner reliability without manual tuning.

Abstract

Classical algorithms for autonomous navigation, while well-understood and safe, require manual parameter tuning by experts to perform well. APPL [1] and similar methods use machine learning to dynamically adjust planner parameters during deployment. This approach maintains the safety of classical systems but remains constrained by the underlying algorithm. Instead of parameter tuning, we suggest using classical planners to regulate action selection of a reinforcement learning (RL) algorithm. The resulting policy is provably similar to the well-understood classical algorithm, performs better than both a well-tuned classical planner and an unregularized RL-based policy, and can be shown to respect a user-controlled trust region even during training. In experiments, our method reduces traversal time by 8% (vs. DWA [2]) and 43% (vs. TEB [3]), and lowers proximity risk by 24% and 17%, respectively, while matching or surpassing learning-based baselines and aligning more closely with user preferences.

INTRODUCTION

Classical local navigation algorithms like the Dynamic Window Approach (DWA) [2] and Timed Elastic Bands (TEB) [3] are widely used in robotics because they generate safe and well-understood behavior and navigate well in most cases. However, they can be suboptimal in some situations, such as moving too slowly in open areas or struggling in crowded spaces. The traditional solution is to manually retune their parameters, which requires expert knowledge and may degrade performance in previously successful scenarios.
Two general approaches address these shortcomings. The first forgoes classical algorithms and uses deep reinforcement learning (RL) [4]–[9] or imitation learning (IL) to train a path planner [10]–[12]. RL often performs well but returns a black-box controller, raising safety and explainability issues, while IL requires expert demonstrations and suffers from distribution drift, implying potentially unsafe behavior in previously unseen situations.
A second approach is to use learning to improve classical solvers. Adaptive Planner Parameter Learning (APPL) [1] enhances existing navigation systems by dynamically adjusting their parameters during deployment using various ML techniques. These techniques include learning contextual parameters from tele-operated demonstrations [13], incorporating corrective interventions from non-expert users [14], learning from binary or scalar evaluative feedback [15], and using RL in simulations [16]. DWA-RL [17] enhances a classical algorithm by learning to select among the actions it proposes. Both methods boost the performance of local classical algorithms while maintaining the safety and transparency they offer. However, both methods are too tightly constrained by the classical algorithm and its greedy nature, whereas RL looks ahead beyond the next action.
This paper seeks to combine the advantages of both end-to-end and classical approaches, specifically by regularizing an RL algorithm using a classical planner. Regularization is a standard RL technique, originally used to encourage exploration and robustness [18]. More recently, [19] used regularization to combine RL with IL. They trained a car controller in simulation [20] with an RL objective that penalizes deviations from expert demonstrations (see Fig. 2). This technique learns good behaviors faster and, more importantly, helps maintain trust in the generated policy.
*This work was supported in part by The Israel Science Foundation (grant No. 573/25), the Israel Ministry of Science and Technology (grant No. 08602), Ben-Gurion University of the Negev through the Agricultural, Biological, and Cognitive Robotics Initiative, and the Lynn and William Frankel Center for Computer Science.
¹Elias Goldsztejn and ¹Ronen I. Brafman are with the Faculty of Computer and Information Science at Ben Gurion University. eliasgol@post.bgu.ac.il, brafman@bgu.ac.il
We build directly on this idea, demonstrating its effectiveness in the domain of robot motion planning. However, instead of expert demonstrations, we rely on a classical algorithm. Our method has several advantages. First, our path planner typically behaves similarly to the classical algorithm, making it safe and understandable. In fact, it provably stays within a well-defined trust region around the classical algorithm's policy during deployment and often, in practice, during training. Second, our extensive empirical evaluation, both on a sophisticated and challenging simulation platform and on real robots, using objective and subjective measures, and with two different classical algorithms, shows that our method improves on both RL-based parameter tuning and vanilla RL. Third, our method does not require expert demonstrations and is not limited by pre-generated demonstrations, as we can automatically query the classical algorithm for its choice in any state. Finally, we also inherit the advantage of faster training times demonstrated by earlier methods.
In summary, our main contributions are: (1) The first application, to our knowledge, of RL regularized by a classical algorithm in motion planning. (2) A provably guaranteed trust region. (3) A demonstration of the strength of this method, both in retaining similarity to the classical planner and in improved objective and subjective performance measures, via a comprehensive empirical evaluation within a challenging simulation environment and in the real world.
For code and videos, refer to the repository: https://github.com/ecmpurlsg/Enhancing-Classical-Motion-Planners-Using-RL-with-Safety-Guarantees

BACKGROUND

Classical Mobile Planning: In mobile robot planning, there are two main components: global and local planning.

2026 IEEE International Conference on Robotics and Automation (ICRA 2026), June 1-5, 2026, Vienna, Austria. 979-8-3315-8160-2/26/$31.00 ©2026 IEEE

Index terms

Reinforcement Learning, Motion and Path Planning, Imitation Learning