ICRA 2026

Parallel Heuristic Search As Inference for Actor-Critic Reinforcement Learning Models

Hanlan Yang, Itamar Mishani, Luca Pivetti, Zachary Kingston, Maxim Likhachev

AI summary

PACHS enables robust, multi-step robotic planning by integrating off-the-shelf actor-critic RL models into a parallel best-first search framework without requiring retraining.

Actor-critic reinforcement learning; Heuristic search; Parallel planning; Robotic manipulation; Inference-time search; Soft Actor-Critic

Problem

Standard RL deployment relies on simplistic one-step policy rollouts that lack multi-step reasoning, while traditional search methods struggle with continuous action spaces and complex dynamics without hand-crafted heuristics.

Approach

PACHS leverages the actor network to generate candidate actions and the critic network to provide learned cost-to-go heuristics within a parallel best-first search, decoupling edge evaluation from state expansion for efficiency.
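The idea of using the actor to propose actions and the critic to rank search nodes can be illustrated with a minimal sequential sketch. This is not the authors' implementation: `toy_actor`, `toy_critic`, `step`, the 2D point task, and the `g + h` priority are all hypothetical stand-ins chosen for illustration, and parallelism is left out entirely.

```python
import heapq
import math
from itertools import count

GOAL = (5.0, 5.0)   # hypothetical goal for a 2D point robot
STEP = 1.0          # length of every candidate action

def toy_actor(state, k=4):
    """Hypothetical stand-in for an actor network: propose k candidate actions."""
    heading = math.atan2(GOAL[1] - state[1], GOAL[0] - state[0])
    offsets = [0.0, 0.5, -0.5, 1.0, -1.0][:k]  # perturb the greedy heading
    return [(STEP * math.cos(heading + d), STEP * math.sin(heading + d))
            for d in offsets]

def toy_critic(state):
    """Hypothetical stand-in for a critic: estimated cost-to-go (here, distance)."""
    return math.dist(state, GOAL)

def step(state, action):
    """Assumed deterministic transition model."""
    return (state[0] + action[0], state[1] + action[1])

def best_first_search(start, tol=0.5, max_expansions=1000):
    tie = count()  # tie-breaker so the heap never compares states or paths
    open_list = [(toy_critic(start), next(tie), 0.0, start, [start])]
    for _ in range(max_expansions):
        if not open_list:
            break
        _, _, g, state, path = heapq.heappop(open_list)
        if math.dist(state, GOAL) <= tol:
            return path
        for action in toy_actor(state):          # actor generates the edges
            nxt = step(state, action)
            g_next = g + math.hypot(*action)     # accumulated path cost
            f = g_next + toy_critic(nxt)         # critic supplies the heuristic
            heapq.heappush(open_list, (f, next(tie), g_next, nxt, path + [nxt]))
    return None

path = best_first_search((0.0, 0.0))
```

Unlike a one-step policy rollout, the open list keeps the actor's non-greedy proposals available, so the search can fall back to them if the greedy branch turns out to be poor.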

Key results

Novel PACHS algorithm integrating actor-critic models with best-first search
Multi-layered parallelization strategy achieving significant computational efficiency
Enhanced generalization and robustness of RL policies in complex robotic environments
Successful deployment in collision-free motion planning and contact-rich pushing tasks

Why it matters

Bridges the gap between learned policies and systematic planning, offering a practical, retraining-free solution for real-time robotic control and manipulation.

Abstract

Actor-critic models are a class of model-free deep reinforcement learning (RL) algorithms that have demonstrated effectiveness across various robot learning tasks. While considerable research has focused on improving training stability and data sampling efficiency, most deployment strategies have remained relatively simplistic, typically relying on direct actor policy rollouts. In contrast, we propose PACHS (Parallel Actor-Critic Heuristic Search), an efficient parallel best-first search algorithm for inference that leverages both components of the actor-critic architecture: the actor network generates actions, while the critic network provides cost-to-go estimates to guide the search. Two levels of parallelism are employed within the search: actions and cost-to-go estimates are generated in batches by the actor and critic networks respectively, and graph expansion is distributed across multiple threads. We demonstrate the effectiveness of our approach in robotic manipulation tasks, including collision-free motion planning and contact-rich interactions such as non-prehensile pushing. Visit p-achs.github.io for demonstrations and examples.
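The two levels of parallelism described in the abstract can be sketched roughly as follows. This is a simplified, synchronous illustration under stated assumptions, not the PACHS algorithm itself: `actor_batch`, `critic_batch`, `evaluate_edge`, the unit edge cost, and the batch-per-iteration loop are hypothetical, and plain Python lists stand in for batched network inference.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import count

GOAL = (4.0, 0.0)  # hypothetical goal for a 2D point robot

def actor_batch(states):
    """Hypothetical batched actor: three unit-length actions per state, one call."""
    out = []
    for (x, y) in states:
        heading = math.atan2(GOAL[1] - y, GOAL[0] - x)
        out.append([(math.cos(heading + d), math.sin(heading + d))
                    for d in (0.0, 0.6, -0.6)])
    return out

def critic_batch(states):
    """Hypothetical batched critic: one cost-to-go estimate per state."""
    return [math.dist(s, GOAL) for s in states]

def evaluate_edge(edge):
    """Stand-in for an expensive edge evaluation (simulation, collision check)."""
    (x, y), (dx, dy) = edge
    return (x + dx, y + dy)

def parallel_search(start, batch_size=2, workers=4, tol=0.5, max_iters=200):
    tie = count()  # tie-breaker so the heap never compares states
    open_list = [(critic_batch([start])[0], next(tie), 0.0, start)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iters):
            if not open_list:
                return None
            # Pop up to batch_size of the most promising nodes.
            n = min(batch_size, len(open_list))
            batch = [heapq.heappop(open_list) for _ in range(n)]
            for _, _, g, s in batch:
                if math.dist(s, GOAL) <= tol:
                    return s, g
            states = [s for _, _, _, s in batch]
            actions = actor_batch(states)            # one batched actor call
            edges = [(s, a) for s, acts in zip(states, actions) for a in acts]
            # Every action has unit length, so each edge costs 1.0.
            costs = [g + 1.0 for (_, _, g, _), acts in zip(batch, actions)
                     for _ in acts]
            successors = list(pool.map(evaluate_edge, edges))  # threaded edges
            values = critic_batch(successors)        # one batched critic call
            for s2, g2, h in zip(successors, costs, values):
                heapq.heappush(open_list, (g2 + h, next(tie), g2, s2))
    return None

result = parallel_search((0.0, 0.0))
```

In this sketch the network calls are amortized over a batch of nodes, while the per-edge work is spread across a thread pool. Note that in CPython, threads only speed up edge evaluation if that work releases the global interpreter lock, for example inside a native simulator.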

Index terms

Motion and Path Planning; Reinforcement Learning