Parallel Heuristic Search As Inference for Actor-Critic Reinforcement Learning Models
Hanlan Yang, Itamar Mishani, Luca Pivetti, Zachary Kingston, Maxim Likhachev
AI summary
Problem
Standard RL deployment relies on simplistic one-step policy rollouts that lack multi-step reasoning, while traditional search methods struggle with continuous action spaces and complex dynamics without hand-crafted heuristics.
Approach
PACHS leverages the actor network to generate candidate actions and the critic network to provide learned cost-to-go heuristics within a parallel best-first search, decoupling edge evaluation from state expansion for efficiency.
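As a rough illustration of the idea above, the following sketch runs a sequential, greedy best-first search on a toy 2D point robot. It is not the authors' implementation: `actor`, `critic`, `step`, `in_collision`, the goal, the obstacle, and the rounding-based duplicate check are all hypothetical stand-ins, and the parallelism and decoupled edge evaluation are left out for brevity. The priority is the critic-derived cost-to-go alone, which is an assumption made here for simplicity.

```python
import heapq
import math

GOAL = (5.0, 5.0)   # assumed goal position for this toy problem
STEP = 1.0          # assumed action magnitude


def actor(state, k=8):
    """Hypothetical stand-in for the actor network: propose k candidate actions.

    Here the candidates are steps of length STEP in k evenly spaced directions.
    """
    return [
        (STEP * math.cos(2 * math.pi * i / k), STEP * math.sin(2 * math.pi * i / k))
        for i in range(k)
    ]


def critic(state):
    """Hypothetical stand-in for the critic network: a value estimate.

    Here the value is the negative Euclidean distance to the goal.
    """
    return -math.dist(state, GOAL)


def step(state, action):
    """Toy dynamics: the action is a displacement added to the state."""
    return (state[0] + action[0], state[1] + action[1])


def in_collision(state):
    """Toy obstacle: an axis-aligned wall. Only the state itself is checked."""
    x, y = state
    return 2.0 <= x <= 3.0 and 0.0 <= y <= 4.0


def best_first_search(start, tol=0.75, max_expansions=5000):
    """Greedy best-first search ordered by heuristic h(s) = -critic(s)."""
    counter = 0  # tie-breaker so the heap never has to compare paths
    open_list = [(-critic(start), counter, start, [start])]
    closed = set()
    expansions = 0
    while open_list and expansions < max_expansions:
        _, _, state, path = heapq.heappop(open_list)
        if math.dist(state, GOAL) <= tol:
            return path
        # Coarse discretization for duplicate detection in a continuous space.
        key = (round(state[0], 1), round(state[1], 1))
        if key in closed:
            continue
        closed.add(key)
        expansions += 1
        for action in actor(state):          # actor proposes candidate edges
            nxt = step(state, action)
            if in_collision(nxt):
                continue
            counter += 1
            heapq.heappush(open_list, (-critic(nxt), counter, nxt, path + [nxt]))
    return None


path = best_first_search((0.0, 0.0))
```

In this toy setting the search routes around the wall because states blocked by the obstacle never enter the open list, so the next-best critic estimate is expanded instead. A one-step rollout that always follows the single best action has no such fallback.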
Key results
- Novel PACHS algorithm integrating actor-critic models with best-first search
- Multi-layered parallelization strategy achieving significant computational efficiency
- Enhanced generalization and robustness of RL policies in complex robotic environments
- Successful deployment in collision-free motion planning and contact-rich pushing tasks
Why it matters
Bridges the gap between learned policies and systematic planning, offering a practical, retraining-free solution for real-time robotic control and manipulation.
Abstract
Actor-critic models are a class of model-free deep reinforcement learning (RL) algorithms that have demonstrated effectiveness across various robot learning tasks. While considerable research has focused on improving training stability and data sampling efficiency, most deployment strategies have remained relatively simplistic, typically relying on direct actor policy rollouts. In contrast, we propose PACHS (Parallel Actor-Critic Heuristic Search), an efficient parallel best-first search algorithm for inference that leverages both components of the actor-critic architecture: the actor network generates actions, while the critic network provides cost-to-go estimates to guide the search. Two levels of parallelism are employed within the search: actions and cost-to-go estimates are generated in batches by the actor and critic networks, respectively, and graph expansion is distributed across multiple threads. We demonstrate the effectiveness of our approach in robotic manipulation tasks, including collision-free motion planning and contact-rich interactions such as non-prehensile pushing. Visit p-achs.github.io for demonstrations and examples.
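The two levels of parallelism described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the PACHS implementation: `actor_batch`, `critic_batch`, `evaluate_edge`, and `expand_frontier` are hypothetical names, the "networks" are plain functions over lists, and the workspace bounds are invented. Batched calls stand in for batched network inference, and a thread pool stands in for multi-threaded expansion. In CPython, threads do not speed up pure-Python work like this, so the pool here only shows the structure.

```python
from concurrent.futures import ThreadPoolExecutor
import math

GOAL = (5.0, 5.0)  # assumed goal position for this toy problem


def actor_batch(states):
    """Hypothetical batched actor: one call returns candidate actions for all states."""
    directions = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    return [list(directions) for _ in states]


def critic_batch(states):
    """Hypothetical batched critic: one call returns a cost-to-go for every state."""
    return [math.dist(s, GOAL) for s in states]


def evaluate_edge(edge):
    """Per-edge work (simulation or collision checking in a real system).

    Returns the successor state, or None if it leaves the assumed workspace.
    """
    (x, y), (dx, dy) = edge
    nxt = (x + dx, y + dy)
    if not (0.0 <= nxt[0] <= 5.0 and 0.0 <= nxt[1] <= 5.0):
        return None
    return nxt


def expand_frontier(frontier, pool):
    # Level 1: a single batched actor call covers every frontier state.
    actions = actor_batch(frontier)
    edges = [(s, a) for s, acts in zip(frontier, actions) for a in acts]
    # Level 2: edges are evaluated concurrently on worker threads.
    successors = [s for s in pool.map(evaluate_edge, edges) if s is not None]
    # Level 1 again: a single batched critic call scores all valid successors.
    costs = critic_batch(successors)
    return sorted(zip(costs, successors))


with ThreadPoolExecutor(max_workers=4) as pool:
    ranked = expand_frontier([(0.0, 0.0), (2.0, 3.0)], pool)
```

The sorted result plays the role of entries pushed to the search's open list, with the lowest critic-estimated cost-to-go first.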