IROS 2024

Do One Thing and Do It Well: Delegate Responsibilities in Classical Planning

Tin Lai, Philippe Morere

Abstract

We propose a novel framework and algorithm for solving classical planning problems with an implicit hierarchical solver based on the principle of delegation. This framework, the Markov Intent Process, features a collection of skills that are each specialised to perform a single task well. Skills are aware of their intended effects and are able to analyse planning goals to delegate planning to the best-suited skill. This principle dynamically creates a hierarchy of plans, in which each skill plans for the sub-goals for which it is specialised. Our method performs robustly in noisy environments with non-deterministic action effects and features on-demand execution: skill policies are only evaluated when needed. Plans are only generated at the highest level, then expanded and optimised when the latest state information is available. The high-level plan retains the initial planning intent and previously computed skills, effectively reducing the computation needed to adapt to environmental changes. We show experimentally that this planning approach is highly competitive with classic planning and reinforcement learning techniques on a variety of domains, both in terms of solution length and planning time.

I. INTRODUCTION

Decision-making techniques enable automating many real-world tasks that would be too repetitive or even intractable for humans. Such methods require making good decisions in any situation, improved by reacting to the latest available information and revising previous intentions. This procedure is, in practice, extremely time-sensitive, as the ability to react quickly is paramount in many real-world problems where the time available to make decisions is very limited.

Decision-making typically involves generating plans by searching the space of potential solutions. Classic planning methods search over plans by simulating various possible future scenarios.
This quickly becomes prohibitively expensive in problems with larger state and/or action spaces and greatly reduces the applicability of these methods to real-time problems. Furthermore, plans generated this way often cannot be reused in similar situations, and unforeseen state changes often require expensive re-planning. This makes such planning techniques inefficient and slow.

More recent hierarchical planning and reinforcement learning methods overcome some of these issues by planning at several levels of abstraction. Hierarchical planning decomposes the problem into smaller sub-problems, for which specialised skills are learned. These skills are highly reusable, easier to learn than general policies and achieve better performance. However, in classical planning problems where the hierarchical structure is not given, existing planners are typically not aware of how to exploit the underlying structure to compose reusable skills. Knowledge of each skill’s purpose is also extremely important, as it allows planning to be delegated seamlessly to more specialised skills. This lack of awareness arises from the Markov Decision Process (MDP), the base framework for most of these methods, in which the effect of actions (or skills) is unknown. Because of this, existing hierarchical planning methods can be computationally inefficient, and learning specialised skills can prove challenging.

†School of Computer Science, The University of Sydney, Australia. §Fait Corporation, Australia. 1tin.lai@faitcorp.com 2philippe.morere@sydney.edu.au

We present an implicit hierarchical planning methodology for reasoning about the effects of skills and primitive actions, yielding several benefits. Planning using skill effect knowledge allows one to select the best skill for any given task directly, reducing planning time and computation by exploiting the implicit hierarchical structure.
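The idea of choosing a skill directly from its known effects can be illustrated with a minimal sketch. This is not the paper's MIP formulation or the PolicyDelegate algorithm. The names `Skill`, `intended_effects` and `delegate`, the representation of states and goals as sets of facts, and the "covers the most unmet goal facts" scoring rule are all assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical names throughout: Skill, intended_effects and delegate are
# illustrative assumptions, not definitions taken from the paper.


@dataclass(frozen=True)
class Skill:
    name: str
    intended_effects: frozenset  # facts this skill is meant to make true


def delegate(goal, state, skills):
    """Return the skill whose intended effects cover the most unmet goal facts.

    Returns None when the goal already holds or no skill is relevant.
    """
    unmet = goal - state
    if not unmet or not skills:
        return None
    best = max(skills, key=lambda s: len(s.intended_effects & unmet))
    # A skill that covers none of the unmet facts is not a useful delegate.
    return best if best.intended_effects & unmet else None


skills = [
    Skill("open_door", frozenset({"door_open"})),
    Skill("pick_key", frozenset({"has_key"})),
    Skill("move_to_door", frozenset({"at_door"})),
]

state = frozenset({"at_door"})
goal = frozenset({"at_door", "door_open"})

chosen = delegate(goal, state, skills)
print(chosen.name)  # open_door
```

Because each skill declares what it is meant to achieve, the selection above needs no search over action sequences: the only unmet fact is `door_open`, and exactly one skill declares it as an intended effect. A fuller treatment would also have to handle preconditions (for example, needing `has_key` first), which this sketch deliberately leaves out.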
This knowledge also enables planners to reason about whether executed skills or actions were successful by comparing the expected and observed effects; this allows action success conditions to be inferred directly from interactions. Furthermore, the presented method plans at the highest level only and expands plans into more detailed plans on demand. Thus, the latest state information can be taken into account, making plans reactive to noise and adversarial actions. Also, plans do not need to be re-computed after unforeseen state changes occur, and planning computation is expended only when a higher level of detail is needed. This makes planning inexpensive and fast.

Our contributions are the following. We present a new sequential decision-making framework, the Markov Intent Process (MIP), incorporating action and skill effects at its core. This framework advances classical planning by structuring the exploitation of task hierarchies as skills and plans. MIP can operate in noisy environments where states are changed by exogenous forces, which implies action effects are non-deterministic. We formulate the notion of an optimal plan in MIP and propose to convert the sequential decision-making problem into a collection of non-sequential decision-making problems, which are easier to solve. We present a hierarchical, intent-aware, on-demand planning algorithm, called PolicyDelegate, based on the MIP framework. Finally, we experimentally show that PolicyDelegate is resilient to noise and outperforms other classic planning and reinforcement learning methods, both in terms of planning time and solution length, on a variety of domains.

II. RELATED WORKS

Planning by reasoning on the effects and conditions of actions has received much attention over the years. Classic planners like STRIPS [1] are based on this principle,

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 14-18, 2024, Abu Dhabi, UAE. 979-8-3503-7769-9/24/$31.00 ©2024 IEEE

Index terms

Learning Categories and Concepts; Autonomous Agents; Task Planning