Closing the Communication Loop for Robotic Failures: Multi-Turn, Behavior-Tree-Grounded Explanations with Large Language Models
Parag Khanna, Haoyun Zhou, Elmira Yadollahi, Iolanda Leite, Claes Christian Smith
AI summary
Problem
Template-based robot failure explanations lack contextual flexibility, cannot handle follow-up questions, and ignore interaction history, which hinders user recovery and erodes trust during collaborative tasks.
Approach
The authors built a failure communication module that feeds a Behavior Tree’s structured task logic and persistent interaction history into an LLM to generate tailored, multi-turn explanations and verify user recovery actions.
Key results
- Implemented a BT-grounded LLM module for multi-turn failure explanations
- Improved resolution rates for challenging failures in a 33-participant user study
- Reduced resolution times for simpler failures across varying explanation detail levels
- Demonstrated scalable, history-aware communication that cuts redundancy for repeated failures
Why it matters
It offers a scalable, flexible alternative to rigid templates for human-robot collaboration, directly improving recovery efficiency, user trust, and communication in real-world robotic deployments.
Abstract
Robot failures during collaborative tasks can frus- trate users and reduce trust. To address this, we developed a failure communication module that combines large language models (LLMs) with Behavior Trees (BTs) to generate interac- tive, context-aware explanations for task failures. The module supports three key processes: (1) initial (high/medium/low) lev- eled explanations, (2) interactive clarifications for user follow-up questions, and (3) explicit verification of user actions to close the recovery loop. By leveraging the BT structure and persistent interaction history, it generates responsive, multi-turn explana- tions and reduces redundancy for repeated failures. We im- plemented and evaluated this module in real-time robotic pick- and-place tasks and conducted a user study with 33 participants across three high/medium/low explanation conditions. The user study showed that the module improved resolution rates for challenging failures and reduced resolution times for simpler failures, demonstrating the effectiveness of LLM-powered, BT- grounded explanations in human-robot collaboration (HRC).