Grounding Language Models with Semantic Digital Twins for Robotic Planning
Mehreen Naeem, Andrew Melnik, Michael Beetz
AI summary
Problem
LLM-based robotic planners often hallucinate, lack grounding in physical constraints, and struggle to adapt to execution failures in dynamic environments.
Approach
The framework uses a Semantic Digital Twin to provide real-time object affordances and interaction rules, grounding LLM-generated action triplets and enabling context-aware failure resolution and iterative replanning.
Key results
- 100% task success rate on ALFRED household tasks with SDT integration
- Substantially fewer execution failures and replanning iterations than the baseline
- Context-aware failure resolver corrects object selection and affordance errors
- Training-free, real-time adaptation to dynamic environmental changes
Why it matters
Enables reliable, adaptive robotic execution in complex environments without relying on external training or static scene graphs.
Abstract
We introduce a novel framework that integrates Semantic Digital Twins (SDTs) with Large Language Models (LLMs) to enable adaptive and goal-driven robotic task execution in dynamic environments. The system decomposes natural language instructions into structured action triplets, which are grounded in contextual environmental data provided by the SDT. This semantic grounding allows the robot to interpret object affordances and interaction rules, enabling action planning and real-time adaptability. In case of execution failures, the LLM utilizes error feedback and SDT insights to generate recovery strategies and iteratively revise the action plan. We evaluate our approach using tasks from the ALFRED benchmark, demonstrating robust performance across various household scenarios. The proposed framework effectively combines high-level reasoning with semantic environment understanding, achieving reliable task completion in the face of uncertainty and failure.
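The loop described in the abstract (propose action triplets, check them against the SDT, feed errors back, replan) can be illustrated with a minimal Python sketch. Everything here is hypothetical and not taken from the paper: the `ActionTriplet` fields, the dictionary-backed `SemanticDigitalTwin`, the `plan_with_recovery` function, and the `toy_propose` stub that stands in for the LLM are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ActionTriplet:
    """Hypothetical triplet layout: (action, object, optional destination)."""
    action: str
    obj: str
    destination: Optional[str] = None


class SemanticDigitalTwin:
    """Toy stand-in for an SDT: maps each known object to its affordances."""

    def __init__(self, affordances: Dict[str, Set[str]]) -> None:
        self._affordances = affordances

    def check(self, triplet: ActionTriplet) -> Optional[str]:
        """Return an error message if the triplet is not grounded, else None."""
        if triplet.obj not in self._affordances:
            return f"unknown object '{triplet.obj}'"
        if triplet.action not in self._affordances[triplet.obj]:
            return f"'{triplet.obj}' does not afford '{triplet.action}'"
        if triplet.destination is not None and triplet.destination not in self._affordances:
            return f"unknown destination '{triplet.destination}'"
        return None


# A planner takes the instruction plus error feedback and returns a plan.
Planner = Callable[[str, List[str]], List[ActionTriplet]]


def plan_with_recovery(
    instruction: str,
    twin: SemanticDigitalTwin,
    propose: Planner,
    max_iters: int = 3,
) -> Tuple[List[ActionTriplet], int]:
    """Replan until every triplet passes the twin's checks or the budget runs out."""
    feedback: List[str] = []
    for iteration in range(1, max_iters + 1):
        plan = propose(instruction, feedback)
        errors = []
        for triplet in plan:
            error = twin.check(triplet)
            if error is not None:
                errors.append(error)
        if not errors:
            return plan, iteration
        feedback = errors  # errors become context for the next proposal
    raise RuntimeError(f"no grounded plan after {max_iters} iterations: {feedback}")


def toy_propose(instruction: str, feedback: List[str]) -> List[ActionTriplet]:
    """Stub in place of an LLM call: wrong object first, corrected after feedback."""
    first_object = "mug" if feedback else "table"
    return [
        ActionTriplet("pick_up", first_object),
        ActionTriplet("place", "mug", "sink"),
    ]


twin = SemanticDigitalTwin({
    "mug": {"pick_up", "place"},
    "table": {"support"},
    "sink": {"contain"},
})
plan, iterations = plan_with_recovery("put the mug in the sink", twin, toy_propose)
```

In this toy run the first proposal is rejected because the table does not afford `pick_up`, and the second proposal passes, so `iterations` is 2. A real system would replace `toy_propose` with an LLM call and the dictionary with queries to a live digital twin; the sketch only shows how validation errors can drive iterative replanning.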