ICRA 2026

IDfRA: Self-Verification for Iterative Design in Robotic Assembly

Nishka Khendry, Christos Margadji, Sebastian William Pattinson

AI summary

An iterative LLM-VLM framework with self-verification improves semantic accuracy and physical feasibility in robotic assembly design relative to static baselines.

Robotic assembly; Design for Robotic Assembly; Large Language Models; Vision-Language Models; Self-Verification; Iterative Planning

Problem

Current robotic assembly design relies on manual planning or static simulators that assume accurate world models, limiting adaptability in dynamic environments. Existing LLM-based approaches lack iterative feedback mechanisms to correct physical feasibility and semantic alignment errors.

Approach

IDfRA employs a closed-loop system where an LLM generates assembly plans, a simulated robot executes them, and a vision-language model evaluates the outcome to provide feedback for iterative replanning, discovering physical constraints online.
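The plan, execute, verify loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `iterative_design`, `Assessment`, `propose`, `execute`, and `assess`, the numeric score, the acceptance threshold, and the iteration cap are all assumptions introduced here. The three callables stand in for the LLM planner, the robot execution step, and the VLM verifier, and the toy stubs return made-up scores purely to exercise the control flow.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Assessment:
    """Hypothetical verifier output: a score in [0, 1] plus free-text feedback."""
    score: float
    feedback: str


def iterative_design(
    target: str,
    propose: Callable[[str, List[str]], str],   # stands in for the LLM planner
    execute: Callable[[str], str],              # stands in for robot execution
    assess: Callable[[str, str], Assessment],   # stands in for the VLM verifier
    max_iters: int = 5,                         # assumed iteration budget
    accept_score: float = 0.9,                  # assumed stopping threshold
) -> Tuple[Optional[str], float, int]:
    """Run plan -> execute -> verify rounds and return the best-scoring plan."""
    feedback_log: List[str] = []
    best_plan: Optional[str] = None
    best_score = float("-inf")
    rounds = 0
    for rounds in range(1, max_iters + 1):
        plan = propose(target, feedback_log)     # plan conditioned on past feedback
        outcome = execute(plan)                  # one execution per proposed plan
        result = assess(target, outcome)         # verifier judges the resulting state
        if result.score > best_score:            # keep the best plan seen so far
            best_plan, best_score = plan, result.score
        if result.score >= accept_score:         # good enough: stop early
            break
        feedback_log.append(result.feedback)     # otherwise feed back and replan
    return best_plan, best_score, rounds


# Toy stand-ins (made-up behaviour, for illustration only).
def toy_propose(target: str, feedback_log: List[str]) -> str:
    return f"{target}-v{len(feedback_log)}"


def toy_execute(plan: str) -> str:
    return f"built:{plan}"


def toy_assess(target: str, outcome: str) -> Assessment:
    version = int(outcome.rsplit("v", 1)[1])
    score = [0.4, 0.7, 0.6, 0.95][min(version, 3)]  # deliberately non-monotonic
    return Assessment(score, f"revise attempt {version}")


plan, score, rounds = iterative_design("bridge", toy_propose, toy_execute, toy_assess)
print(plan, score, rounds)
```

The sketch keeps the best-scoring plan rather than the latest one because, in a loop like this, a later round can score lower than an earlier one; with the toy scores above, the third round is worse than the second.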

Key results

73.3% top-1 accuracy in semantic recognizability, surpassing the baseline on this metric
86.9% overall construction success rate, indicating robust physical feasibility
Design quality improves across iterations through self-verification feedback, though not always monotonically
No accurate a priori world model is required; physical constraints are discovered online through interaction
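For readers unfamiliar with the two headline metrics, the sketch below shows one common way such figures are computed. It is an assumption about the general definitions, not the paper's evaluation code: the function names `top1_accuracy` and `success_rate` are hypothetical, and the labels and outcomes are toy data unrelated to the reported 73.3% and 86.9%.

```python
from typing import List, Sequence


def top1_accuracy(targets: Sequence[str], ranked_guesses: Sequence[List[str]]) -> float:
    """Fraction of designs whose highest-ranked guess equals the target label."""
    if len(targets) != len(ranked_guesses):
        raise ValueError("targets and ranked_guesses must have the same length")
    if not targets:
        raise ValueError("at least one design is required")
    hits = sum(
        1 for target, guesses in zip(targets, ranked_guesses)
        if guesses and guesses[0] == target   # only the first guess counts
    )
    return hits / len(targets)


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of construction attempts that completed successfully."""
    if not outcomes:
        raise ValueError("at least one outcome is required")
    return sum(outcomes) / len(outcomes)


# Toy data (illustrative only, not taken from the paper).
toy_targets = ["chair", "bridge", "tower", "arch"]
toy_guesses = [["chair", "table"], ["ramp", "bridge"], ["tower"], ["arch", "gate"]]
toy_outcomes = [True, True, False, True]

toy_top1 = top1_accuracy(toy_targets, toy_guesses)
toy_success = success_rate(toy_outcomes)
print(toy_top1, toy_success)
```

In the toy data, "bridge" appears only as the second guess for the second design, so it does not count toward top-1 accuracy.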

Why it matters

Shows potential for scalable, adaptive robotic assembly in unstructured manufacturing by automating design refinement through self-verification, reducing reliance on manual engineering and precise simulation models.

Abstract

Design for Robotic Assembly (DfRA) remains largely dependent on manual planning and heuristic simulation, limiting scalability and robustness in complex industrial settings. Although large language models (LLMs) show promise for semantic reasoning and task planning, most approaches remain tightly coupled to pre-built simulators that assume an accurate world model. We introduce Iterative Design for Robotic Assembly (IDfRA), a closed-loop framework that combines an LLM for plan generation with a vision–language model (VLM) for execution assessment. Given a target structure and a partial environmental signature, the LLM proposes an assembly plan, the robot executes it once at test time, and the VLM evaluates the resulting state to provide feedback for replanning. Through this iterative planning–execution–verification loop, the system progressively improves semantic fidelity and physical feasibility. Crucially, IDfRA does not require an accurate a priori world model before deployment. Instead, physical constraints are discovered online through interaction, enabling adaptation to under-specified environments. Empirical evaluation demonstrates that IDfRA attains 73.3% top-1 accuracy in semantic recognisability, surpassing the baseline on this metric. Moreover, the resulting assembly plans exhibit robust physical feasibility, achieving an overall 86.9% construction success rate, with design quality improving across iterations, albeit not always monotonically. Pairwise human evaluation further corroborates the advantages of IDfRA relative to alternative approaches. By integrating self-verification with context-aware adaptation, the framework evidences strong potential for deployment in unstructured manufacturing scenarios.

Index terms

Intelligent and Flexible Manufacturing; Assembly; Planning, Scheduling and Coordination