ICRA 2026

LLM Trainer: Automated Robotic Data Generating Via Demonstration Augmentation Using LLMs

Abraham George, Amir Barati Farimani

AI summary

Augmenting a single human demonstration with LLM annotation and Thompson sampling generates a large robot dataset for imitation learning, with a higher generation success rate than expert-engineered baselines.

LLMs, Robot Learning, Data Augmentation, Imitation Learning, Thompson Sampling, Autonomous Data Generation

Problem

Robot learning demands large datasets of human demonstrations, but manual collection and annotation are costly and time-consuming. Existing automated augmentation methods rely on hard-coded rules or manual labeling, limiting scalability and generalization.

Approach

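The system prompts an LLM to annotate a single demonstration offline, extracting keyframes, salient objects, and pose–object relations. Online, it retargets those keyposes to each new scene from an initial observation and warps the original trajectory to match. Because annotations are reusable across scenes, a Thompson sampling (multi-armed bandit) optimizer selects the annotations that yield the highest generation success rate.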
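The bandit step can be illustrated with a minimal Beta-Bernoulli Thompson sampling sketch. This is not the authors' implementation: the class name `AnnotationBandit`, the uniform Beta(1, 1) prior, and the simulated success rates in `run_generation` are all assumptions made for illustration. Each arm stands for one candidate annotation, and a reward of 1 stands for a generated demonstration that succeeded when executed.

```python
import random


class AnnotationBandit:
    """Beta-Bernoulli Thompson sampling over candidate annotations (hypothetical helper)."""

    def __init__(self, num_annotations, seed=None):
        self.successes = [0] * num_annotations
        self.failures = [0] * num_annotations
        self.rng = random.Random(seed)

    def select(self):
        # Sample a plausible success rate for each annotation from its
        # Beta posterior (assumed uniform prior), then pick the largest sample.
        samples = [
            self.rng.betavariate(1 + s, 1 + f)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, success):
        # Record whether the demonstration generated with this annotation succeeded.
        if success:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def run_generation(true_success_rates, num_trials, seed=0):
    """Simulated generation loop; the success rates are made-up stand-ins for real rollouts."""
    rng = random.Random(seed)
    bandit = AnnotationBandit(len(true_success_rates), seed=seed + 1)
    for _ in range(num_trials):
        arm = bandit.select()
        success = rng.random() < true_success_rates[arm]
        bandit.update(arm, success)
    return bandit


bandit = run_generation([0.1, 0.3, 0.9], num_trials=500)
pulls = [s + f for s, f in zip(bandit.successes, bandit.failures)]
```

In this toy setting, `pulls` shows how often each annotation was tried. Sampling from the posterior, rather than always picking the best observed rate, keeps some exploration of annotations that have been tried only a few times.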

Key results

Fully automated data generation from a single unannotated demonstration
Thompson sampling optimization boosts generation success rate by 2–3× over expert baselines
Optimized LLM feed-forward policy matches or exceeds trained imitation learning agents
Hardware validation on a Franka Emika Panda robot with an ensembled LLM-IL controller

Why it matters

Democratizes scalable robot learning by eliminating manual data collection and annotation bottlenecks for imitation learning.

Abstract

We present LLM Trainer, a fully automated pipeline that leverages the world knowledge of Large Language Models (LLMs) to transform a small number of human demonstrations (as few as one) into a large robot dataset for imitation learning. Our approach decomposes demonstration generation into two steps: (1) offline demonstration annotation that extracts keyframes, salient objects, and pose–object relations; and (2) online keypose retargeting that adapts those keyframes to a new scene, given an initial observation. Using these modified keypoints, our system warps the original demonstration to generate a new trajectory, which is then executed, and the resulting demo, if successful, is saved. Because the annotation is reusable across scenes, we use Thompson sampling to optimize the annotation, significantly improving generation success rate. We evaluate our method on a range of tasks, and find that our data annotation method consistently outperforms expert-engineered baselines. We further show an ensemble policy that combines the optimized LLM feed-forward plan with a learned feedback imitation learning controller. Finally, we demonstrate hardware feasibility on a Franka Emika Panda robot. For additional materials and demonstration videos, please see the project website: https://sites.google.com/andrew.cmu.edu/llm-trainer
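The abstract does not specify how the demonstration is warped, so the following is only one plausible sketch of the idea. It assumes position-only waypoints, strictly increasing keyframe indices, and linear interpolation of the keyframe offsets over time; the function names `warp_trajectory` and `_interpolate_offset` are hypothetical. Orientation and gripper state are left out.

```python
def _interpolate_offset(t, indices, offsets):
    """Linearly interpolate the per-keyframe offsets at time step t."""
    # Outside the keyframe range, hold the nearest keyframe's offset.
    if t <= indices[0]:
        return offsets[0]
    if t >= indices[-1]:
        return offsets[-1]
    for k in range(len(indices) - 1):
        i0, i1 = indices[k], indices[k + 1]
        if i0 <= t <= i1:
            w = (t - i0) / (i1 - i0)
            return tuple(
                (1 - w) * a + w * b for a, b in zip(offsets[k], offsets[k + 1])
            )


def warp_trajectory(trajectory, keyframe_indices, new_keyposes):
    """Shift a demonstrated path so it passes through retargeted keyposes.

    trajectory: list of (x, y, z) positions from the original demonstration.
    keyframe_indices: strictly increasing indices into trajectory.
    new_keyposes: one retargeted (x, y, z) position per keyframe.
    """
    if len(keyframe_indices) != len(new_keyposes):
        raise ValueError("need exactly one new keypose per keyframe")
    # Offset that moves each original keyframe onto its retargeted pose.
    offsets = [
        tuple(n - o for n, o in zip(new, trajectory[idx]))
        for idx, new in zip(keyframe_indices, new_keyposes)
    ]
    warped = []
    for t, point in enumerate(trajectory):
        offset = _interpolate_offset(t, keyframe_indices, offsets)
        warped.append(tuple(p + d for p, d in zip(point, offset)))
    return warped


# Toy example: a straight-line demo whose final keyframe moves 2 units along y.
demo = [(float(i), 0.0, 0.0) for i in range(5)]
warped = warp_trajectory(demo, [0, 4], [(0.0, 0.0, 0.0), (4.0, 2.0, 0.0)])
```

Blending offsets rather than absolute positions preserves the shape of the original motion between keyframes while still hitting each retargeted keypose exactly.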

Index terms

Data Sets for Robot Learning, Learning from Demonstration, Integrated Planning and Learning