Action Sequence Transfer Via LLMs for Heterogeneous Environments
Choong Ho Chung, DongHwan Shin, Sung-Hee Lee
AI summary
Problem
How can robots adapt human-demonstrated action sequences to environments with different spatial layouts and object inventories while preserving the original task intent?
Approach
A multi-stage LLM pipeline extracts high-level intent and activity properties from source actions, maps them to target constraints, and generates adapted sequences using retrieval-augmented generation and object substitution.
Key results
- Novel LLM-based transfer framework preserving semantic intent across heterogeneous spaces
- Goalstep-Spatial dataset with 44 hours of video annotated with scene graphs and activity descriptions
- Valid action sequence generation even with zero object overlap between source and target environments
- Retrieval-augmented pipeline improving core activity prediction and property transfer accuracy
Why it matters
Enables scalable, context-aware robotic behavior for real-world human-robot interaction by allowing flexible adaptation to diverse environments without costly re-planning.
Abstract
We present an action sequence transfer system that adapts user action sequences from a source space to different target spaces. Given an input action sequence from a source space and scene graph representations of both the source and target environments, our system predicts a corresponding action sequence in the target space that satisfies the spatial and object constraints of the new environment. To achieve this, we leverage multi-level representations of user activity to generalize actions at varying levels of abstraction. To evaluate our system, we collect a new scene-graph-based dataset derived from the Ego4D GoalStep dataset. Results indicate that our system can generate valid action sequences even between spaces with drastically different object configurations.
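To make the input/output contract above concrete, the following is a minimal sketch of the transfer problem, not the authors' implementation. It assumes a scene graph can be reduced to a map from object names to affordance labels, and it replaces the LLM stages with a simple affordance-overlap heuristic for object substitution. All names here (`Action`, `SceneGraph`, `substitute`, `transfer`, and the affordance labels) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    verb: str  # what the user does, e.g. "fill"
    obj: str   # name of the object acted on


@dataclass
class SceneGraph:
    # Hypothetical simplification: object name -> set of affordance labels.
    # A real scene graph would also carry spatial relations between objects.
    affordances: dict = field(default_factory=dict)


def substitute(obj, source, target):
    """Pick a target object for `obj`, or None if nothing is suitable.

    Stand-in for the LLM-based mapping: keep the object if the target has it,
    otherwise choose the target object sharing the most affordances.
    """
    if obj in target.affordances:
        return obj
    wanted = source.affordances.get(obj, set())
    best, best_score = None, 0
    for candidate, offered in target.affordances.items():
        score = len(wanted & offered)
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = candidate, score
    return best


def transfer(actions, source, target):
    """Rewrite each action for the target space, dropping unmappable ones."""
    adapted = []
    for action in actions:
        new_obj = substitute(action.obj, source, target)
        if new_obj is not None:
            adapted.append(Action(action.verb, new_obj))
    return adapted


# Source and target share no object names (zero object overlap).
source = SceneGraph({
    "kettle": {"contain_liquid", "heat", "pour"},
    "mug": {"contain_liquid", "drink_from"},
})
target = SceneGraph({
    "pot": {"contain_liquid", "heat"},
    "glass": {"contain_liquid", "drink_from"},
    "sofa": {"sit"},
})
actions = [Action("fill", "kettle"), Action("heat", "kettle"), Action("pour_into", "mug")]

adapted = transfer(actions, source, target)
for a in adapted:
    print(a.verb, a.obj)
```

In this toy example the kettle maps to the pot and the mug to the glass, so the sequence stays executable even though the two spaces share no objects. The heuristic keeps verbs unchanged and ignores spatial layout; the system described in the abstract instead reasons over multi-level activity representations, which a lookup of this kind cannot capture.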