ICRA 2026

Action Sequence Transfer Via LLMs for Heterogeneous Environments

Choong Ho Chung, DongHwan Shin, Sung-Hee Lee


AI summary

LLMs can successfully adapt human action sequences to entirely new environments by inferring high-level intent and substituting unavailable objects with functional alternatives.
Action Sequence Transfer, Large Language Models, Robot Adaptation, Scene Graphs, Goal-Conditioned Planning, Heterogeneous Environments

Problem

How can robots adapt human-demonstrated action sequences to environments with different spatial layouts and object inventories while preserving the original task intent?

Approach

A multi-stage LLM pipeline extracts high-level intent and activity properties from source actions, maps them to target constraints, and generates adapted sequences using retrieval-augmented generation and object substitution.
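The object substitution step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Action type, the AFFORDANCES table, and the substitute and transfer functions are hypothetical names, and the hand-written affordance table merely stands in for the judgement that the described pipeline obtains from an LLM. Dropping steps that cannot be mapped is also a simplification, not the paper's stated behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str
    obj: str


# Hypothetical affordance table. It stands in for the LLM's judgement of which
# objects can serve the same function; it is not taken from the paper.
AFFORDANCES = {
    "mug": {"hold_liquid"},
    "glass": {"hold_liquid"},
    "kettle": {"heat_water"},
    "microwave": {"heat_water", "heat_food"},
    "spoon": {"stir"},
}


def substitute(obj, target_objects):
    """Return obj if the target has it, else a target object that shares
    at least one affordance with it, else None."""
    if obj in target_objects:
        return obj
    needed = AFFORDANCES.get(obj, set())
    for candidate in sorted(target_objects):  # sorted for deterministic choice
        if needed & AFFORDANCES.get(candidate, set()):
            return candidate
    return None


def transfer(source_actions, target_objects):
    """Map each source action onto the target objects.

    Returns (adapted, dropped): actions rewritten for the target space, and
    source actions for which no functional alternative was found.
    """
    adapted, dropped = [], []
    for action in source_actions:
        replacement = substitute(action.obj, target_objects)
        if replacement is None:
            dropped.append(action)
        else:
            adapted.append(Action(action.verb, replacement))
    return adapted, dropped


# Source and target share no objects at all.
source = [
    Action("heat_water_with", "kettle"),
    Action("pour_into", "mug"),
    Action("stir_with", "spoon"),
]
target_objects = {"microwave", "glass"}

adapted, dropped = transfer(source, target_objects)
for step in adapted:
    print(step.verb, step.obj)
```

In this toy case the kettle is replaced by the microwave and the mug by the glass, while the stirring step is reported as dropped because nothing in the target can stir. A real system would need a far richer notion of function than a fixed lookup table, which is presumably where the LLM and retrieval stages come in.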

Key results

  • Novel LLM-based transfer framework preserving semantic intent across heterogeneous spaces
  • Goalstep-Spatial dataset with 44 hours of video annotated with scene graphs and activity descriptions
  • Valid action sequence generation even with zero object overlap between source and target environments
  • Retrieval-augmented pipeline improving core activity prediction and property transfer accuracy

Why it matters

Enables scalable, context-aware robotic behavior for real-world human-robot interaction by allowing flexible adaptation to diverse environments without costly re-planning.

Abstract

We present an action sequence transfer system that adaptively transfers user action sequences across different target spaces. Given an input action sequence from a source space and scene graph representations of both the source and target environments, our system predicts a corresponding action sequence in the target space by adapting to the spatial and object constraints of the new environment. To achieve this, we leverage multi-level representations of user activity to generalize actions at varying levels of abstraction. To demonstrate our system, we collect a new scene graph-based dataset derived from the Ego4D GoalStep dataset for evaluation. Results indicate that our system can generate valid action sequences even between spaces with drastically different object configurations.
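To make the inputs and the notion of a valid output concrete, here is a small sketch under stated assumptions. The SceneGraph class, its triple-based relation format, and the invalid_steps check are hypothetical; the abstract does not specify the scene graph schema or how validity is measured, so this only shows one simple reading: an action is usable in a space only if the object it refers to exists in that space's scene graph.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class SceneGraph:
    """Hypothetical scene graph: object nodes plus (subject, predicate, object) edges."""
    objects: Set[str] = field(default_factory=set)
    relations: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_relation(self, subj: str, pred: str, obj: str) -> None:
        # Adding an edge also registers both endpoints as nodes.
        self.objects.update((subj, obj))
        self.relations.add((subj, pred, obj))


def invalid_steps(actions: List[Tuple[str, str]], graph: SceneGraph) -> List[int]:
    """Return indices of (verb, object) actions whose object is missing from graph."""
    return [i for i, (_verb, obj) in enumerate(actions) if obj not in graph.objects]


# Target space with a glass and a microwave, but no kettle or mug.
target = SceneGraph()
target.add_relation("glass", "on", "counter")
target.add_relation("microwave", "next_to", "counter")

unadapted = [("heat_water_with", "kettle"), ("pour_into", "mug")]
predicted = [("heat_water_with", "microwave"), ("pour_into", "glass")]

print(invalid_steps(unadapted, target))  # every step refers to a missing object
print(invalid_steps(predicted, target))  # no step refers to a missing object
```

Copying the source sequence unchanged fails at every step in this target, whereas the adapted sequence passes the check. Object presence is only a necessary condition; spatial constraints encoded in the relations would need their own checks.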

Index terms

AI-Based Methods, Agent-Based Systems, Autonomous Agents
