ICRA 2026

AURA: Autonomous Upskilling with Retrieval-Augmented Agents

Alvin Zhu, Yusuke Tanaka, Andrew Goldberg, Dennis Hong


AI summary

AURA automates and improves reinforcement learning curriculum design by using LLMs and a retrieval-augmented feedback loop to generate, validate, and refine training pipelines from simple user prompts.
Curriculum Learning · Retrieval-Augmented Generation · Large Language Models · Reinforcement Learning · Humanoid Robotics · Autonomous Policy Design

Problem

Designing multi-stage reinforcement learning curricula for agile robots traditionally requires extensive manual tuning of rewards, randomizations, and configurations, a process that scales poorly and is prone to human error. Existing LLM-guided pipelines use compute inefficiently and do not learn from past training runs.

Approach

AURA converts user prompts into schema-validated YAML training workflows using specialized LLM agents, then employs a retrieval-augmented vector database to condition future curriculum generation on successful past runs, enabling continuous self-improvement.
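The retrieval step can be pictured as a nearest-neighbour lookup over embeddings of past runs, keeping only runs that succeeded and feeding their curricula back to the designer agent as context. The sketch below is a minimal illustration under stated assumptions, not AURA's implementation: the `PastRun` record layout, cosine similarity as the ranking metric, the success-only filter, and the prompt format are all hypothetical, and a real system would use a learned embedding model and a vector database instead of a Python list.

```python
import math
from dataclasses import dataclass


@dataclass
class PastRun:
    """One stored training run (hypothetical record layout, for illustration only)."""
    run_id: str
    embedding: list[float]  # assumed embedding of the run's prompt/curriculum
    success: bool           # whether the run met its (assumed) success criterion
    curriculum_yaml: str    # the workflow text that was executed


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; embeddings are assumed to have equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def retrieve_successful(query: list[float], store: list[PastRun], k: int = 2) -> list[PastRun]:
    """Return up to k successful past runs, most similar to the query first."""
    candidates = [r for r in store if r.success]
    candidates.sort(key=lambda r: cosine(query, r.embedding), reverse=True)
    return candidates[:k]


def build_prompt(user_prompt: str, examples: list[PastRun]) -> str:
    """Prepend retrieved curricula as few-shot context for a curriculum-designer LLM."""
    parts = [f"# Example from run {r.run_id}\n{r.curriculum_yaml}" for r in examples]
    parts.append(f"# Task\n{user_prompt}")
    return "\n\n".join(parts)
```

For example, with a store holding a successful run `walk-01` at `[1.0, 0.0, 0.0]`, a failed run `walk-02` at `[0.9, 0.1, 0.0]`, and a successful run `reach-01` at `[0.0, 1.0, 0.0]`, `retrieve_successful([1.0, 0.05, 0.0], store, k=1)` returns only `walk-01`: the failed run is filtered out even though it is close to the query.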

Key results

  • Consistently outperforms LLM-guided baselines in curriculum generation success rate
  • Achieves high-fidelity locomotion and manipulation policies in simulation
  • Successfully deploys trained policies zero-shot on custom kid-sized humanoid hardware
  • Schema validation and retrieval-augmented feedback critically improve curriculum quality and training stability

Why it matters

Enables researchers and engineers to scale adaptive policy learning pipelines without manual tuning, accelerating real-world deployment of agile robotic systems.

Abstract

Designing reinforcement learning curricula for agile robots traditionally requires extensive manual tuning of reward functions, environment randomizations, and training configurations. We introduce AURA (Autonomous Upskilling with Retrieval-Augmented Agents), a schema-validated curriculum reinforcement learning (RL) framework that leverages Large Language Models (LLMs) as autonomous designers of multi-stage curricula. AURA transforms user prompts into YAML workflows that encode full reward functions, domain randomization strategies, and training configurations. All files are statically validated before any GPU time is used, ensuring efficient and reliable execution. A retrieval-augmented feedback loop allows specialized LLM agents to design, execute, and refine curriculum stages based on prior training results stored in a vector database, enabling continual improvement over time. Quantitative experiments show that AURA consistently outperforms LLM-guided baselines in generation success rate, humanoid locomotion, and manipulation tasks. Ablation studies highlight the importance of schema validation and retrieval for curriculum quality. AURA successfully trains end-to-end policies directly from user prompts and deploys them zero-shot on a custom humanoid robot in multiple environments, capabilities that did not previously exist with manually designed controllers. By abstracting away the complexity of curriculum design, AURA enables scalable and adaptive policy learning pipelines that would be complex to construct by hand.

aura-research.org

¹A. Zhu is with the Department of Computer Science and Electrical Engineering, and ²Y. Tanaka and D. Hong are with the Department of Mechanical and Aerospace Engineering, UCLA, Los Angeles, CA, USA. ³A. Goldberg is with the Department of Electrical Engineering and Computer Science, UC Berkeley, Berkeley, CA, USA. {alvin.zhu, yusuketanaka, dennishong}@g.ucla.edu, apgoldberg@berkeley.edu. ∗ denotes equal contribution.
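Static validation of the kind described above amounts to rejecting a malformed workflow before it ever reaches the GPU. The sketch below assumes a hypothetical schema invented for illustration (per-stage `name`, `reward`, `domain_randomization`, and `training` keys, with `num_envs` and `max_iterations` training fields); the paper's actual YAML schema is not given here. It starts from an already-parsed mapping, such as the output of PyYAML's `yaml.safe_load`, so the example needs only the standard library.

```python
from typing import Any

# Hypothetical schema: required keys per curriculum stage and their expected types.
REQUIRED_STAGE_KEYS: dict[str, type] = {
    "name": str,
    "reward": dict,                # reward term name -> weight
    "domain_randomization": dict,  # parameter name -> [low, high]
    "training": dict,              # assumed fields: num_envs, max_iterations
}


def _is_number(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def validate_workflow(workflow: dict[str, Any]) -> list[str]:
    """Return schema errors; an empty list means the workflow may be launched."""
    stages = workflow.get("stages")
    if not isinstance(stages, list) or not stages:
        return ["'stages' must be a non-empty list"]
    errors: list[str] = []
    for i, stage in enumerate(stages):
        where = f"stages[{i}]"
        if not isinstance(stage, dict):
            errors.append(f"{where}: must be a mapping")
            continue
        # 1. Required keys with the right container types.
        for key, expected in REQUIRED_STAGE_KEYS.items():
            if not isinstance(stage.get(key), expected):
                errors.append(f"{where}.{key}: expected {expected.__name__}")
        # 2. Reward weights must be numeric.
        reward = stage.get("reward")
        if isinstance(reward, dict):
            for term, weight in reward.items():
                if not _is_number(weight):
                    errors.append(f"{where}.reward.{term}: weight must be a number")
        # 3. Randomization ranges must be [low, high] with low <= high.
        dr = stage.get("domain_randomization")
        if isinstance(dr, dict):
            for param, rng in dr.items():
                ok = isinstance(rng, (list, tuple)) and len(rng) == 2 and all(map(_is_number, rng))
                if not ok or rng[0] > rng[1]:
                    errors.append(f"{where}.domain_randomization.{param}: expected [low, high] with low <= high")
        # 4. Training budget fields must be positive integers.
        training = stage.get("training")
        if isinstance(training, dict):
            for key in ("num_envs", "max_iterations"):
                val = training.get(key)
                if isinstance(val, bool) or not isinstance(val, int) or val <= 0:
                    errors.append(f"{where}.training.{key}: expected a positive integer")
    return errors
```

Collecting every error rather than failing on the first one lets a generating agent fix all problems in a single revision; whether AURA's validator reports errors this way is not stated here.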

Index terms

Reinforcement Learning · Humanoid and Bipedal Locomotion · Machine Learning for Robot Control
