ICRA 2026

Enhancing Robot Learning through Cognitive Reasoning Trajectory Optimization under Unknown Dynamics

Qingwei Dong, Tingting Wu, Peng Zeng, Chuanzhi Zang, Guangxi Wan, Shijie Cui

AI summary

Fine-tuning large language models for low-level control significantly accelerates policy search and improves sample efficiency in complex robotic manipulation tasks.

Trajectory Optimization, Reinforcement Learning, Large Language Models, Policy Search, Robot Manipulation, Cognitive Control

Problem

Reinforcement learning for robot manipulation converges slowly because the state-action space is high-dimensional and the initial policy search space is large. Existing LLM applications, meanwhile, remain limited to high-level task planning rather than direct low-level control.

Approach

The authors introduce Cognitive Reasoning Trajectory Optimization (CRTO), which fine-tunes an LLM to make single-step decisions at the start of a task, then uses the resulting trajectories to fit dynamic models and optimize control policies under unknown dynamics.
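The "fit a dynamic model from trajectories, then optimize the controller" step can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not say which model class or optimizer CRTO uses, so the sketch assumes time-invariant linear dynamics fitted by least squares, followed by a finite-horizon LQR backward pass. The function names, the double-integrator system, and the random actions standing in for LLM-generated decisions are all hypothetical.

```python
import numpy as np

def fit_linear_dynamics(states, actions, next_states):
    """Least-squares fit of x' ~ A x + B u + c from transition samples."""
    n, dx = states.shape
    du = actions.shape[1]
    # Regressors [x, u, 1]; the trailing column of ones absorbs the offset c.
    Z = np.hstack([states, actions, np.ones((n, 1))])
    W, *_ = np.linalg.lstsq(Z, next_states, rcond=None)
    A = W[:dx].T
    B = W[dx:dx + du].T
    c = W[-1]
    return A, B, c

def lqr_gains(A, B, Q, R, horizon):
    """Finite-horizon discrete LQR backward pass (offset c ignored).

    Returns gains K_t in time order, with control law u_t = -K_t x_t.
    """
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]

# Hypothetical stand-in for an unknown system: a double integrator, dt = 0.1.
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.005], [0.1]])

# Random transitions stand in for trajectories collected from LLM decisions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
U = rng.normal(size=(200, 1))
X_next = X @ A_true.T + U @ B_true.T

# Fit the model from data, then optimize a controller against the fitted model.
A_fit, B_fit, c_fit = fit_linear_dynamics(X, U, X_next)
gains = lqr_gains(A_fit, B_fit, Q=np.eye(2), R=0.1 * np.eye(1), horizon=50)

# Roll the controller out on the true system to see the state being regulated.
x = np.array([1.0, 0.0])
for K in gains:
    x = A_true @ x + B_true @ (-K @ x)
print("final state norm:", np.linalg.norm(x))
```

In this picture, better initial trajectories matter because the fitted model is only trustworthy near the data it was fitted on. That is one plausible reading of how LLM-generated decisions could constrain the policy search space.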

Key results

Introduced Low-level Cognitive Control Tuning (LCCT) for direct LLM-based MDP control
Developed CRTO algorithm to constrain policy search space using LLM-generated trajectories
Validated method on three complex manipulation tasks using the Sawyer robot in MuJoCo
Demonstrated accelerated policy convergence and reduced training data requirements

Why it matters

Provides a scalable pathway to bridge high-level reasoning and low-level robotic control, enabling faster and more efficient skill acquisition in unstructured environments.

Abstract

In the domain of robot learning, equipping robots with the capability to swiftly acquire operational skills poses a significant challenge. Currently, reinforcement learning techniques are adept at addressing dynamic, unstructured problems involving rich contact scenarios. However, the convergence rate of these algorithms is often slow due to the high dimensionality of the robot state-action mapping space and the extensive initial policy search space. Meanwhile, advancements in large language models (LLMs) have endowed these models with a degree of logical reasoning ability, enabling them to take goal-oriented actions proactively during the initial phase of a robotic task. These models can implicitly generate features of states and uncover underlying patterns in trajectory generation. Yet, in complex manipulative tasks involving rich contact scenarios, LLMs still fall short. Thus, integrating the robust interactive capabilities of reinforcement learning with the strong logical reasoning of LLMs, and enhancing policy search with LLMs, could potentially accelerate the speed of policy searches significantly. In this paper, we introduce a Cognitive Reasoning Trajectory Optimization method. This approach utilizes Low-level Cognitive Control Tuning to enable LLMs with robust logical reasoning to make effective single-step decisions in Markov Decision Process (MDP) tasks. By fitting dynamic models with high-quality cognitive reasoning data and optimizing control strategies, this method constrains the policy search space and enhances the efficiency of trajectory optimization. Experimental results on various manipulative tasks using the Sawyer robot in the MuJoCo simulator validate the effectiveness of the proposed algorithm.

Index terms

Reinforcement Learning, Embodied Cognitive Science, Machine Learning for Robot Control