ICRA 2026

E^2DT: Efficient and Effective Decision Transformer with Experience-Aware Sampling for Robotic Manipulation

Kaiyan Zhao, Borong Zhang, Yiming Wang, Xingyu Liu, Xuetao Li, Yuyang Chen, Xiaoguang Niu

AI summary

Coupling a Decision Transformer with an active, quality-diversity-aware k-DPP sampler dramatically improves sample efficiency and task success in long-horizon robotic manipulation.

Decision Transformer, Experience-Aware Sampling, k-DPP, Robotic Manipulation, Reinforcement Learning, Quality-Diversity

Problem

Standard Decision Transformers rely on passive, uniform experience replay, which causes poor sample efficiency, redundant training data, and limited exploration in long-horizon robotic tasks.

Approach

E2DT uses the Decision Transformer’s own latent embeddings and predictive signals to score trajectory quality and diversity, then employs a k-Determinantal Point Process to actively select the most informative training subsets for policy updates.
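To make the selection step concrete, the sketch below builds a quality-diversity kernel from per-window quality scores and embeddings, then picks k windows. It is an illustration under stated assumptions, not the paper's implementation: greedy determinant maximization stands in for true k-DPP sampling, cosine similarity stands in for whatever similarity E2DT computes on its latent embeddings, and all function names and toy values are hypothetical. Embeddings are assumed to be non-zero vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_kernel(quality, embeddings):
    """Kernel entry L[i][j] = q_i * sim(i, j) * q_j (assumed form)."""
    n = len(quality)
    return [
        [quality[i] * cosine_similarity(embeddings[i], embeddings[j]) * quality[j]
         for j in range(n)]
        for i in range(n)
    ]

def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def greedy_select(kernel, k):
    """Greedy stand-in for k-DPP sampling: add the item that maximizes det(L_S)."""
    selected = []
    for _ in range(k):
        best_idx, best_det = None, -1.0
        for i in range(len(kernel)):
            if i in selected:
                continue
            subset = selected + [i]
            sub = [[kernel[a][b] for b in subset] for a in subset]
            d = determinant(sub)
            if d > best_det:
                best_idx, best_det = i, d
        selected.append(best_idx)
    return selected

# Toy data: windows 0 and 1 are near-duplicates; window 2 is distinct but lower quality.
quality = [1.0, 0.95, 0.6]
embeddings = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
selected = greedy_select(build_kernel(quality, embeddings), k=2)
print(selected)  # [0, 2]
```

In this toy case the near-duplicate window 1 is skipped in favor of the lower-quality but dissimilar window 2, which is the behavior a quality-diversity kernel is meant to produce.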

Key results

DT-guided k-DPP sampler unifies quality and diversity metrics for active data selection
Composite quality score integrating return quantiles, predictive uncertainty, and stage coverage
Debiased mixed replay mechanism mitigates selection bias during training
Substantial gains in sample efficiency, convergence speed, and success rate on simulation and real-robot benchmarks
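The composite quality score listed above can be sketched as a weighted sum of its three named ingredients. The weighted-sum form, the weights, the normalizations, and the function names below are assumptions made for illustration; the summary does not state how E2DT actually combines the terms.

```python
from collections import Counter

def quantile_ranks(values):
    """Empirical quantile of each value: fraction of values less than or equal to it."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]

def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def inverse_frequency(stages):
    """Stage coverage term: rarer stages score higher, rescaled so the rarest gets 1."""
    counts = Counter(stages)
    raw = [1.0 / counts[s] for s in stages]
    top = max(raw)
    return [r / top for r in raw]

def composite_quality(rtg, uncertainty, stages, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of RTG quantile, normalized uncertainty, and stage coverage.

    The weights are hypothetical placeholders, not values from the paper.
    """
    w_rtg, w_unc, w_cov = weights
    rtg_q = quantile_ranks(rtg)
    unc_n = min_max_normalize(uncertainty)
    cov = inverse_frequency(stages)
    return [w_rtg * a + w_unc * b + w_cov * c
            for a, b, c in zip(rtg_q, unc_n, cov)]

# Toy trajectory windows: three from a common "reach" stage, one from a rare "grasp" stage.
rtg = [10.0, 2.0, 5.0, 5.0]
uncertainty = [0.1, 0.4, 0.2, 0.3]
stages = ["reach", "reach", "reach", "grasp"]
scores = composite_quality(rtg, uncertainty, stages)
print(scores)
```

With these toy inputs the "grasp" window receives the highest score: it has neither the best return nor the highest uncertainty, but it is the only window from its stage, so the coverage term lifts it above the rest.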

Why it matters

Enables scalable, data-efficient training of sequence-based policies for complex robotic manipulation, bridging the gap between offline RL and real-world deployment.

Abstract

In reinforcement learning (RL) for robotic manipulation, the Decision Transformer (DT) has emerged as an effective framework for addressing long-horizon tasks. However, DT’s performance depends heavily on the coverage of collected experiences. Without an active exploration mechanism, standard DT relies on uniform replay, which leads to poor sample efficiency, limited exploration, and reduced overall effectiveness. At the same time, while excessive exploration can help avoid local optima, it often delays policy convergence and leads to degraded efficiency. To address these limitations, we propose E2DT, a DT-guided k-Determinantal Point Process sampling framework that enables the model to actively shape its own experience selection. Our framework is experience-aware, allowing E2DT to be both efficient, by prioritizing sampling quality (e.g., high-return, high-uncertainty, and underrepresented trajectories), and effective, by ensuring diversity across trajectory windows to preserve policy optimality. Specifically, DT’s internal latent embeddings measure diversity across trajectory windows, while quality is quantified through a composite metric that integrates return-to-go (RTG) quantiles, predictive uncertainty, and stage coverage (inverse frequency). These two dimensions are integrated into a novel quality–diversity joint kernel that prioritizes the most informative experiences, thereby enabling learning that is both efficient and effective. We evaluate E2DT on challenging robotic manipulation benchmarks in both simulation and real-robot settings. Results show that it consistently outperforms prior methods. These findings demonstrate that coupling policy learning with experience-aware sampling provides a principled path toward robust long-horizon robotic learning.
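For readers unfamiliar with determinantal point processes, the quality–diversity joint kernel can be read against the standard quality-diversity decomposition from the DPP literature. The abstract does not give the paper's exact kernel, so the form below is an assumption: q_i denotes the composite quality score of trajectory window i, and \phi_i a unit-normalized DT latent embedding of that window (both symbols are introduced here for illustration).

```latex
\[
L_{ij} = q_i \, \phi_i^{\top} \phi_j \, q_j,
\qquad
P(S) \propto \det(L_S) \quad \text{for } |S| = k,
\]
\[
\det(L_S) = \Bigl( \prod_{i \in S} q_i^{2} \Bigr)
            \det\bigl( [\phi_i^{\top} \phi_j]_{i,j \in S} \bigr).
\]
```

The second line shows why such a kernel trades off the two dimensions: the product term grows with the quality of the selected windows, while the Gram determinant shrinks toward zero as the selected embeddings become similar to one another.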

Index terms

Reinforcement Learning