ICRA 2026

Quasimetric Decision Transformers: Enhancing Goal-Conditioned Reinforcement Learning with Structured Distance Guidance

Madhav Goyani, Heidar Davoudi, Mehran Ebrahimi

AI summary

Replacing heuristic return-to-go tokens with learned quasimetric distances significantly improves goal-reaching and generalization in long-horizon offline reinforcement learning.

Quasimetric Decision Transformer, Goal-Conditioned RL, Offline Reinforcement Learning, Distance Guidance, Sequence Modeling

Problem

Standard Decision Transformers rely on heuristic return-to-go tokens that are uninformative and suboptimal for goal-conditioned tasks, particularly in long-horizon environments with sparse rewards. This limitation hinders effective trajectory stitching and generalization to unseen goals.
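To illustrate why return-to-go can be uninformative under sparse rewards, here is a minimal sketch. It assumes an undiscounted return-to-go and a hypothetical reward of 1 given only on the step that reaches the goal; the trajectories and the `returns_to_go` helper are illustrative and not taken from the paper.

```python
from itertools import accumulate

def returns_to_go(rewards):
    """Undiscounted return-to-go: sum of rewards from step t to the end."""
    # Accumulate from the last step backwards, then restore time order.
    return list(accumulate(reversed(rewards)))[::-1]

# Hypothetical 6-step trajectories with a sparse reward (assumption):
# 1 on the step that reaches the goal, 0 everywhere else.
success = [0, 0, 0, 0, 0, 1]
failure = [0, 0, 0, 0, 0, 0]

rtg_success = returns_to_go(success)
rtg_failure = returns_to_go(failure)

print(rtg_success)  # [1, 1, 1, 1, 1, 1]
print(rtg_failure)  # [0, 0, 0, 0, 0, 0]
```

In this toy setting the token is constant along each trajectory, so it says whether the goal is eventually reached but nothing about how far away it is at any given step.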

Approach

The authors introduce the Quasimetric Decision Transformer (QuaD), which replaces return-to-go (RTG) conditioning with a learned quasimetric function that estimates the directional difficulty of reaching a goal. This structured distance signal is combined with value-aware loss functions, such as advantage-weighted regression (AWR) and DDPG with behavior cloning (DDPG+BC), to prioritize high-value actions.

Key results

Outperforms standard Decision Transformers and offline RL baselines on AntMaze benchmarks
Achieves state-of-the-art success rates in sparse-reward, long-horizon navigation tasks
Demonstrates improved generalization to unseen goals through structured distance guidance
Validates IQE and MRN quasimetric architectures as effective RTG alternatives

Why it matters

Provides a theoretically grounded distance signal that bridges sequence modeling and goal-conditioned RL, with potential relevance to long-horizon robotics and navigation tasks.

Abstract

Recent work has shown that tackling offline reinforcement learning (RL) with a conditional policy produces promising results, and Decision Transformers (DT) do so by leveraging sequence modeling. However, standard DT methods rely on return-to-go (RTG) tokens, which are heuristically defined and often suboptimal for goal-conditioned tasks. In this work, we introduce Quasimetric Decision Transformer (QuaD), a novel approach that replaces RTG with learned quasimetric distances, providing a more structured and theoretically grounded guidance signal for long-horizon decision-making. We explore two quasimetric formulations, interval quasimetric embeddings (IQE) and metric residual networks (MRN), and integrate them into DTs. Extensive evaluations on the AntMaze benchmark demonstrate that QuaD outperforms standard Decision Transformers, achieving state-of-the-art success rates and improved generalization to unseen goals. Our results suggest that quasimetric guidance is a viable alternative to RTG, opening new directions for learning structured distance representations in offline RL.

Index terms

Reinforcement Learning, Representation Learning, Imitation Learning