Quasimetric Decision Transformers: Enhancing Goal-Conditioned Reinforcement Learning with Structured Distance Guidance
MADHAV GOYANI, Heidar Davoudi, Mehran Ebrahimi
AI summary
Problem
Standard Decision Transformers condition on heuristic return-to-go (RTG) tokens, which are uninformative and suboptimal for goal-conditioned tasks, particularly in long-horizon environments with sparse rewards. This limitation hinders effective trajectory stitching and generalization to unseen goals.
Approach
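The authors introduce the Quasimetric Decision Transformer (QuaD), which replaces RTG conditioning with a learned quasimetric: an asymmetric distance function that estimates how hard it is to reach a goal from the current state, where the reverse direction may have a different cost. This structured distance signal is combined with value-aware loss functions such as advantage-weighted regression (AWR) and DDPG+BC (deep deterministic policy gradient with a behavior-cloning term) to prioritize high-value actions.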
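To make the conditioning idea concrete, the sketch below computes a per-timestep "distance-to-goal" value that could stand in for an RTG token. It is only an illustration under stated assumptions: `toy_quasimetric` is a hand-written stand-in (sum of the positive parts of goal minus state), not the learned IQE or MRN model from the paper, and the names `toy_quasimetric`, `distance_to_go`, `states`, and `goal` are hypothetical.

```python
from typing import List, Sequence


def toy_quasimetric(s: Sequence[float], g: Sequence[float]) -> float:
    """Hand-written stand-in for a learned quasimetric (an assumption).

    d(s, g) = sum_i max(g_i - s_i, 0). It is zero when s == g, satisfies
    the triangle inequality, and is generally asymmetric: d(s, g) != d(g, s).
    """
    return sum(max(gi - si, 0.0) for si, gi in zip(s, g))


def distance_to_go(states: Sequence[Sequence[float]],
                   goal: Sequence[float]) -> List[float]:
    """One conditioning value per timestep, used where an RTG token would go."""
    return [toy_quasimetric(s, goal) for s in states]


# Hypothetical 2-D trajectory that ends at the goal.
states = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (3.0, 2.0)]
goal = (3.0, 2.0)

tokens = distance_to_go(states, goal)
print(tokens)                             # [5.0, 4.0, 2.0, 0.0]
print(toy_quasimetric(goal, states[0]))   # 0.0, showing the asymmetry
```

Unlike a return-to-go value, this signal depends only on the current state and the goal, so it remains defined for goals that never appear as the end of a logged trajectory.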
Key results
- Outperforms standard Decision Transformers and offline RL baselines on AntMaze benchmarks
- Achieves state-of-the-art success rates in sparse-reward, long-horizon navigation tasks
- Demonstrates improved generalization to unseen goals through structured distance guidance
- Validates two quasimetric architectures, interval quasimetric embeddings (IQE) and metric residual networks (MRN), as effective RTG alternatives
Why it matters
Provides a theoretically grounded distance signal that bridges sequence modeling and goal-conditioned RL, advancing the development of safe and sample-efficient autonomous agents for complex robotics and navigation.
Abstract
Recent work has shown that tackling offline reinforcement learning (RL) with a conditional policy produces promising results. Decision Transformers (DT), in particular, achieve this by leveraging sequence modeling. However, standard DT methods rely on return-to-go (RTG) tokens, which are heuristically defined and often suboptimal for goal-conditioned tasks. In this work, we introduce the Quasimetric Decision Transformer (QuaD), a novel approach that replaces RTG with learned quasimetric distances, providing a more structured and theoretically grounded guidance signal for long-horizon decision-making. We explore two quasimetric formulations, interval quasimetric embeddings (IQE) and metric residual networks (MRN), and integrate them into DTs. Extensive evaluations on the AntMaze benchmark demonstrate that QuaD outperforms standard Decision Transformers, achieving state-of-the-art success rates and improved generalization to unseen goals. Our results suggest that quasimetric guidance is a viable alternative to RTG, opening new directions for learning structured distance representations in offline RL.
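For readers unfamiliar with interval-based quasimetrics, the following is a simplified sketch in the spirit of IQE, not the architecture used in the paper. It assumes latents are already given as `k` groups of `l` numbers (a learned encoder would normally produce them) and that the per-group distances are simply summed. Each group's distance is the length of the union of the intervals `[u_ij, max(u_ij, v_ij)]`. The names `interval_union_length` and `iqe_style_distance` are hypothetical.

```python
from typing import List, Sequence, Tuple


def interval_union_length(intervals: Sequence[Tuple[float, float]]) -> float:
    """Total length covered by a set of closed intervals on the real line."""
    total = 0.0
    cur_lo = cur_hi = None
    for lo, hi in sorted(intervals):
        if hi <= lo:
            continue  # empty interval contributes nothing
        if cur_hi is None or lo > cur_hi:
            # Start a new disjoint run; close the previous one if it exists.
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            # Overlaps the current run; extend it.
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


def iqe_style_distance(u: Sequence[Sequence[float]],
                       v: Sequence[Sequence[float]]) -> float:
    """Sum over groups of the union length of [u_ij, max(u_ij, v_ij)].

    Assumption: plain summation across groups; other aggregations exist.
    """
    total = 0.0
    for u_i, v_i in zip(u, v):
        intervals = [(a, max(a, b)) for a, b in zip(u_i, v_i)]
        total += interval_union_length(intervals)
    return total


# Hypothetical latents with k = 2 groups of l = 2 values each.
u = [[0.0, 1.0], [2.0, 2.0]]
v = [[2.0, 3.0], [1.0, 5.0]]

print(iqe_style_distance(u, v))  # 6.0
print(iqe_style_distance(v, u))  # 1.0, the distance is asymmetric
print(iqe_style_distance(u, u))  # 0.0
```

The asymmetry comes from the `max`: an interval is non-empty only where the target coordinate exceeds the source coordinate, so moving "up" in latent space costs something while moving "down" is free.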