AI summary
Problem
Standard imitation learning relies on the Markov assumption, which fails in long-horizon tasks where historical context is critical. History-aware architectures such as Transformers scale quadratically with sequence length, which often makes long, high-dimensional observation sequences infeasible to process.
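To make the scaling gap concrete, the sketch below compares how many pairwise attention scores one full self-attention head computes over T observations with how many state updates a recurrent model performs. The function names are illustrative and not from the paper. The counts ignore constant factors, feature dimension, and optimizations such as causal masking or sparse attention.

```python
def attention_score_count(seq_len: int) -> int:
    """Entries in the seq_len x seq_len score matrix of one full self-attention head."""
    return seq_len * seq_len


def recurrent_update_count(seq_len: int) -> int:
    """State updates performed by a recurrent model: one per time step."""
    return seq_len


if __name__ == "__main__":
    for seq_len in (100, 1_000, 10_000):
        attn = attention_score_count(seq_len)
        rec = recurrent_update_count(seq_len)
        # The ratio grows linearly with the sequence length.
        print(f"T={seq_len}: attention scores={attn}, recurrent updates={rec}, ratio={attn // rec}")
```

A tenfold increase in sequence length multiplies the attention count by one hundred, while the recurrent count grows only tenfold.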
Approach
MTIL uses the Mamba-2 State Space Model to maintain a compressed hidden state that efficiently encodes the entire observation history, conditioning action predictions on this full temporal context rather than just the current frame.
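The idea of a compressed hidden state can be illustrated with a toy diagonal linear state-space recurrence. This is a minimal sketch under stated assumptions, not the MTIL implementation. The class `RecurrentStatePolicy`, its dimensions, and its random fixed parameters are hypothetical, and Mamba-2's input-dependent parameters and optimized kernels are not modeled. The sketch shows only that a fixed-size state lets two episodes that end in the same observation produce different actions.

```python
import numpy as np


class RecurrentStatePolicy:
    """Toy diagonal linear state-space policy (illustration only, not Mamba-2)."""

    def __init__(self, obs_dim: int, state_dim: int, action_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Per-channel decay in (0, 1): how much of the previous state is retained.
        self.decay = rng.uniform(0.8, 0.99, size=state_dim)
        # Projection from observation to state, and from state to action.
        self.B = rng.normal(scale=0.1, size=(state_dim, obs_dim))
        self.C = rng.normal(scale=0.1, size=(action_dim, state_dim))
        self.state = np.zeros(state_dim)

    def reset(self) -> None:
        """Clear the history at the start of an episode."""
        self.state = np.zeros_like(self.state)

    def step(self, obs: np.ndarray) -> np.ndarray:
        # h_t = decay * h_{t-1} + B x_t; cost and memory do not grow with t.
        self.state = self.decay * self.state + self.B @ obs
        # a_t = C h_t; the action depends on past observations through h_t.
        return self.C @ self.state


def rollout(policy: RecurrentStatePolicy, observations: list) -> list:
    """Run one episode from a fresh state and return the action at each step."""
    policy.reset()
    return [policy.step(obs) for obs in observations]


if __name__ == "__main__":
    policy = RecurrentStatePolicy(obs_dim=4, state_dim=8, action_dim=2)
    rng = np.random.default_rng(1)
    shared_last = rng.normal(size=4)  # identical final observation in both episodes
    history_a = [rng.normal(size=4) for _ in range(5)] + [shared_last]
    history_b = [rng.normal(size=4) for _ in range(5)] + [shared_last]
    action_a = rollout(policy, history_a)[-1]
    action_b = rollout(policy, history_b)[-1]
    print("actions differ despite identical final observation:",
          not np.allclose(action_a, action_b))
```

A Markovian policy that maps only the current observation to an action would return the same output for both episodes. Here the state carries the earlier observations forward at constant memory cost.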
Key results
- Achieves perfect or near-perfect success rates on ACT benchmark tasks
- Outperforms state-of-the-art methods like ACT and Diffusion Policy
- Enables efficient training on commodity hardware without out-of-memory errors
- Demonstrates superior lifelong learning performance on the LIBERO benchmark
Why it matters
It makes history-aware imitation learning computationally feasible and highly effective for complex, long-horizon robotic manipulation tasks.
Abstract
Standard imitation learning (IL) methods have achieved considerable success in robotics, yet often rely on the Markov assumption, which falters in long-horizon tasks where history is crucial for resolving perceptual ambiguity. This limitation stems not only from a conceptual gap but also from a fundamental computational barrier: prevailing architectures like Transformers are often constrained by quadratic complexity, rendering the processing of long, high-dimensional observation sequences infeasible. To overcome this dual challenge, we introduce Mamba Temporal Imitation Learning (MTIL). Our approach represents a new paradigm for robotic learning, which we frame as a practical synthesis of World Model and Dynamical System concepts. By leveraging the linear-time recurrent dynamics of State Space Models (SSMs), MTIL learns an implicit, action-oriented world model that efficiently encodes the entire trajectory history into a compressed, evolving state. This allows the policy to be conditioned on a comprehensive temporal context, transcending the confines of Markovian approaches. Through extensive experiments on simulated benchmarks (ACT, Robomimic, LIBERO) and on challenging real-world tasks, MTIL demonstrates superior performance against SOTA methods like ACT and Diffusion Policy, particularly in resolving long-term temporal ambiguities. Our findings not only affirm the necessity of full temporal context but also validate MTIL as a powerful and computationally feasible approach for learning long-horizon, non-Markovian behaviors from high-dimensional observations.