Robust Monocular Visual Odometry Via Dual-Paradigm Curriculum Learning
Assaf Lahiany, Oren Gal
AI summary
Problem
Monocular visual odometry models drift significantly under aggressive motion and sensor noise, while conventional training regimes over-represent easy, low-motion data, leaving networks unprepared for challenging scenarios.
Approach
The authors wrap an unmodified DPVO (Deep Patch Visual Odometry) backbone with a dual-paradigm curriculum. Training trajectories are ordered by motion complexity, and the optical-flow, translation, and rotation loss weights are adjusted during training by Self-Paced and reinforcement learning (RL) schedulers.
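The summary does not give the exact motion-complexity measure or pacing schedule. As an illustration only, the sketch below assumes a score built from the mean per-frame translation and rotation of ground-truth camera poses. It then exposes the training set easiest-first with a linear pacing function. `motion_complexity`, `curriculum_pool`, `straight_line`, the weight `w_rot`, and `start_frac` are hypothetical names and parameters, not the authors' code.

```python
import numpy as np


def rotation_angle(R: np.ndarray) -> float:
    """Geodesic angle (radians) of a 3x3 rotation matrix."""
    cos_theta = (np.trace(R) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def motion_complexity(poses: np.ndarray, w_rot: float = 1.0) -> float:
    """Hypothetical difficulty score for one trajectory.

    poses: (N, 4, 4) camera-to-world SE(3) matrices, N >= 2.
    Returns mean per-frame translation + w_rot * mean per-frame rotation angle.
    """
    trans, rots = [], []
    for T_prev, T_next in zip(poses[:-1], poses[1:]):
        rel = np.linalg.inv(T_prev) @ T_next  # relative motion between consecutive frames
        trans.append(np.linalg.norm(rel[:3, 3]))
        rots.append(rotation_angle(rel[:3, :3]))
    return float(np.mean(trans) + w_rot * np.mean(rots))


def curriculum_pool(scores, epoch: int, total_epochs: int, start_frac: float = 0.3):
    """Indices of trajectories usable at `epoch`, easiest first (hypothetical pacing).

    The pool grows linearly from `start_frac` of the data to all of it.
    """
    order = np.argsort(scores)  # ascending motion complexity
    progress = min(epoch / max(total_epochs - 1, 1), 1.0)
    frac = start_frac + (1.0 - start_frac) * progress
    n = min(len(order), max(1, int(np.ceil(frac * len(order)))))
    return order[:n].tolist()


def straight_line(n: int, step: float) -> np.ndarray:
    """Toy trajectory: n poses moving `step` meters along x per frame, no rotation."""
    poses = np.tile(np.eye(4), (n, 1, 1))
    poses[:, 0, 3] = step * np.arange(n)
    return poses


trajs = [straight_line(10, s) for s in (0.5, 0.1, 0.3)]
scores = [motion_complexity(t) for t in trajs]
print(curriculum_pool(scores, epoch=0, total_epochs=5))  # [1]
print(curriculum_pool(scores, epoch=4, total_epochs=5))  # [1, 2, 0]
```

Adding meters and radians with a single weight is a simplification. A practical score would likely normalize each term over the dataset before combining them.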
Key results
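- 33% ATE reduction on TartanAir, at the cost of 31% extra training wall-time
- Baseline accuracy reached 47% faster (Self-Paced scheduler)
- Zero-shot ATE reductions on EuRoC (13%), TUM-RGBD (9%; 46% on dynamic scenes), KITTI (21%), and ICL-NUIM (32%)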
- Improved cross-sequence consistency with zero inference overhead
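The results above are stated in ATE (Absolute Trajectory Error), which this section does not define. A common convention for monocular VO, assumed here rather than taken from the paper, is the RMSE of position errors after a Sim(3) alignment computed with Umeyama's method, because a monocular trajectory is only recovered up to scale. The function names below are illustrative.

```python
import numpy as np


def umeyama_sim3(src: np.ndarray, dst: np.ndarray):
    """Least-squares similarity transform with dst ≈ s * R @ src + t (Umeyama's method).

    src, dst: (N, 3) arrays of corresponding camera positions.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)  # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # flip the last axis so R is a rotation, not a reflection
    R = U @ S @ Vt
    var_s = np.mean(np.sum(xs ** 2, axis=1))  # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """RMSE of position error after Sim(3)-aligning `est` onto `gt`."""
    s, R, t = umeyama_sim3(est, gt)
    aligned = s * est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))


rng = np.random.default_rng(0)
gt = rng.normal(size=(100, 3))
c, sn = np.cos(0.4), np.sin(0.4)
Rz = np.array([[c, -sn, 0.0], [sn, c, 0.0], [0.0, 0.0, 1.0]])
est = 0.5 * gt @ Rz.T + np.array([1.0, -2.0, 0.3])  # same path, wrong scale and frame
print(ate_rmse(est, gt))  # near zero: alignment absorbs scale, rotation, and offset
noisy = est + rng.normal(scale=0.01, size=est.shape)
print(ate_rmse(noisy, gt))  # small but nonzero
```

Assuming the reported percentages are relative reductions, a 33% ATE reduction means the curriculum-trained model's ATE is about 0.67 times the baseline's.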
Why it matters
Offers a practical, architecture-agnostic training strategy that improves real-world VO robustness without increasing inference cost and could extend to other geometric vision tasks.
Abstract
Monocular visual odometry (VO) is accurate in controlled settings yet drifts sharply under aggressive motion and sensor noise. We offer a fundamental rethinking of VO robustness as a training-schedule problem rather than an architectural challenge, introducing a novel dual-paradigm curriculum learning framework that operates at both trajectory and loss-component levels. (i) A motion-based curriculum orders trajectories by measured motion complexity. (ii) A hierarchical component curriculum adaptively re-weights optical-flow, pose, and rotation losses via Self-Paced and in-training Reinforcement Learning (RL) schedulers. Integrated into an unmodified DPVO baseline, these strategies cut TartanAir ATE by 33% with only 31% extra training wall-time, and the Self-Paced variant reaches baseline accuracy 47% faster. Without fine-tuning, the same models improve zero-shot performance on EuRoC (13% ATE reduction), TUM-RGBD (9%; 46% on dynamic scenes), KITTI (21%), and ICL-NUIM (32%). We show that explicit difficulty progression or adaptive loss weighting provides a practical, zero-inference-overhead path to robust monocular VO and could extend to other geometric vision tasks.
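The abstract does not spell out either scheduler's update rule or the RL formulation (state, action, reward). The sketch below illustrates the component-level idea under stated assumptions. A self-paced scheduler applies the linear soft self-paced-learning weighting to each loss component. Standing in for the paper's RL scheduler is a plain epsilon-greedy bandit that picks among weight presets, rewarded by whatever improvement signal the caller supplies. `SelfPacedWeights`, `EpsilonGreedyScheduler`, the presets, and all hyperparameters are hypothetical.

```python
import numpy as np


class SelfPacedWeights:
    """Hypothetical self-paced weighting of loss components (linear soft SPL rule).

    A component with (normalized) loss L gets weight clip(1 - L / lam, w_min, 1):
    "easy" low-loss components are weighted fully, "hard" ones are damped.
    lam grows every step, so harder components are admitted over time.
    """

    def __init__(self, names, lam0: float = 1.0, growth: float = 1.1, w_min: float = 0.1):
        self.names = list(names)
        self.lam = lam0
        self.growth = growth
        self.w_min = w_min

    def step(self, losses: dict) -> dict:
        w = {k: float(np.clip(1.0 - losses[k] / self.lam, self.w_min, 1.0)) for k in self.names}
        self.lam *= self.growth  # relax the pace threshold for the next step
        return w


class EpsilonGreedyScheduler:
    """Stand-in for an RL scheduler: epsilon-greedy bandit over weight presets.

    Each arm is a preset dict of component weights; the reward is supplied by
    the caller, e.g. the drop in validation loss after training with that preset.
    """

    def __init__(self, presets, epsilon: float = 0.1, seed: int = 0):
        self.presets = list(presets)
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.counts = np.zeros(len(self.presets))
        self.values = np.zeros(len(self.presets))  # running mean reward per arm

    def select(self) -> int:
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.presets)))  # explore a random preset
        return int(np.argmax(self.values))  # exploit the best preset so far

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]  # incremental mean


sp = SelfPacedWeights(["flow", "trans", "rot"], lam0=2.0)
print(sp.step({"flow": 0.5, "trans": 1.0, "rot": 1.9}))
# {'flow': 0.75, 'trans': 0.5, 'rot': 0.1}

presets = [
    {"flow": 1.0, "trans": 0.5, "rot": 0.5},  # flow-heavy (hypothetical)
    {"flow": 0.5, "trans": 1.0, "rot": 1.0},  # pose-heavy (hypothetical)
]
sched = EpsilonGreedyScheduler(presets, epsilon=0.2)
arm = sched.select()
weights = presets[arm]
# ... train for an interval with `weights`, then measure validation loss ...
sched.update(arm, reward=0.05)  # hypothetical observed improvement
```

Flow, translation, and rotation losses live on different scales, so the self-paced rule is only meaningful on normalized losses (for example, each divided by its running mean). That normalization is omitted here for brevity.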