ICRA 2026

Robust Monocular Visual Odometry Via Dual-Paradigm Curriculum Learning

Assaf Lahiany, Oren Gal

AI summary

Restructuring the training schedule through motion-based trajectory ordering and adaptive loss weighting substantially improves monocular visual odometry robustness (33% lower ATE on TartanAir) without altering the network architecture.

Monocular Visual Odometry, Curriculum Learning, Reinforcement Learning, Self-Paced Learning, Robustness, SLAM

Problem

Monocular visual odometry models drift significantly under aggressive motion and sensor noise, while conventional training regimes over-represent easy, low-motion data, leaving networks unprepared for challenging scenarios.

Approach

The authors wrap an unmodified DPVO backbone with a dual-paradigm curriculum that orders training trajectories by motion complexity and dynamically adjusts optical-flow, translation, and rotation loss weights using self-paced progression and reinforcement learning schedulers.
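This page does not say how motion complexity is measured. The sketch below makes several assumptions:

- Each training trajectory provides ground-truth positions (N×3) and rotation matrices (N×3×3).
- Each trajectory is scored with a hypothetical weighted sum of mean per-frame translation and mean per-frame rotation angle.
- Trajectories are sorted from easy to hard.
- A growing prefix of that ordering is exposed through an assumed linear pacing schedule.

The names `motion_complexity`, `order_by_complexity`, and `curriculum_pool`, the weights, and `start_frac` are all illustrative. They are not DPVO or author code.

```python
import math

import numpy as np


def rotation_angle(r_a: np.ndarray, r_b: np.ndarray) -> float:
    """Geodesic angle (radians) of the relative rotation r_a^T @ r_b."""
    rel = r_a.T @ r_b
    cos_theta = (np.trace(rel) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def motion_complexity(positions, rotations, w_trans=1.0, w_rot=1.0):
    """Hypothetical score: weighted mean per-frame translation plus rotation."""
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    angles = [rotation_angle(rotations[i], rotations[i + 1])
              for i in range(len(rotations) - 1)]
    return w_trans * float(np.mean(steps)) + w_rot * float(np.mean(angles))


def order_by_complexity(trajectories):
    """Sort trajectory names from easiest (least motion) to hardest."""
    scores = {name: motion_complexity(pos, rot)
              for name, (pos, rot) in trajectories.items()}
    return sorted(scores, key=scores.get)


def curriculum_pool(ordered, epoch, ramp_epochs, start_frac=0.3):
    """Assumed linear pacing: begin with the easiest start_frac of the
    ordered trajectories and reach the full set after ramp_epochs."""
    frac = min(1.0, start_frac + (1.0 - start_frac) * epoch / ramp_epochs)
    k = max(1, math.ceil(frac * len(ordered)))
    return ordered[:k]
```

Linear pacing is only one choice. A step or square-root schedule would plug into the same `curriculum_pool` interface.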

Key results

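33% ATE reduction on TartanAir, at the cost of 31% extra training wall-time
Baseline accuracy reached 47% faster (Self-Paced scheduler)
Zero-shot ATE reductions on EuRoC (13%), TUM-RGBD (9%; 46% on dynamic scenes), KITTI (21%), and ICL-NUIM (32%)
Improved cross-sequence consistency with zero inference overhead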
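ATE figures like those above are conventionally the RMSE of camera-position error after aligning the estimate to ground truth. For monocular VO, the alignment must also solve for scale, because a single camera cannot observe it.

This page does not state the evaluation protocol. The sketch below therefore assumes Sim(3) Umeyama alignment over positions, a common choice for monocular benchmarks. `umeyama_sim3`, `ate_rmse`, and `relative_reduction` are illustrative names, not functions from DPVO or any benchmark toolkit.

```python
import numpy as np


def umeyama_sim3(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ≈ s * R @ src + t."""
    n = len(src)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / n
    u, d, vt = np.linalg.svd(cov)
    s_mat = np.eye(3)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        s_mat[2, 2] = -1.0  # keep R a proper rotation (det = +1)
    rot = u @ s_mat @ vt
    var_src = (src_c ** 2).sum() / n
    scale = np.trace(np.diag(d) @ s_mat) / var_src
    trans = mu_dst - scale * rot @ mu_src
    return scale, rot, trans


def ate_rmse(est, gt):
    """RMSE of position error after Sim(3)-aligning est (N x 3) onto gt (N x 3)."""
    s, r, t = umeyama_sim3(est, gt)
    aligned = (s * (r @ est.T)).T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))


def relative_reduction(baseline_ate, new_ate):
    """Percentage ATE reduction of a new model relative to a baseline."""
    return 100.0 * (baseline_ate - new_ate) / baseline_ate
```

A headline number such as "33% ATE reduction" corresponds to `relative_reduction` applied to a baseline and an improved ATE. How per-sequence values are aggregated before that step is not specified here.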

Why it matters

Offers a practical, architecture-agnostic training strategy that improves VO robustness on real-world benchmarks without increasing inference cost, and that could extend to other geometric vision tasks.

Abstract

Monocular visual odometry (VO) is accurate in controlled settings yet drifts sharply under aggressive motion and sensor noise. We offer a fundamental rethinking of VO robustness as a training-schedule problem rather than an architectural challenge, introducing a novel dual-paradigm curriculum learning framework that operates at both trajectory and loss-component levels. (i) A motion-based curriculum orders trajectories by measured motion complexity. (ii) A hierarchical component curriculum adaptively re-weights optical-flow, pose, and rotation losses via Self-Paced and in-training Reinforcement Learning (RL) schedulers. Integrated into an unmodified DPVO baseline, these strategies cut TartanAir ATE by 33% with only 31% extra training wall-time, and reach baseline accuracy 47% faster (Self-Paced). Without fine-tuning, the same models improve zero-shot performance on EuRoC (13% ATE reduction), TUM-RGBD (9%; 46% on dynamic scenes), KITTI (21%), and ICL-NUIM (32%). We show that explicit difficulty progression or adaptive loss weighting provides a practical, zero-inference-overhead path to robust monocular VO and could extend to other geometric vision tasks.
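The abstract does not give the Self-Paced re-weighting rule or the hierarchy among components. The sketch below borrows a linear soft-weighting rule from the self-paced learning literature and applies it per loss component rather than per sample. A component whose current loss is small relative to a pace parameter `lam` gets weight near 1. As `lam` grows over training, higher-loss components gain weight.

Several details are assumptions, not the authors' method:

- the `floor` that keeps every term partially active;
- the linear `pace` schedule;
- the component names `flow`, `pose`, and `rotation`;
- the functions `self_paced_weights` and `weighted_loss`.

```python
def self_paced_weights(losses, lam, floor=0.1):
    """Linear soft self-paced weights, w_k = max(floor, 1 - L_k / lam).
    Components with loss well below lam get weight near 1; harder ones are
    damped. The floor is an assumption that keeps every term partially active."""
    return {name: max(floor, 1.0 - value / lam) for name, value in losses.items()}


def pace(step, total_steps, lam_start, lam_end):
    """Grow lam linearly so higher-loss components gain weight over training."""
    t = min(1.0, step / total_steps)
    return lam_start + (lam_end - lam_start) * t


def weighted_loss(losses, weights):
    """Combine per-component losses with the current curriculum weights."""
    return sum(weights[name] * losses[name] for name in losses)
```

In a real training loop, the weights would be computed from detached loss values, so they act as a schedule rather than receiving gradients. An RL scheduler would replace `pace` with a learned policy that picks the weights.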

Index terms

Vision-Based Navigation, SLAM, Deep Learning Methods