ICRA 2026

ForestVO: Enhancing Visual Odometry in Forest Environments through ForestGlue

Thomas Pritchard, Saifullah Ijaz, Ronald Clark, Basaran Bahadir Kocer

AI summary

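ForestGlue matches baseline pose-estimation accuracy using only 25% of the keypoints (512 instead of 2048) required by baseline matchers, which could substantially reduce computational overhead; ForestVO builds on it to deliver competitive visual odometry in dense forests, aimed at real-time deployment.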
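To make the 25% keypoint figure concrete, here is a small worked calculation. It assumes, for illustration only, that matching cost is dominated by dense N × N attention over keypoints, so pairwise scores grow as N². The function name `keypoint_budget` is hypothetical, and real runtime savings depend on the matcher implementation and hardware, which the summary does not report.

```python
def keypoint_budget(baseline_kpts: int, reduced_kpts: int) -> dict:
    """Compare a reduced keypoint budget with a baseline budget.

    Illustrative assumption: the matcher's cost is dominated by dense
    N x N attention, so the number of pairwise scores grows as N**2.
    """
    baseline_pairs = baseline_kpts ** 2  # scores in one dense attention map
    reduced_pairs = reduced_kpts ** 2
    return {
        "fraction_of_keypoints": reduced_kpts / baseline_kpts,
        "baseline_pairs": baseline_pairs,
        "reduced_pairs": reduced_pairs,
        "pair_reduction_factor": baseline_pairs / reduced_pairs,
    }


budget = keypoint_budget(baseline_kpts=2048, reduced_kpts=512)
print(budget["fraction_of_keypoints"])  # 0.25
print(budget["pair_reduction_factor"])  # 16.0
```

Under this quadratic assumption, a 4× cut in keypoints shrinks each attention map 16×, which is why the keypoint budget matters more than its linear ratio suggests.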

Visual Odometry, Forest Navigation, Feature Matching, Deep Learning, Autonomous Drones, Pose Estimation

Problem

Dense foliage, variable lighting, and repetitive textures degrade feature matching in traditional visual odometry, while GPS is unreliable and LiDAR is too computationally heavy and expensive for lightweight drones.

Approach

ForestGlue adapts the SuperPoint detector to four input configurations (grayscale, RGB, RGB-D, and stereo) and pairs it with LightGlue or SuperGlue matchers retrained on synthetic forest data. ForestVO then feeds the matches into a lightweight transformer that directly regresses relative camera poses from the 2D pixel coordinates of matched keypoints.
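The data flow of a transformer that regresses pose directly from matched 2D coordinates can be sketched as below. This is not the authors' architecture: the token layout (x0, y0, x1, y1), coordinate normalisation, single attention layer, mean pooling, and 6-value output (3 translation + 3 rotation parameters) are all assumptions, and the class `TinyPoseRegressor` is hypothetical with untrained random weights, so it only illustrates shapes and structure, not accuracy.

```python
import math
import random


def normalize(kpts, width, height):
    # Map pixel coordinates into [-1, 1] so inputs do not depend on resolution.
    return [(2.0 * x / width - 1.0, 2.0 * y / height - 1.0) for x, y in kpts]


def linear(vec, weights, bias):
    # Dense layer: one output per weight row.
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyPoseRegressor:
    """Hypothetical one-layer, one-head attention regressor (untrained).

    Each token is one match (x0, y0, x1, y1). The 6-value output
    (3 translation + 3 rotation parameters) is an assumed parameterisation.
    """

    def __init__(self, d_model=16, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

        self.d = d_model
        self.zeros = [0.0] * d_model
        self.embed_w = mat(d_model, 4)
        self.wq, self.wk, self.wv = mat(d_model, d_model), mat(d_model, d_model), mat(d_model, d_model)
        self.head_w, self.head_b = mat(6, d_model), [0.0] * 6

    def __call__(self, kpts0, kpts1, width, height):
        n0 = normalize(kpts0, width, height)
        n1 = normalize(kpts1, width, height)
        # One token per matched pair; no positional encoding, so order is irrelevant.
        tokens = [linear([x0, y0, x1, y1], self.embed_w, self.zeros)
                  for (x0, y0), (x1, y1) in zip(n0, n1)]
        q = [linear(t, self.wq, self.zeros) for t in tokens]
        k = [linear(t, self.wk, self.zeros) for t in tokens]
        v = [linear(t, self.wv, self.zeros) for t in tokens]
        scale = math.sqrt(self.d)
        updated = []
        for qi, ti in zip(q, tokens):
            attn = softmax([sum(qc * kc for qc, kc in zip(qi, kj)) / scale for kj in k])
            mixed = [sum(a * vj[c] for a, vj in zip(attn, v)) for c in range(self.d)]
            updated.append([t + m for t, m in zip(ti, mixed)])  # residual connection
        # Mean-pool over matches, then regress the 6-value relative pose.
        pooled = [sum(tok[c] for tok in updated) / len(updated) for c in range(self.d)]
        return linear(pooled, self.head_w, self.head_b)


model = TinyPoseRegressor()
kpts0 = [(100.0, 120.0), (300.0, 80.0), (420.0, 300.0)]
kpts1 = [(104.0, 118.0), (305.0, 82.0), (426.0, 297.0)]
pose = model(kpts0, kpts1, width=640, height=480)
print(len(pose))  # 6
```

Omitting positional encodings and pooling with a mean makes the output independent of match order, which suits an unordered set of correspondences; a real model would stack several layers and be trained against ground-truth relative poses.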

Key results

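Matches baseline LightGlue/SuperGlue pose accuracy (LO-RANSAC AUC of 0.745 at 10°) using only 512 keypoints instead of 2048
Outperforms DSO by 40% in dynamic TartanAir forest scenes (average RPE 1.09 m, KITTI score 2.33%)
Remains competitive with TartanVO while using a much lighter model trained on only 10% of the data
Offers a favourable accuracy-efficiency trade-off relative to direct methods such as DSO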
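The abstract reports pose accuracy as an LO-RANSAC AUC at a 10° threshold. A common convention, used in SuperGlue-style evaluations, takes one angular error per image pair (typically the larger of the rotation error and the translation-direction error), builds the cumulative recall curve, and integrates it up to the threshold. The sketch below assumes ForestGlue follows that convention; the function name `pose_auc` is illustrative, and the LO-RANSAC pose estimation that produces the errors is not shown.

```python
import bisect


def pose_auc(errors_deg, threshold_deg):
    """Normalised area under the recall-vs-error curve, cut off at threshold_deg.

    errors_deg holds one angular pose error per image pair (degrees).
    """
    errs = sorted(errors_deg)
    n = len(errs)
    xs = [0.0] + errs                              # error axis, starting at the origin
    ys = [0.0] + [(i + 1) / n for i in range(n)]   # fraction of pairs with error <= x
    # Keep points strictly below the threshold and close the curve at the threshold.
    last = bisect.bisect_left(xs, threshold_deg)
    xs_cut = xs[:last] + [threshold_deg]
    ys_cut = ys[:last] + [ys[last - 1]]
    # Trapezoidal integration, then normalise so a perfect matcher scores 1.0.
    area = 0.0
    for i in range(1, len(xs_cut)):
        area += (xs_cut[i] - xs_cut[i - 1]) * (ys_cut[i] + ys_cut[i - 1]) / 2.0
    return area / threshold_deg


print(round(pose_auc([1.0, 2.0, 20.0], threshold_deg=10.0), 3))  # 0.6
```

In this toy example, two of three pairs fall under 10° with small errors and one fails outright, giving an AUC of 0.6; a score of 0.745 therefore means most pairs are estimated well within the 10° budget.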

Why it matters

Could enable reliable, real-time autonomous navigation for resource-constrained drones and robots in GPS-denied forest environments.

Abstract

Recent advancements in visual odometry systems have improved autonomous navigation, yet challenges persist in complex environments such as forests, where dense foliage, variable lighting, and repetitive textures compromise the accuracy of feature correspondences. To address these challenges, we introduce ForestGlue. ForestGlue enhances the SuperPoint feature detector through four configurations (grayscale, RGB, RGB-D, and stereo-vision inputs), each optimised for a different sensing modality. For feature matching, we employ LightGlue or SuperGlue, both retrained on synthetic forest data. ForestGlue achieves pose estimation accuracy comparable to the baseline LightGlue and SuperGlue models, yet requires only 512 keypoints, just 25% of the 2048 keypoints used by the baselines, to reach an LO-RANSAC AUC score of 0.745 at a 10° threshold. Because it needs only a quarter of the keypoints, ForestGlue has the potential to reduce computational overhead whilst remaining effective in dynamic forest environments, making it a promising candidate for real-time deployment on resource-constrained platforms such as drones or mobile robots. By combining ForestGlue with a novel transformer-based pose estimation model, we propose ForestVO, which estimates relative camera poses from the 2D pixel coordinates of matched features between frames. On challenging TartanAir forest sequences, ForestVO achieves an average relative pose error (RPE) of 1.09 m and a KITTI score of 2.33%, outperforming direct methods such as DSO by 40% in dynamic scenes, while remaining competitive with TartanVO despite being a significantly lighter model trained on only 10% of the dataset. This work establishes an end-to-end deep learning pipeline tailored to visual odometry in forested environments, leveraging forest-specific training data to optimise feature correspondence and pose estimation for improved accuracy and robustness in autonomous navigation systems.
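The two trajectory metrics in the abstract, RPE and the KITTI score, can be illustrated with a short sketch. The exact protocol behind the reported 1.09 m and 2.33% (frame spacing, segment lengths, averaging) is not given here, so this is a simplified version under stated assumptions: poses are hypothetical `(R, t)` tuples with a 3×3 rotation and a 3-vector translation, RPE averages translational error over consecutive frames, and `drift_percent` treats the whole trajectory as one segment rather than using the KITTI devkit's 100-800 m sub-segments.

```python
import math


def inv(pose):
    # Inverse of a rigid transform (R, t): (R^T, -R^T t).
    R, t = pose
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    return Rt, [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]


def mul(a, b):
    # Composition a * b of two rigid transforms.
    Ra, ta = a
    Rb, tb = b
    R = [[sum(Ra[i][k] * Rb[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    t = [sum(Ra[i][k] * tb[k] for k in range(3)) + ta[i] for i in range(3)]
    return R, t


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def rpe_translation(gt, est, delta=1):
    """Mean translational RPE (metres) over frame pairs (i, i + delta)."""
    errors = []
    for i in range(len(gt) - delta):
        gt_rel = mul(inv(gt[i]), gt[i + delta])
        est_rel = mul(inv(est[i]), est[i + delta])
        errors.append(norm(mul(inv(gt_rel), est_rel)[1]))
    return sum(errors) / len(errors)


def drift_percent(gt, est):
    """Simplified KITTI-style score: end-to-end translation error as % of path length.

    The official KITTI devkit instead averages over 100-800 m sub-segments.
    """
    length = sum(norm([b - a for a, b in zip(gt[i][1], gt[i + 1][1])])
                 for i in range(len(gt) - 1))
    gt_total = mul(inv(gt[0]), gt[-1])
    est_total = mul(inv(est[0]), est[-1])
    return 100.0 * norm(mul(inv(gt_total), est_total)[1]) / length


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
gt = [(I3, [float(i), 0.0, 0.0]) for i in range(10)]    # 1 m per frame
est = [(I3, [1.02 * i, 0.0, 0.0]) for i in range(10)]   # 2% scale drift
print(round(rpe_translation(gt, est), 3))  # 0.02
print(round(drift_percent(gt, est), 3))    # 2.0
```

RPE captures local, frame-to-frame error in metres, while the KITTI-style score normalises accumulated drift by distance travelled, so the two numbers reported for ForestVO measure complementary aspects of odometry quality.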

Index terms

Robotics and Automation in Agriculture and Forestry; Agricultural Automation