ICRA 2026

LEAR: Learning Edge-Aware Representations for Event-To-LiDAR Localization

Kuangyi Chen, Jun Zhang, Yuxi Hu, Yi Zhou, Friedrich Fraundorfer

AI summary

Jointly learning edge structures and dense event–depth flow fields bridges the modality gap, enabling significantly more robust and accurate camera pose estimation from events and LiDAR maps.

Event cameras, LiDAR localization, cross-modal alignment, edge-aware learning, pose estimation, visual navigation

Problem

Aligning sparse, asynchronous event camera data with dense LiDAR point clouds for visual localization is fundamentally ill-posed due to modality gaps, leading to poor correspondence estimation and limited robustness in challenging environments.

Approach

LEAR introduces a dual-task learning framework that mutually reinforces a dense event–depth flow estimator and an edge detector through cross-task feature fusion and iterative refinement, creating edge-aware representations that bridge the sensing-modality divide.

Key results

Proposes a dual-task framework jointly learning edge detection and event–depth flow estimation
Introduces Cross-task Feature Fusion (CFF) and Iterative Feature Refinement (IFR) modules for mutually reinforcing feature learning
Achieves state-of-the-art localization accuracy on challenging public datasets such as M3ED
Demonstrates that edge-aware representations significantly improve cross-modal consistency and pose recovery robustness

Why it matters

Enables reliable visual localization for robots and UAVs in GPS-denied or visually degraded environments where standard cameras fail.

Abstract

Event cameras offer high-temporal-resolution sensing that remains reliable under high-speed motion and challenging lighting, making them promising for localization from LiDAR point clouds in GPS-denied and visually degraded environments. However, aligning sparse, asynchronous events with dense LiDAR maps is fundamentally ill-posed, as direct correspondence estimation suffers from modality gaps. We propose LEAR, a dual-task learning framework that jointly estimates edge structures and dense event–depth flow fields to bridge the sensing-modality divide. Instead of treating edges as a post-hoc aid, LEAR couples them with flow estimation through a cross-modal fusion mechanism that injects modality-invariant geometric cues into the motion representation, and an iterative refinement strategy that enforces mutual consistency between the two tasks over multiple update steps. This synergy produces edge-aware, depth-aligned flow fields that enable more robust and accurate pose recovery via Perspective-n-Point (PnP) solvers. On several popular and challenging datasets, LEAR achieves superior performance over the best prior method. The source code, trained models, and demo videos are made publicly available online.
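
To make the last step of the pipeline concrete, the following is a minimal sketch of how a dense flow field could be turned into the 2D-3D correspondences that a PnP solver consumes. It is not the authors' implementation. It assumes that the depth map is rendered from the LiDAR map at an initial pose guess, that the flow stores a per-pixel displacement from each depth pixel to its matching location in the event image, and that a pinhole intrinsic matrix `K` is known. The function name `correspondences_from_flow` and its parameters are hypothetical.

```python
import numpy as np


def correspondences_from_flow(depth, flow, K, min_depth=1e-3):
    """Build 2D-3D correspondences from a depth map and a dense flow field.

    Assumptions (not taken from the paper):
      depth: (H, W) depth rendered from the LiDAR map at an initial pose
             guess; values <= min_depth mean "no LiDAR return".
      flow:  (H, W, 2) displacement (du, dv) from each depth pixel to the
             matching pixel in the event image.
      K:     3x3 pinhole intrinsic matrix.

    Returns pts3d (N, 3) in the initial-guess camera frame and
    pts2d (N, 2) in event-image pixel coordinates.
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]          # pixel row (v) and column (u) grids
    valid = depth > min_depth          # keep only pixels with a depth value

    u = u[valid].astype(np.float64)
    v = v[valid].astype(np.float64)
    z = depth[valid].astype(np.float64)

    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]

    # Back-project each valid depth pixel to a 3D point.
    pts3d = np.stack([(u - cx) / fx * z, (v - cy) / fy * z, z], axis=1)

    # Move each pixel by its flow vector to get the event-image location.
    pts2d = np.stack([u, v], axis=1) + flow[valid]
    return pts3d, pts2d


# Synthetic check: with zero flow, the 3D points must reproject onto
# their own pixels under the identity pose.
K = np.array([[200.0, 0.0, 32.0],
              [0.0, 200.0, 24.0],
              [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 5.0)
depth[:4] = 0.0                        # simulate rows without LiDAR returns
flow = np.zeros((48, 64, 2))

pts3d, pts2d = correspondences_from_flow(depth, flow, K)
proj = (K @ pts3d.T).T
proj = proj[:, :2] / proj[:, 2:3]
```

The resulting pairs could then be passed to a robust PnP solver, for example OpenCV's `cv2.solvePnPRansac(pts3d, pts2d, K, None)`, whose output would be the pose correction relative to the initial guess under the assumptions above.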

Index terms

Deep Learning for Visual Perception; Localization