ICRA 2026

ExFMan: Rendering 3D Dynamic Humans with Hybrid Monocular Blurry Frames and Events

Kanghao Chen, Zeyu Wang, Lin Wang

AI summary

ExFMan enables high-quality 3D dynamic human rendering from monocular blurry videos by adaptively fusing RGB frames and event camera data based on local motion velocity.

Neural Rendering, Event Cameras, Motion Blur, Dynamic Human Reconstruction, Sensor Fusion, 3D Avatars

Problem

Reconstructing clear 3D dynamic humans from in-the-wild monocular videos is hindered by severe motion blur, which causes shape and appearance inconsistencies. Existing methods either ignore blur or rely on error-prone two-stage deblurring that fails on complex human motion.

Approach

The framework explicitly models a 3D velocity field to identify blurry body regions, then adaptively reweights RGB and event camera losses to suppress blur artifacts and recover fine details in high-velocity areas.
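The reweighting idea can be illustrated with a minimal sketch. This is not the paper's formulation: the exponential mapping, the temperature `tau`, and the function names `velocity_weights` and `weighted_loss` are all assumptions chosen for illustration. It only shows the general pattern of trusting RGB supervision less, and event supervision more, where the rendered speed is high.

```python
import math

def velocity_weights(speed, tau=1.0):
    """Map per-pixel speed to loss weights (hypothetical mapping).

    Slow pixels get an RGB weight near 1; fast (likely blurry) pixels
    get an RGB weight near 0. The event weight is the complement.
    """
    w_rgb = [math.exp(-s / tau) for s in speed]
    w_evt = [1.0 - w for w in w_rgb]
    return w_rgb, w_evt

def weighted_loss(rgb_err, evt_err, speed, tau=1.0):
    """Combine per-pixel RGB and event errors using velocity-based weights."""
    w_rgb, w_evt = velocity_weights(speed, tau)
    n = len(speed)
    rgb_term = sum(w * e for w, e in zip(w_rgb, rgb_err)) / n
    evt_term = sum(w * e for w, e in zip(w_evt, evt_err)) / n
    return rgb_term + evt_term

# Toy usage: one static pixel and one fast-moving pixel.
speed = [0.0, 5.0]
rgb_err = [0.1, 0.8]   # blur inflates the RGB error on the fast pixel
evt_err = [0.3, 0.05]
loss = weighted_loss(rgb_err, evt_err, speed)
```

In this toy case the large RGB error on the fast pixel contributes almost nothing to the total, while the event error at that pixel is kept nearly in full.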

Key results

First neural rendering framework for dynamic humans using hybrid blurry RGB and event data
Novel event-oriented blur-aware velocity field to localize motion blur
Velocity-aware photometric and velocity-relative event losses for adaptive optimization
State-of-the-art reconstruction quality on synthetic and real-world blurry datasets

Why it matters

Enables robust 3D human modeling for robotics and VR applications where rapid motion and imperfect camera conditions are common.

Abstract

Recent advances in neural rendering have enabled the 3D reconstruction of dynamic humans from monocular videos, with applications in robotics. However, it is still challenging to reconstruct clear humans from in-the-wild videos affected by motion blur, which causes shape and appearance inconsistencies, especially in blurry regions such as hands and legs. In this paper, we propose ExFMan, the first neural rendering framework that unveils the possibility of rendering high-quality humans in rapid motion with a hybrid frame-based RGB and bio-inspired event camera. The “out-of-the-box” insight is to leverage the high temporal resolution of event data in a complementary manner and to adaptively reweight the losses for both RGB frames and events in local regions according to the velocity of the rendered human. This significantly mitigates the inconsistency associated with motion blur in the RGB frames. Specifically, we first formulate a velocity field of the 3D body in the canonical space and render it to image space to identify the body parts with motion blur. We then propose two novel losses, i.e., a velocity-aware photometric loss and a velocity-relative event loss, to optimize the neural human for both modalities under the guidance of the estimated velocity. In addition, we incorporate novel pose regularization and alpha losses to encourage continuous poses and clear boundaries. Extensive experiments on synthetic and real-world datasets demonstrate that ExFMan reconstructs sharper and higher-quality humans than the compared baselines and state-of-the-art methods for diverse blurry subjects.
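For readers unfamiliar with event supervision, the sketch below uses the commonly assumed event-camera model, in which a pixel fires an event each time its log-brightness changes by a contrast threshold. The abstract does not specify the form of the velocity-relative event loss, so this is only a generic event residual, not the authors' loss. The function name `event_residuals`, the threshold value, and the flat per-pixel lists are assumptions for illustration.

```python
import math

def event_residuals(img_t0, img_t1, event_sum, threshold=0.2, eps=1e-6):
    """Per-pixel squared error between rendered and event-implied brightness change.

    img_t0, img_t1: rendered intensities at two timestamps (positive floats).
    event_sum: signed count of events (positive minus negative polarity)
               observed at each pixel between the two timestamps.
    threshold: assumed contrast threshold of the event camera.
    """
    residuals = []
    for i0, i1, e in zip(img_t0, img_t1, event_sum):
        predicted = math.log(i1 + eps) - math.log(i0 + eps)  # rendered log change
        observed = threshold * e                             # change implied by events
        residuals.append((predicted - observed) ** 2)
    return residuals

# Toy usage: the first pixel is unchanged with no events; the second
# brightens, but the two positive events only partly account for it.
res = event_residuals([0.5, 0.2], [0.5, 0.4], [0, 2])
```

A velocity-dependent weight could then be applied to these residuals per pixel, though how the paper does so is not stated in the abstract.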

Index terms

Deep Learning for Visual Perception; Visual Learning; Sensor Fusion