ICRA 2026

IntuFly: Intuitive Continuous Hand–Gaze Control for UAVs

Junsheng Xu, Ke Ma, Xinde Li, Chengxiang Yu, Zeyu Zhang, Zhentong Zhang

AI summary

IntuFly enables non-experts to fly UAVs more intuitively and efficiently by combining continuous hand-drawn trajectories for translation with gaze-based heading and target locking, outperforming traditional RC controllers in speed and usability.
Keywords: UAV control, multimodal interaction, hand-gaze fusion, continuous control, human-drone interaction, sim-to-real

Problem

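Single-modality UAV control interfaces distort operator intent. Gesture-only systems rely on discrete vocabularies and mode switches that break continuity and raise cognitive load, while gaze-only control offers limited dimensionality and is vulnerable to unintended activation (the Midas-touch problem) and saccadic jitter. Together these create a steep learning curve for non-experts.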

Approach

IntuFly fuses continuous 3D hand-drawn trajectories for translational velocity with gaze-driven yaw and automatic target locking, using a timestamp-consistent late-fusion layer with stabilization to overcome cross-stream asynchrony and noise.
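The fusion step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `LateFusion`, the thresholds `max_age` and `max_skew`, and the use of an exponential moving average as the "lightweight stabilization" are all assumptions chosen for illustration. The sketch keeps the newest sample per stream, drops samples that are too old, refuses to fuse samples whose capture times disagree too much, and smooths the fused command.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Command:
    vx: float
    vy: float
    vz: float
    yaw_rate: float


class LateFusion:
    """Hypothetical late-fusion layer: hand stream gives velocity, gaze stream gives yaw rate."""

    def __init__(self, max_age: float = 0.08, max_skew: float = 0.04, alpha: float = 0.5):
        self.max_age = max_age    # assumed: samples older than this (s) are stale
        self.max_skew = max_skew  # assumed: max allowed timestamp gap between streams (s)
        self.alpha = alpha        # assumed: EMA weight given to the newest fused command
        self._hand: Optional[Tuple[float, Tuple[float, float, float]]] = None
        self._gaze: Optional[Tuple[float, float]] = None
        self._smoothed: Optional[Tuple[float, float, float, float]] = None

    def push_hand(self, t: float, velocity: Tuple[float, float, float]) -> None:
        # Keep only the newest sample; ignore out-of-order arrivals.
        if self._hand is None or t > self._hand[0]:
            self._hand = (t, velocity)

    def push_gaze(self, t: float, yaw_rate: float) -> None:
        if self._gaze is None or t > self._gaze[0]:
            self._gaze = (t, yaw_rate)

    def step(self, now: float) -> Optional[Command]:
        # Stale-frame dropping: discard samples older than max_age.
        if self._hand is not None and now - self._hand[0] > self.max_age:
            self._hand = None
        if self._gaze is not None and now - self._gaze[0] > self.max_age:
            self._gaze = None
        if self._hand is None or self._gaze is None:
            self._smoothed = None  # reset the filter; the caller should hover
            return None
        # Timestamp consistency: only fuse samples captured close together.
        if abs(self._hand[0] - self._gaze[0]) > self.max_skew:
            return None
        raw = (*self._hand[1], self._gaze[1])
        if self._smoothed is None:
            self._smoothed = raw
        else:
            a = self.alpha
            self._smoothed = tuple(a * r + (1 - a) * s for r, s in zip(raw, self._smoothed))
        return Command(*self._smoothed)
```

In a control loop running at 25 Hz or more, each perception thread would call `push_hand` or `push_gaze` with its own capture timestamp, and the loop would call `step(now)` once per tick, commanding a hover whenever it returns `None`.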

Key results

  • In simulation racing, novices flew faster on shorter paths than with a remote controller (RC) baseline
  • Intermediates selected shorter, smoother, yet more conservative lines
  • Subjective scales indicated lower workload and higher usability
  • Gaze-assisted mobile target tracking achieved faster responses with near-complete line-of-sight (LOS) coverage under identical limits
  • The system runs stably at more than 25 Hz on commodity hardware, and the same stack runs on an indoor DJI Tello with behavior consistent with simulation

Why it matters

Lowers the entry barrier for non-expert UAV operation while maintaining fine control and stability, offering a practical path toward intuitive human-drone cooperation.

Abstract

Operating Unmanned Aerial Vehicles (UAVs) remains challenging for non-experts because single-modality interfaces distort intent: gesture-only systems depend on discrete vocabularies and mode switches that break continuity and raise cognitive load, while gaze-only control offers limited dimensionality and is vulnerable to Midas-touch and saccadic jitter. We present IntuFly, an intuition-driven hand–gaze framework in which hands draw the path to give continuous 3D translation and eyes set heading and lock targets, preserving intent continuity and reducing effort. To overcome cross-stream asynchrony and noise, our deployment-oriented fusion layer performs timestamp-consistent late fusion with stale-frame dropping and lightweight stabilization, yielding stable closed-loop operation at more than 25 Hz on commodity hardware. In simulation racing, novices fly faster on shorter paths than a remote controller (RC) baseline, and intermediates select shorter, smoother yet more conservative lines; subjective scales indicate lower workload and higher usability. In mobile target tracking, adding gaze produces faster responses with near-complete line-of-sight (LOS) coverage under identical limits. The same perception–control stack runs stably on an indoor DJI Tello platform with behavior consistent with simulation, demonstrating sim-to-real feasibility. These results show that IntuFly lowers the learning barrier for non-expert users while preserving fine control and stability, offering a deployable path toward intuitive, continuous human–UAV cooperative flight. Our code is publicly available at https://github.com/Crotonbee/IntuFly.
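As a rough illustration of the division of labor in the abstract (hands give translation, eyes give heading), the sketch below maps two successive hand positions to a clamped velocity command and a horizontal gaze angle to a yaw rate. The function names, gains, limits, and the dead zone used to suppress small gaze jitter are all hypothetical; the abstract does not specify how the paper computes these quantities.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def hand_velocity(prev_pos: Vec3, curr_pos: Vec3, dt: float,
                  gain: float = 1.0, v_max: float = 1.0) -> Vec3:
    """Finite-difference velocity from two hand positions, clamped to v_max (assumed limit)."""
    if dt <= 0.0:
        return (0.0, 0.0, 0.0)
    v = [gain * (c - p) / dt for p, c in zip(prev_pos, curr_pos)]
    speed = math.sqrt(sum(x * x for x in v))
    if speed > v_max:
        # Scale the vector down so its direction is kept but its norm is v_max.
        v = [x * v_max / speed for x in v]
    return (v[0], v[1], v[2])


def gaze_yaw_rate(gaze_angle: float, dead_zone: float = 0.05,
                  gain: float = 1.5, rate_max: float = 1.0) -> float:
    """Yaw rate from horizontal gaze angle (rad); small angles inside the dead zone are ignored."""
    if abs(gaze_angle) < dead_zone:
        return 0.0
    # Subtract the dead zone so the output grows continuously from zero at its edge.
    rate = gain * (gaze_angle - math.copysign(dead_zone, gaze_angle))
    return max(-rate_max, min(rate_max, rate))
```

The dead zone is one common way to keep brief glances from steering the vehicle; dwell-time thresholds or velocity-based saccade filters are alternatives.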

Index terms

Human-Centered Robotics; Gesture, Posture and Facial Expressions; Aerial Systems: Perception and Autonomy

Related papers