ICRA 2026

Language Conditioning Improves Accuracy of Aircraft Goal Prediction in Non-Towered Airspace

Sundhar Vinodh Sangeetha, Chih-Yuan Chiu, Sarah H.Q. Li, Shreyas Kousik

AI summary

Conditioning aircraft goal prediction on pilot radio communications significantly reduces prediction error compared to motion-history-only baselines.

Aircraft goal prediction, Non-towered airspace, Language conditioning, Autonomous flight, Multimodal prediction, Radio communication

Problem

Autonomous aircraft lack the ability to interpret unstructured human pilot radio calls in non-towered airspace, causing current conflict avoidance systems to miss critical intent information and operate unsafely.

Approach

The framework transcribes and interprets pilot radio calls into discrete intent labels using speech-to-text and large language models, then fuses these labels with observed flight trajectories to condition a temporal convolutional network and Gaussian mixture model for probabilistic goal prediction.
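The fusion step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the intent vocabulary `INTENTS`, the one-hot encoding, the feature dimensions, and the helper names `fuse` and `causal_conv1d` are all assumptions made for illustration. The sketch appends a discrete intent label to every timestep of an observed trajectory and passes the result through one causal dilated convolution, the basic building block of a temporal convolutional network.

```python
import numpy as np

# Hypothetical intent vocabulary; the actual label set is not given in this summary.
INTENTS = ["takeoff", "landing", "pattern", "unknown"]


def fuse(trajectory, intent):
    """Append a one-hot intent vector to every timestep of a (T, D) trajectory."""
    one_hot = np.zeros(len(INTENTS))
    one_hot[INTENTS.index(intent)] = 1.0
    tiled = np.tile(one_hot, (trajectory.shape[0], 1))
    return np.concatenate([trajectory, tiled], axis=1)  # shape (T, D + len(INTENTS))


def causal_conv1d(x, weights, dilation=1):
    """Causal dilated 1D convolution.

    x: (T, C_in), weights: (K, C_in, C_out).
    The output at time t depends only on x[t], x[t - dilation], ...,
    x[t - (K - 1) * dilation]; positions before the start are zero-padded.
    """
    K = weights.shape[0]
    T = x.shape[0]
    pad = (K - 1) * dilation
    padded = np.concatenate([np.zeros((pad, x.shape[1])), x], axis=0)
    out = np.zeros((T, weights.shape[2]))
    for k in range(K):
        start = pad - k * dilation  # tap k looks back k * dilation steps
        out += padded[start:start + T] @ weights[k]
    return out


# Example: 10 timesteps of (x, y, altitude), conditioned on a "landing" call.
rng = np.random.default_rng(0)
traj = rng.normal(size=(10, 3))
fused = fuse(traj, "landing")
kernel = rng.normal(size=(3, fused.shape[1], 8))
features = causal_conv1d(fused, kernel, dilation=2)  # shape (10, 8)
```

Tiling the label across timesteps is only one way to condition a sequence model; a learned embedding or feature-wise modulation would serve the same purpose.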

Key results

Accurate automatic speech recognition (ASR) and speaker identification for non-towered Common Traffic Advisory Frequency (CTAF) calls
Discrete intent extraction from unstructured radio communications
Multimodal goal prediction framework fusing trajectory and language
Significantly reduced goal prediction error on real-world non-towered data

Why it matters

Enables safer mixed human-autonomy operations in non-towered airspace by allowing autonomous aircraft to interpret and act on critical pilot intent.

Abstract

Autonomous aircraft must safely operate in non-towered airspace, where coordination relies on voice-based communication among human pilots. Safe operation requires an aircraft to predict the intent, and corresponding goal location, of other aircraft. This paper introduces a multimodal framework for aircraft goal prediction that integrates natural language understanding with spatial reasoning to improve autonomous decision-making in such environments. We leverage automatic speech recognition and large language models to transcribe and interpret pilot radio calls, identify aircraft, and extract discrete intent labels. These intent labels are fused with observed trajectories to condition a temporal convolutional network and Gaussian mixture model for probabilistic goal prediction. Our method significantly reduces goal prediction error compared to baselines that rely solely on motion history, demonstrating that language-conditioned prediction increases prediction accuracy. Experiments on a real-world dataset from a non-towered airport validate the approach and highlight its potential to enable socially aware, language-conditioned robotic motion planning.
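To make "probabilistic goal prediction" with a Gaussian mixture concrete, the sketch below evaluates the log-density of a 2D goal location under a mixture and extracts a single point estimate. It is a generic illustration, not the paper's method: the 2D goal parameterization, the function names, and the choice of the highest-weight component mean as the point estimate (a simple heuristic, not necessarily the mixture's mode) are assumptions.

```python
import numpy as np


def gmm_log_pdf(goal, weights, means, covs):
    """Log-density of a goal under a Gaussian mixture.

    goal: (d,), weights: (K,), means: (K, d), covs: (K, d, d).
    """
    d = goal.shape[0]
    log_terms = []
    for w, mu, cov in zip(weights, means, covs):
        diff = goal - mu
        maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(cov)
        log_terms.append(np.log(w) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha))
    log_terms = np.array(log_terms)
    m = log_terms.max()  # log-sum-exp for numerical stability
    return m + np.log(np.exp(log_terms - m).sum())


def point_prediction(weights, means):
    """Heuristic point estimate (assumption): mean of the highest-weight component."""
    return means[np.argmax(weights)]


# Example: two candidate goals, e.g. two runway ends, with unequal confidence.
weights = np.array([0.3, 0.7])
means = np.array([[0.0, 0.0], [1.0, 2.0]])
covs = np.stack([np.eye(2), 0.5 * np.eye(2)])

true_goal = np.array([1.1, 1.9])
nll = -gmm_log_pdf(true_goal, weights, means, covs)
error = np.linalg.norm(point_prediction(weights, means) - true_goal)
```

Under this reading, language conditioning would be expected to shift mixture weight toward the component matching the announced intent, lowering both the negative log-likelihood and the distance error relative to a motion-only predictor.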

Index terms

Intention Recognition; Aerial Systems: Perception and Autonomy; Aerial Systems: Applications