Language Conditioning Improves Accuracy of Aircraft Goal Prediction in Non-Towered Airspace
Sundhar Vinodh Sangeetha, Chih-Yuan Chiu, Sarah H.Q. Li, Shreyas Kousik
AI summary
Problem
Autonomous aircraft lack the ability to interpret unstructured human pilot radio calls in non-towered airspace, causing current conflict avoidance systems to miss critical intent information and operate unsafely.
Approach
The framework transcribes and interprets pilot radio calls into discrete intent labels using speech-to-text and large language models, then fuses these labels with observed flight trajectories to condition a temporal convolutional network and Gaussian mixture model for probabilistic goal prediction.
Key results
- Accurate ASR and speaker identification for non-towered CTAF calls
- Discrete intent extraction from unstructured radio communications
- Multimodal goal prediction framework fusing trajectory and language
- Significantly reduced goal prediction error on real-world non-towered data
Why it matters
Enables safer mixed human-autonomy operations in non-towered airspace by allowing autonomous aircraft to interpret and act on critical pilot intent.
Abstract
Autonomous aircraft must safely operate in non-towered airspace, where coordination relies on voice-based communication among human pilots. Safe operation requires an aircraft to predict the intent, and corresponding goal location, of other aircraft. This paper introduces a multimodal framework for aircraft goal prediction that integrates natural language understanding with spatial reasoning to improve autonomous decision-making in such environments. We leverage automatic speech recognition and large language models to transcribe and interpret pilot radio calls, identify aircraft, and extract discrete intent labels. These intent labels are fused with observed trajectories to condition a temporal convolutional network and Gaussian mixture model for probabilistic goal prediction. Our method significantly reduces goal prediction error compared to baselines that rely solely on motion history, demonstrating that conditioning on language improves prediction accuracy. Experiments on a real-world dataset from a non-towered airport validate the approach and highlight its potential to enable socially aware, language-conditioned robotic motion planning.
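To make the fusion step described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical four-label intent vocabulary, fuses the label with a trajectory by appending a one-hot code to each timestep, and uses a hand-set bias table (`INTENT_BIAS`, purely illustrative) in place of the learned temporal convolutional network to shift the mixture weights of a two-component isotropic Gaussian mixture over goal locations. All names, labels, and numbers here are assumptions for illustration.

```python
import math

# Hypothetical intent vocabulary; the paper's actual label set is not given here.
INTENTS = ["land", "depart", "pattern", "unknown"]

# Hypothetical per-intent logit offsets for two candidate goals
# (index 0: runway threshold, index 1: departure corridor).
# In a learned model these shifts would come from the network, not a table.
INTENT_BIAS = {
    "land": [2.0, 0.0],
    "depart": [0.0, 2.0],
    "pattern": [0.5, 0.0],
    "unknown": [0.0, 0.0],
}


def one_hot(intent):
    """Encode a discrete intent label as a one-hot vector."""
    return [1.0 if intent == name else 0.0 for name in INTENTS]


def fuse(trajectory, intent):
    """Append the intent one-hot code to every (x, y, z) timestep."""
    code = one_hot(intent)
    return [list(point) + code for point in trajectory]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_weights(motion_logits, intent=None):
    """Mixture weights from motion alone, or shifted by the intent bias."""
    if intent is None:
        return softmax(motion_logits)
    bias = INTENT_BIAS[intent]
    return softmax([m + b for m, b in zip(motion_logits, bias)])


def gmm_density(goal, weights, means, sigmas):
    """Density of an isotropic 2-D Gaussian mixture evaluated at `goal`."""
    total = 0.0
    for w, (mx, my), s in zip(weights, means, sigmas):
        sq_dist = (goal[0] - mx) ** 2 + (goal[1] - my) ** 2
        total += w * math.exp(-sq_dist / (2 * s * s)) / (2 * math.pi * s * s)
    return total


# Toy scenario: motion history alone is ambiguous between the two goals.
means = [(0.0, 0.0), (5.0, 5.0)]
sigmas = [1.0, 1.0]
motion_logits = [0.0, 0.0]

fused = fuse([(1.0, 2.0, 0.3), (0.8, 1.5, 0.25)], "land")
w_motion = mixture_weights(motion_logits)
w_lang = mixture_weights(motion_logits, intent="land")
p_motion = gmm_density((0.0, 0.0), w_motion, means, sigmas)
p_lang = gmm_density((0.0, 0.0), w_lang, means, sigmas)
```

In this toy setting, a "land" call moves probability mass toward the runway goal, so the mixture assigns higher density to the true goal than the motion-only weights do. The sketch only illustrates why a discrete intent label can disambiguate goals that look alike from motion history; it says nothing about the magnitude of the improvement reported in the paper.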