High-Altitude Balloon Station-Keeping with First Order Model Predictive Control
Myles Pasetsky, Jiawei Lin, Bradley Guo, Sarah Dean
AI summary
Problem
Prior work on high-altitude balloon station-keeping relies heavily on model-free reinforcement learning and dismisses model-based approaches as impractical given uncertain wind forecasts and complex dynamics, leaving no rigorous model-based baseline to compare against.
Approach
The authors implement balloon and wind dynamics as differentiable functions in JAX, enabling gradient-based trajectory optimization for online receding-horizon planning.
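To make the approach concrete, here is a minimal JAX sketch of first-order, receding-horizon planning. It rests on assumptions that are ours, not the paper's: a toy 2-D wind field whose direction rotates with altitude, a balloon that controls only its climb rate, a squared-distance cost to a station at the origin, and plain projected gradient descent as the first-order optimizer. All names (`wind`, `step`, `rollout_cost`, `plan`, `mpc_episode`) and constants are hypothetical.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy model, NOT the paper's balloon or wind dynamics.
# State: (x_km, y_km, altitude_km); action: commanded climb rate (km/h).
DT_H = 1.0 / 12.0               # assumed 5-minute control step, in hours
ALT_MIN, ALT_MAX = 15.0, 20.0   # assumed reachable altitude band (km)
MAX_RATE = 3.0                  # assumed climb/descent limit (km/h)


def wind(alt):
    """Toy wind field (km/h) whose direction rotates smoothly with altitude."""
    angle = 2.0 * jnp.pi * (alt - ALT_MIN) / (ALT_MAX - ALT_MIN)
    return 20.0 * jnp.stack([jnp.cos(angle), jnp.sin(angle)])


def step(state, action):
    """One Euler step: the balloon only chooses altitude; wind moves it sideways."""
    alt_next = jnp.clip(state[2] + DT_H * action, ALT_MIN, ALT_MAX)
    xy_next = state[:2] + DT_H * wind(alt_next)
    return jnp.concatenate([xy_next, alt_next[None]])


def rollout_cost(actions, state0):
    """Differentiable rollout: mean squared distance to a station at the origin."""
    def body(state, a):
        nxt = step(state, a)
        return nxt, jnp.sum(nxt[:2] ** 2)

    _, sq_dists = jax.lax.scan(body, state0, actions)
    return jnp.mean(sq_dists) + 1e-3 * jnp.sum(actions ** 2)  # small effort penalty


grad_fn = jax.jit(jax.grad(rollout_cost))  # gradient w.r.t. the action sequence


def plan(state0, actions, iters=100, lr=1e-3):
    """First-order trajectory optimization: projected gradient descent on actions."""
    for _ in range(iters):
        actions = jnp.clip(actions - lr * grad_fn(actions, state0), -MAX_RATE, MAX_RATE)
    return actions


def mpc_episode(state0, steps=12, horizon=24):
    """Receding horizon: replan, apply only the first action, shift the plan."""
    state, actions = state0, jnp.zeros(horizon)
    traj = [state]
    for _ in range(steps):
        actions = plan(state, actions)
        state = step(state, actions[0])
        actions = jnp.concatenate([actions[1:], actions[-1:]])  # warm start next solve
        traj.append(state)
    return jnp.stack(traj)


traj = mpc_episode(jnp.array([30.0, -10.0, 17.5]))
print(traj[-1])  # final (x_km, y_km, altitude_km)
```

Writing the rollout with `jax.lax.scan` keeps the whole horizon a single differentiable function, so `jax.grad` returns the sensitivity of the cost to every planned climb command in one backward pass. Only the first command is executed before replanning, and the shifted plan warm-starts the next solve, which is one common way to keep the per-step online computation bounded.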
Key results
- 24% improvement in time-within-radius (TWR) over the Perciatelli44 RL policy
- An additional 1.8 hours per day spent within the station-keeping radius
- Open-sources a fully differentiable JAX implementation of high-altitude balloon dynamics
- Ablation studies show that online planning remains effective under simplified wind and dynamics models
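The two headline figures can be cross-checked with back-of-envelope arithmetic. Assuming (our reading, not stated in the summary) that the 24% is a relative gain over the RL policy's TWR, and expressing TWR in hours per day:

```latex
\[
\Delta = 0.24\,\mathrm{TWR}_{\mathrm{RL}} \approx 1.8\ \mathrm{h/day}
\;\Rightarrow\;
\mathrm{TWR}_{\mathrm{RL}} \approx \tfrac{1.8}{0.24} = 7.5\ \mathrm{h/day}\ (\approx 31\%),
\qquad
\mathrm{TWR}_{\mathrm{FOMPC}} \approx 7.5 + 1.8 = 9.3\ \mathrm{h/day}\ (\approx 39\%).
\]
```

If the 24% were instead an absolute gain in percentage points, it would correspond to 0.24 × 24 h ≈ 5.8 h/day, which does not match the 1.8 h figure, so the relative reading is the consistent one. The implied baselines above are derived under that assumption, not reported values.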
Why it matters
Provides a model-based baseline against which future RL controllers can be validated, and informs the practical deployment of autonomous high-altitude balloons for atmospheric research.
Abstract
High-altitude balloons (HABs) are common in scientific research due to their wide range of applications and low cost. Because of their nonlinear, underactuated dynamics and the partial observability of wind fields, prior work has largely relied on model-free reinforcement learning (RL) methods to design near-optimal control schemes for station-keeping. These methods often compare only against hand-crafted heuristics, dismissing model-based approaches as impractical given the system complexity and uncertain wind forecasts. We revisit this assumption about the efficacy of model-based control for station-keeping by developing First-Order Model Predictive Control (FOMPC). By implementing the wind and balloon dynamics as differentiable functions in JAX, we enable gradient-based trajectory optimization for online planning. FOMPC outperforms a state-of-the-art RL policy, achieving a 24% improvement in time-within-radius (TWR) without requiring offline training, though at the cost of greater online computation per control step. Through systematic ablations of modeling assumptions and control factors, we show that online planning is effective across many configurations, including under simplified wind and dynamics models.
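Time-within-radius can be read as the fraction of time steps the balloon spends within a fixed horizontal distance of the station. The sketch below assumes a 50 km radius and positions in kilometres relative to the station; both are assumptions, as are the function names. Because the within-radius indicator is piecewise constant, its gradient is zero almost everywhere, so a gradient-based planner needs a smooth stand-in. `soft_twr` is one hypothetical choice built on a sigmoid, not necessarily the objective used in the paper.

```python
import jax
import jax.numpy as jnp


def time_within_radius(xy_km, radius_km=50.0):
    """Fraction of time steps whose distance from the station is within radius_km."""
    dist = jnp.linalg.norm(xy_km, axis=-1)
    return jnp.mean(dist <= radius_km)


def soft_twr(xy_km, radius_km=50.0, temp_km=5.0):
    """Hypothetical smooth surrogate: sigmoid of the signed margin to the radius."""
    # Small epsilon keeps the gradient of the distance finite at the station itself.
    dist = jnp.sqrt(jnp.sum(xy_km ** 2, axis=-1) + 1e-6)
    return jnp.mean(jax.nn.sigmoid((radius_km - dist) / temp_km))


positions = jnp.array([[0.0, 0.0], [30.0, 30.0], [60.0, 0.0], [10.0, 10.0]])
print(time_within_radius(positions))  # 3 of 4 steps within 50 km -> 0.75
print(jax.grad(soft_twr)(positions))  # finite gradients usable by a planner
```

The temperature `temp_km` trades off fidelity to the hard indicator (small values) against gradient signal far from the boundary (large values).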