Learning Robust Control Policies for Inverted Pose on Miniature Blimp Robots
Yuanlin Yang, Lin Hong, Fumin Zhang
AI summary
Problem
Miniature blimp robots possess complex, underactuated dynamics and weak thrust, rendering conventional control strategies ineffective for agile maneuvers like maintaining an unstable inverted pose. Existing model-based controllers degrade significantly under real-world parameter variations and environmental disturbances.
Approach
The authors train a robust inverted control policy using a modified TD3 algorithm with physics-informed domain randomization in a calibrated Unity simulation, then deploy it on a physical blimp through a mapping layer that bridges the sim-to-real gap.
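As a rough illustration of domain randomization, the sketch below draws a fresh set of physical parameters before each training episode. The parameter names (`mass_offset`, `motor_gain`), their ranges, and the uniform sampling are assumptions made for this example only; they are not the authors' actual parameters or values.

```python
import random
from dataclasses import dataclass


@dataclass
class BlimpParams:
    """Hypothetical physical parameters randomized per episode."""
    mass_offset: float  # assumed: center-of-mass shift in metres
    motor_gain: float   # assumed: multiplicative thrust scale


# Illustrative ranges only; the real ranges would come from calibration data.
RANGES = {
    "mass_offset": (-0.02, 0.02),
    "motor_gain": (0.8, 1.2),
}


def sample_params(rng: random.Random) -> BlimpParams:
    """Draw one parameter set uniformly from the assumed ranges."""
    return BlimpParams(
        mass_offset=rng.uniform(*RANGES["mass_offset"]),
        motor_gain=rng.uniform(*RANGES["motor_gain"]),
    )


# One parameter set per episode; the simulator would be reconfigured with
# each set before the policy interacts with it.
rng = random.Random(0)
episode_params = [sample_params(rng) for _ in range(5)]
```

Because the policy never sees the same dynamics twice, it is pushed toward behavior that works across the whole sampled range rather than for one nominal model.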
Key results
- First Unity-based 3D simulation tailored to inverted control of miniature blimp robots (MBRs)
- Robust policy maintains inverted pose across varied mass distributions and motor gains
- Successful sim-to-real transfer enables real-world inverted stabilization without retraining
- Higher success rate in simulation than the energy-shaping baseline controller
Why it matters
Enables agile flight capabilities for miniature blimps, expanding their viability for inspection, monitoring, and entertainment applications.
Abstract
The ability to achieve and maintain inverted poses is essential for unlocking the full agility of miniature blimp robots (MBRs). However, developing reliable inverted control strategies for MBRs remains challenging due to their complex and underactuated dynamics. To address this challenge, we propose a novel framework that enables robust control policy learning for inverted pose on MBRs. The proposed framework consists of three core stages. First, a high-fidelity three-dimensional (3D) simulation environment is constructed and calibrated using real-world MBR motion data. Second, a robust inverted control policy is trained in simulation using a modified Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm combined with a domain randomization strategy. Third, a mapping layer is designed to bridge the sim-to-real gap and facilitate real-world deployment of the learned policy. Comprehensive evaluations in the simulation environment demonstrate that the learned policy achieves a higher success rate compared to the energy-shaping controller. Furthermore, experimental results confirm that the learned policy with a mapping layer enables an MBR to achieve and maintain a fully inverted pose in real-world settings.
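For readers unfamiliar with TD3, the following minimal sketch shows the critic target used by the standard algorithm: clipped noise is added to the target action (target policy smoothing), and the smaller of two critic estimates is used (clipped double Q-learning). It covers only standard TD3 with a scalar action; the authors' modifications are not described in the abstract and are not reproduced here. The function name, argument names, and default hyperparameters are assumptions for illustration.

```python
import random


def td3_target(reward, done, next_action, q1_next, q2_next, rng,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Compute the standard TD3 critic target for one transition.

    q1_next and q2_next are callables standing in for the two target
    critics evaluated at the next state; next_action is the target
    actor's output at the next state.
    """
    # Target policy smoothing: add clipped Gaussian noise to the action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    smoothed = max(-max_action, min(max_action, next_action + noise))
    # Clipped double Q-learning: take the more pessimistic critic.
    q_min = min(q1_next(smoothed), q2_next(smoothed))
    # Bootstrapping is switched off on terminal transitions (done == 1.0).
    return reward + gamma * (1.0 - done) * q_min


# Toy critics for demonstration only.
target = td3_target(
    reward=1.0, done=0.0, next_action=0.5,
    q1_next=lambda a: 2.0 * a, q2_next=lambda a: 1.0 + a,
    rng=random.Random(0), gamma=0.9, noise_std=0.0,
)
```

With `noise_std=0.0` the smoothing noise vanishes, so the toy call above reduces to the plain minimum of the two critics at the unperturbed action.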