ICRA 2026

Learning Robust Control Policies for Inverted Pose on Miniature Blimp Robots

Yuanlin Yang, Lin Hong, Fumin Zhang

AI summary

A deep reinforcement learning framework with domain randomization and sim-to-real mapping enables miniature blimps to reliably achieve and maintain unstable inverted poses despite dynamic parameter variations.

Miniature blimp robots, inverted pose control, deep reinforcement learning, domain randomization, sim-to-real transfer, robust control

Problem

Miniature blimp robots possess complex, underactuated dynamics and weak thrust, rendering conventional control strategies ineffective for agile maneuvers like maintaining an unstable inverted pose. Existing model-based controllers degrade significantly under real-world parameter variations and environmental disturbances.

Approach

The authors train a robust inverted control policy using a modified TD3 algorithm with physics-informed domain randomization in a calibrated Unity simulation, then deploy it on a physical blimp via a learned mapping layer to bridge the sim-to-real gap.
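
As a rough illustration of the domain randomization step, the sketch below resamples simulator parameters at the start of each training episode. The summary only states that mass distribution and motor gains vary, so the parameter names, nominal values, and the ±20% range are assumptions made for this example and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BlimpParams:
    """Hypothetical simulator parameters; names and units are illustrative."""
    mass_offset: float  # shift of the center of mass along the body axis (m)
    motor_gain: float   # multiplier applied to the commanded thrust


# Assumed nominal values and relative spread; not taken from the paper.
NOMINAL = BlimpParams(mass_offset=0.05, motor_gain=1.0)
SPREAD = 0.2  # each parameter is scaled by a factor in [0.8, 1.2]


def sample_params(rng: random.Random) -> BlimpParams:
    """Draw one randomized parameter set for a training episode."""
    def jitter(value: float) -> float:
        return value * rng.uniform(1.0 - SPREAD, 1.0 + SPREAD)

    return BlimpParams(jitter(NOMINAL.mass_offset), jitter(NOMINAL.motor_gain))


rng = random.Random(0)  # fixed seed so the draws are reproducible
episode_params = [sample_params(rng) for _ in range(5)]
```

A policy trained across many such draws has to succeed for the whole parameter range rather than for one nominal model, which is the usual motivation for this technique.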

Key results

First Unity-based 3D simulation tailored for MBR inverted control
Robust policy maintains inverted pose across varied mass distributions and motor gains
Successful sim-to-real transfer enables real-world inverted stabilization without retraining
Higher success rate in simulation than the energy-shaping baseline controller

Why it matters

Enables agile flight capabilities for miniature blimps, expanding their viability for inspection, monitoring, and entertainment applications.

Abstract

The ability to achieve and maintain inverted poses is essential for unlocking the full agility of miniature blimp robots (MBRs). However, developing reliable inverted control strategies for MBRs remains challenging due to their complex and underactuated dynamics. To address this challenge, we propose a novel framework that enables robust control policy learning for inverted pose on MBRs. The proposed framework consists of three core stages. First, a high-fidelity three-dimensional (3D) simulation environment is constructed and calibrated using real-world MBR motion data. Second, a robust inverted control policy is trained in simulation using a modified Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm combined with a domain randomization strategy. Third, a mapping layer is designed to bridge the sim-to-real gap and facilitate real-world deployment of the learned policy. Comprehensive evaluations in the simulation environment demonstrate that the learned policy achieves a higher success rate compared to the energy-shaping controller. Furthermore, experimental results confirm that the learned policy with a mapping layer enables an MBR to achieve and maintain a fully inverted pose in real-world settings.
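
For readers unfamiliar with TD3, the sketch below shows the two ingredients of the standard algorithm's critic target for a scalar action: target policy smoothing and the clipped double-Q value. The abstract does not describe the authors' modifications, so this is the unmodified textbook form; the function names and default hyperparameters are illustrative assumptions.

```python
import random


def smoothed_target_action(action, rng, sigma=0.2, noise_clip=0.5, max_action=1.0):
    """Target policy smoothing: add clipped Gaussian noise, then clip to the action range."""
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
    return max(-max_action, min(max_action, action + noise))


def td3_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two target critics."""
    min_q = min(next_q1, next_q2)
    return reward + gamma * (1.0 - done) * min_q


rng = random.Random(0)
# In a full implementation, next_q1 and next_q2 would be the target critics
# evaluated at this smoothed action; fixed numbers stand in for them here.
next_action = smoothed_target_action(0.9, rng)
target = td3_target(reward=1.0, done=0.0, next_q1=2.0, next_q2=3.0, gamma=0.5)
```

Taking the minimum of the two critics counteracts value overestimation, and the `done` flag removes the bootstrap term on terminal transitions.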

Index terms

Reinforcement Learning; Machine Learning for Robot Control; Aerial Systems: Mechanics and Control