ICRA 2026

Global End-Effector Pose Control of an Underactuated Aerial Manipulator via Reinforcement Learning

Shlok Deshmukh, Javier Alonso-Mora, Sihao Sun

AI summary

A reinforcement learning policy enables precise, robust 6-DoF end-effector pose control on a lightweight, underactuated aerial manipulator, even under heavy loads and contact disturbances.

Aerial manipulation, Reinforcement learning, Underactuated control, Sim-to-real transfer, PPO, Lightweight robotics

Problem

Lightweight aerial manipulators with minimal mechanical designs face severe underactuation and sensitivity to external disturbances, making robust whole-body control challenging with traditional model-based methods.

Approach

We train a Proximal Policy Optimization agent in simulation to generate feedforward commands for the drone and arm, tracked by low-level INDI and PID controllers, using domain randomization to ensure reliable sim-to-real deployment.
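As a rough illustration of the domain randomization step, the sketch below draws a fresh set of simulator parameters for each training episode. The parameter names, ranges, and the `sample_episode_params` helper are hypothetical and chosen only for illustration; the paper's actual randomized quantities and ranges are not given in this summary.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    """Simulator parameters redrawn at the start of every episode (all hypothetical)."""
    mass_scale: float      # multiplier on the nominal vehicle mass
    payload_kg: float      # extra mass attached to the end-effector
    motor_delay_s: float   # first-order actuator delay
    ext_force_n: tuple     # constant external force on the body, per axis


# Illustrative ranges only; these are assumptions, not values from the paper.
RANGES = {
    "mass_scale": (0.9, 1.1),
    "payload_kg": (0.0, 0.14),
    "motor_delay_s": (0.01, 0.05),
    "ext_force_n": (-1.0, 1.0),
}


def sample_episode_params(rng: random.Random) -> EpisodeParams:
    """Draw each parameter uniformly from its range."""
    def draw(key: str) -> float:
        low, high = RANGES[key]
        return rng.uniform(low, high)

    return EpisodeParams(
        mass_scale=draw("mass_scale"),
        payload_kg=draw("payload_kg"),
        motor_delay_s=draw("motor_delay_s"),
        ext_force_n=tuple(draw("ext_force_n") for _ in range(3)),
    )


rng = random.Random(0)  # seeded so the sketch is reproducible
params = [sample_episode_params(rng) for _ in range(1000)]
```

A policy trained across many such draws never sees one fixed model, which is the usual argument for why randomization helps the policy tolerate the mismatch between simulator and hardware.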

Key results

Centimeter-level position accuracy and degree-level orientation precision in real-world flights
Stable pose control while carrying payloads up to 140 g (16% of system mass)
Successful pushing of a 590 g object (68% of system mass) while maintaining orientation
Ablation studies confirm domain randomization and specific observations drastically improve sim-to-real transfer
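The two mass percentages above can be cross-checked with simple arithmetic. The sketch below assumes that both percentages refer to the same system mass and that each was rounded to the nearest whole percent; neither assumption is stated in the summary, and the helper name `implied_mass_range` is made up for this example.

```python
def implied_mass_range(load_g: float, percent: float, half_width: float = 0.5):
    """Range of system masses (g) consistent with a load reported as `percent`,
    assuming the percentage was rounded to within +/- half_width points."""
    lowest = load_g / ((percent + half_width) / 100.0)
    highest = load_g / ((percent - half_width) / 100.0)
    return lowest, highest


payload = implied_mass_range(140.0, 16.0)  # carried payload
pushed = implied_mass_range(590.0, 68.0)   # pushed object

# Masses consistent with both reported figures at once.
overlap = (max(payload[0], pushed[0]), min(payload[1], pushed[1]))
print(f"payload implies {payload[0]:.0f}-{payload[1]:.0f} g")
print(f"pushed object implies {pushed[0]:.0f}-{pushed[1]:.0f} g")
print(f"consistent range {overlap[0]:.0f}-{overlap[1]:.0f} g")
```

Under these assumptions the two figures agree on a system mass of a little under 0.9 kg, which is an inference from the reported numbers rather than a value quoted from the paper.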

Why it matters

Demonstrates that learning-based control can unlock reliable, contact-rich aerial manipulation on simple, lightweight platforms for real-world applications like disaster response and industrial inspection.

Abstract

Aerial manipulators, which combine robotic arms with multi-rotor drones, face strict constraints on arm weight and mechanical complexity. In this work, we study a lightweight 2-degree-of-freedom (DoF) arm mounted on a quadrotor via a differential mechanism, capable of full six-DoF end-effector pose control. While the minimal design enables simplicity and reduced payload, it also introduces challenges such as underactuation and sensitivity to external disturbances. To address these, we employ reinforcement learning, training a Proximal Policy Optimization (PPO) agent in simulation to generate feedforward commands for quadrotor acceleration and body rates, along with joint angle targets. These commands are tracked by an incremental nonlinear dynamic inversion (INDI) attitude controller and a PID joint controller, respectively. Flight experiments demonstrate centimeter-level position accuracy and degree-level orientation precision, with robust performance under external force disturbances, including manipulation of heavy loads and pushing tasks. The results highlight the potential of learning-based control strategies for enabling contact-rich aerial manipulation using simple, lightweight platforms. Videos of the experiment and the method are summarized in https://youtu.be/bWLTPqKcCOA.
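The command structure described in the abstract can be sketched as follows: a policy output is split into quadrotor and arm commands, and each joint target is tracked by a PID loop. The 8-element action layout (3 acceleration components, 3 body rates, 2 joint angles), the `Command` and `PID` classes, and the gains are assumptions made for illustration. The abstract does not specify the action dimensions or controller details, and the INDI attitude controller is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Command:
    accel: tuple          # assumed 3-vector acceleration command, m/s^2
    body_rates: tuple     # assumed 3-vector body-rate command, rad/s
    joint_targets: tuple  # target angles for the 2-DoF arm, rad


def split_action(action) -> Command:
    """Split a flat policy output into drone and arm commands (assumed layout)."""
    if len(action) != 8:
        raise ValueError("expected 8 action values in this sketch")
    return Command(
        accel=tuple(action[0:3]),
        body_rates=tuple(action[3:6]),
        joint_targets=tuple(action[6:8]),
    )


class PID:
    """Textbook discrete PID controller for tracking one joint angle."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first step

    def step(self, target: float, measured: float, dt: float) -> float:
        error = target - measured
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


cmd = split_action([0.0, 0.0, 9.81, 0.1, 0.2, 0.3, 0.5, -0.5])
joint_pids = [PID(kp=2.0, ki=1.0, kd=0.5) for _ in cmd.joint_targets]
measured_angles = [0.0, 0.0]  # placeholder joint encoder readings
efforts = [
    pid.step(target, measured, dt=0.01)
    for pid, target, measured in zip(joint_pids, cmd.joint_targets, measured_angles)
]
```

In this arrangement the learned policy only decides what the platform should do, while the low-level loops handle fast tracking of each command.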

Index terms

Aerial Systems: Applications; Aerial Systems: Mechanics and Control; Reinforcement Learning