ICRA 2026

Learning-Based Joint Control with Hierarchical Reinforcement Learning and On-Device Execution

Satoshi Yagi, Jun Morimoto

AI summary

Decoupling motor dynamics via hierarchical reinforcement learning enables presetting-free, real-time joint control with a single shared policy across multiple robot joints.

Hierarchical Reinforcement Learning, Robot Joint Control, On-Device Inference, Motor Current Control, Embedded Robotics, Neural Control

Problem

Conventional robot control relies on manually tuned PID controllers that degrade under varying loads and require extensive setup, while high-level reinforcement learning struggles to capture fast motor dynamics and generalize across different joints.

Approach

The method trains a fast, low-level neural policy for direct current-to-PWM control and a slower, high-level policy for position control, with the lower layer quantized and deployed on onboard microcontrollers.
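As a rough illustration of what quantizing the low-level policy for a microcontroller can involve, the sketch below applies symmetric 8-bit quantization to one dot product, the basic operation of a neural network layer. The summary does not state the bit width or quantization scheme used, so the 8-bit symmetric scheme and all function names here are assumptions made for illustration only.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], n_bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integers using one shared scale (assumed scheme)."""
    qmax = 2 ** (n_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [qi * scale for qi in q]


def int_dot(q_w: List[int], q_x: List[int], w_scale: float, x_scale: float) -> float:
    """Accumulate in integers, then rescale once at the end."""
    acc = sum(wi * xi for wi, xi in zip(q_w, q_x))
    return acc * w_scale * x_scale


# Hypothetical weights and inputs, not taken from the paper.
weights = [0.5, -1.0, 0.25]
inputs = [1.0, 0.5, -0.25]

q_w, w_scale = quantize_symmetric(weights)
q_x, x_scale = quantize_symmetric(inputs)

exact = sum(w * x for w, x in zip(weights, inputs))
approx = int_dot(q_w, q_x, w_scale, x_scale)
print(f"float dot = {exact:.4f}, quantized dot = {approx:.4f}")
```

Keeping the accumulation in integers is what makes this kind of layer cheap on hardware without a floating-point unit; the only floating-point work is the final rescale.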

Key results

Eliminates manual PID tuning via learned current control
Outperforms non-hierarchical RL in tracking accuracy and speed
Enables single position policy sharing across multiple joints
Validates real-time quantized policy execution on microcontrollers

Why it matters

This approach streamlines robot deployment by removing manual tuning bottlenecks and enabling scalable, hardware-efficient control policies that generalize across joints.

Abstract

In typical robot learning, deep reinforcement learning policies are employed in the upper control layer to generate target joint angles for robot motion, while conventional controllers are used in the fast lower control layer to control each joint motor. This paper presents a fully neural network-based hierarchical reinforcement learning approach for real-time robot joint control. The proposed method divides joint control into two layers: a high-frequency current control policy and a low-frequency position control policy. The current control policy drives the motor to follow the target current while learning the dynamic characteristics of the joint. The position control policy generates the target current to achieve a desired joint angle, allowing learning and inference at a slower frequency. By decoupling motor dynamics from position control, our method improves learning performance and enables policy generalization across joints. Experimental results on a three-joint robotic arm demonstrate the effectiveness of the proposed approach, including posture control using a shared position control policy across joints.
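The two-layer structure described in the abstract can be sketched as a two-rate loop: the position layer refreshes the target current only every few ticks, while the current layer runs on every tick. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract gives no control frequencies, so the 10:1 rate ratio is an assumption; the two "policies" are hand-written proportional stand-ins for the learned networks; and `ToyJoint` is a hypothetical toy motor model.

```python
from typing import List, Tuple


class ToyJoint:
    """Crude first-order motor model (hypothetical, not the paper's robot)."""

    def __init__(self, dt: float = 1e-4) -> None:
        self.dt = dt
        self.angle = 0.0
        self.velocity = 0.0
        self.current = 0.0

    def step(self, pwm: float) -> Tuple[float, float]:
        # Current follows the PWM duty with a first-order lag.
        self.current += 0.2 * (pwm - self.current)
        # Torque proportional to current accelerates the joint, with damping.
        self.velocity += (50.0 * self.current - 5.0 * self.velocity) * self.dt
        self.angle += self.velocity * self.dt
        return self.angle, self.current


def position_policy(target_angle: float, angle: float) -> float:
    """Stand-in for the slow learned policy: angle error -> target current."""
    return 2.0 * (target_angle - angle)


def current_policy(target_current: float, current: float) -> float:
    """Stand-in for the fast learned policy: current error -> PWM duty in [-1, 1]."""
    duty = 0.5 * (target_current - current)
    return max(-1.0, min(1.0, duty))


def run_two_rate_loop(
    joint: ToyJoint, target_angle: float, n_fast_steps: int, ratio: int
) -> Tuple[List[float], int]:
    """Run the fast loop, calling the position layer once every `ratio` ticks."""
    angle, current = joint.angle, joint.current
    target_current = 0.0
    position_calls = 0
    pwm_log: List[float] = []
    for k in range(n_fast_steps):
        if k % ratio == 0:  # slow layer: update the current target
            target_current = position_policy(target_angle, angle)
            position_calls += 1
        pwm = current_policy(target_current, current)  # fast layer: every tick
        angle, current = joint.step(pwm)
        pwm_log.append(pwm)
    return pwm_log, position_calls


pwm_log, position_calls = run_two_rate_loop(ToyJoint(), 1.0, 100, 10)
print(f"fast steps: {len(pwm_log)}, position-layer calls: {position_calls}")
```

Because the position layer only sees angles and emits target currents, it never has to model the PWM-to-current dynamics directly, which is the decoupling the abstract credits for sharing one position policy across joints.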

Index terms

Machine Learning for Robot Control; Embedded Systems for Robotic and Automation; Neural and Fuzzy Control