ICRA 2026

Quantization of DRL Models for Embedded Microcontrollers

Peter Bohm, Pauline Pounds, Peyman Moghadam, Archie Chapman, Jen Jen Chung

AI summary

Quantized deep reinforcement learning policies can be successfully deployed on low-cost microcontrollers to control robots in real-time with minimal performance loss.

Deep Reinforcement Learning, Model Quantization, Microcontroller Deployment, GRU Encoder, Edge AI, Quadrupedal Robotics

Problem

Deep reinforcement learning models typically require heavy computational resources, making them impractical for low-cost, resource-constrained embedded devices. Existing quantization tools also lack native support for critical DRL operations like GRUs and are optimized for computer vision rather than control policies.

Approach

The authors design a streamlined, quantization-friendly actor network for SAC and TD3 algorithms and integrate a custom GRU-based encoder compatible with standard quantization tools. They apply post-training INT8 quantization and deploy the resulting models on an ESP32-S3 microcontroller for on-board robotic inference.
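To make the post-training INT8 step concrete, here is a minimal NumPy sketch of symmetric per-tensor weight quantization. It is not the authors' pipeline: the per-tensor scheme, the function names, and the random 64x32 weight matrix are all assumptions chosen for illustration.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: w is approximated by scale * q."""
    max_abs = float(np.max(np.abs(w)))
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map INT8 codes back to approximate float values."""
    return q.astype(np.float32) * scale

# Hypothetical layer weights standing in for a trained actor layer.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=(64, 32)).astype(np.float32)

q_weights, scale = quantize_int8(weights)
max_error = float(np.max(np.abs(weights - dequantize(q_weights, scale))))
```

In this sketch the stored tensor shrinks by a factor of four relative to FP32, and the round-trip error of each weight is bounded by roughly half the scale.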

Key results

Novel quantization-compatible GRU implementation enabling framework-agnostic deployment
Streamlined SAC/TD3 actor network optimized for inference-only INT8 quantization
Successful on-board deployment of quantized policies on an ESP32-S3 microcontroller
Demonstrated real-time robotic control using only proprioceptive feedback and local inference
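One plausible reading of a "quantization-compatible" GRU is a cell written out from primitive operations (matrix multiply, add, element-wise multiply, sigmoid, tanh) rather than a fused recurrent operator that a quantization tool may not recognize. The sketch below follows the standard GRU gate equations under that assumption; the dimensions, parameter names, and random weights are hypothetical, and the summary does not confirm that the authors' implementation takes this form.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, params):
    """One GRU step built only from matmul, add, multiply, sigmoid, tanh."""
    # Update gate: how much of the previous hidden state to keep.
    z = sigmoid(x @ params["W_z"] + h @ params["U_z"] + params["b_z"])
    # Reset gate: how much of the previous state feeds the candidate.
    r = sigmoid(x @ params["W_r"] + h @ params["U_r"] + params["b_r"])
    # Candidate hidden state.
    n = np.tanh(x @ params["W_n"] + (r * h) @ params["U_n"] + params["b_n"])
    # Blend candidate and previous state.
    return (1.0 - z) * n + z * h

# Hypothetical sizes: 12 proprioceptive inputs, 16 hidden units.
rng = np.random.default_rng(0)
obs_dim, hidden_dim = 12, 16
params = {}
for gate in ("z", "r", "n"):
    params[f"W_{gate}"] = rng.normal(0.0, 0.1, size=(obs_dim, hidden_dim))
    params[f"U_{gate}"] = rng.normal(0.0, 0.1, size=(hidden_dim, hidden_dim))
    params[f"b_{gate}"] = np.zeros(hidden_dim)

# Roll the encoder over a short sequence of random observations.
h = np.zeros(hidden_dim)
for _ in range(5):
    obs = rng.normal(size=obs_dim)
    h = gru_cell(obs, h, params)
```

Because every step is an ordinary dense or element-wise operation, each one can in principle be mapped to an integer kernel independently, which is the property a generic quantization tool needs.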

Why it matters

Enables affordable, energy-efficient, and fully autonomous robotic control by bringing advanced DRL algorithms to ultra-low-cost embedded hardware.

Abstract

For Deep Reinforcement Learning (DRL) models to deliver actual utility, they must function within production environments, which often lack the extensive computational resources of training environments. This requirement for dedicated GPU resources is not economically feasible and can be especially prohibitive in low-cost robotic contexts. Neural network quantization serves as a viable solution to these constraints. This technique aims to lessen computational and memory requirements, while maintaining performance. By reducing the precision of the DRL network weights and the network input (sensory observations), the deployment size can be compacted to fit within MCU class devices, while ensuring that inference operates at adequate frequencies. This paper investigates the impact of quantization on DRL policies and presents a quantization-friendly network architecture for the Soft Actor-Critic (SAC) and TD3 algorithms. We propose a streamlined actor network optimized for inference-only deployments and quantization, and integrate a GRU-based encoder into the DRL framework using a custom, quantization-compatible implementation. The changes enable both to be quantized to integer precision. We then deploy the quantized policies on a microcontroller-scale device (ESP32-S3) to control a low-cost quadrupedal robot using only proprioception and on-board inference.
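The abstract notes that the network input (sensory observations) is reduced in precision along with the weights. A common way to do this, shown here as an assumption rather than the paper's method, is affine quantization with a scale and zero point calibrated from recorded observations. The calibration data, value range, and function names below are hypothetical.

```python
import numpy as np

def calibrate(samples, qmin=-128, qmax=127):
    """Derive scale and zero point from the observed value range."""
    # Extend the range to include 0 so that zero is exactly representable.
    lo = min(float(samples.min()), 0.0)
    hi = max(float(samples.max()), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def quantize_obs(obs, scale, zero_point, qmin=-128, qmax=127):
    """Map float observations to INT8 codes."""
    q = np.round(obs / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def dequantize_obs(q, scale, zero_point):
    """Recover approximate float observations from INT8 codes."""
    return (q.astype(np.float32) - zero_point) * scale

# Hypothetical calibration set: 1000 recorded 12-dimensional observations.
rng = np.random.default_rng(0)
calibration = rng.uniform(-1.0, 3.0, size=(1000, 12))
obs_scale, obs_zero_point = calibrate(calibration)

observation = calibration[0]
q_obs = quantize_obs(observation, obs_scale, obs_zero_point)
recovered = dequantize_obs(q_obs, obs_scale, obs_zero_point)
obs_error = float(np.max(np.abs(observation - recovered)))
```

Unlike the symmetric scheme usually applied to weights, the zero point lets an asymmetric sensor range use the full INT8 code space.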

Index terms

Embedded Systems for Robotic and Automation