Quantization of DRL Models for Embedded Microcontrollers
Peter Bohm, Pauline Pounds, Peyman Moghadam, Archie Chapman, Jen Jen Chung
AI summary
Problem
Deep reinforcement learning models typically require heavy computational resources, making them impractical for low-cost, resource-constrained embedded devices. Existing quantization tools also lack native support for critical DRL operations like GRUs and are optimized for computer vision rather than control policies.
Approach
The authors design a streamlined, quantization-friendly actor network for SAC and TD3 algorithms and integrate a custom GRU-based encoder compatible with standard quantization tools. They apply post-training INT8 quantization and deploy the resulting models on an ESP32-S3 microcontroller for on-board robotic inference.
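To illustrate what a quantization-compatible GRU might involve, here is a minimal sketch of one GRU step written using only primitive operations (matrix multiply, add, elementwise multiply, sigmoid, tanh), which are the kinds of operations quantization toolchains generally handle. This is not the authors' implementation: the function name `gru_cell`, the stacked (reset, update, candidate) gate ordering borrowed from PyTorch's convention, and all dimensions are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, W_ih, W_hh, b_ih, b_hh):
    """One GRU step built from primitive ops only (hypothetical helper).

    x: (D,) input, h: (H,) previous hidden state.
    W_ih: (3H, D), W_hh: (3H, H), b_ih and b_hh: (3H,).
    Gates are stacked as (reset, update, candidate), an assumed layout.
    """
    gi = W_ih @ x + b_ih              # input contribution to all three gates
    gh = W_hh @ h + b_hh              # recurrent contribution to all three gates
    i_r, i_z, i_n = np.split(gi, 3)
    h_r, h_z, h_n = np.split(gh, 3)
    r = sigmoid(i_r + h_r)            # reset gate
    z = sigmoid(i_z + h_z)            # update gate
    n = np.tanh(i_n + r * h_n)        # candidate hidden state
    return (1.0 - z) * n + z * h      # blend candidate with previous state

# Illustrative sizes: 12 proprioceptive inputs, 8 hidden units (assumed values).
D, H = 12, 8
rng = np.random.default_rng(0)
W_ih = rng.normal(0.0, 0.3, size=(3 * H, D))
W_hh = rng.normal(0.0, 0.3, size=(3 * H, H))
b_ih = np.zeros(3 * H)
b_hh = np.zeros(3 * H)

# Encode a short history of observations into a single hidden state.
h = np.zeros(H)
for obs in rng.uniform(-1.0, 1.0, size=(5, D)):
    h = gru_cell(obs, h, W_ih, W_hh, b_ih, b_hh)
```

Writing the cell as explicit primitives, rather than relying on a fused recurrent operator, is one plausible way to keep each intermediate tensor visible to a post-training quantizer.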
Key results
- Novel quantization-compatible GRU implementation enabling framework-agnostic deployment
- Streamlined SAC/TD3 actor network optimized for inference-only INT8 quantization
- Successful on-board deployment of quantized policies on an ESP32-S3 microcontroller
- Demonstrated real-time robotic control using only proprioceptive feedback and local inference
Why it matters
Enables affordable, energy-efficient, and fully autonomous robotic control by bringing advanced DRL algorithms to ultra-low-cost embedded hardware.
Abstract
For Deep Reinforcement Learning (DRL) models to deliver actual utility, they must function within production environments, which often lack the extensive computational resources of training environments. Requiring dedicated GPU resources at deployment is not economically feasible and can be especially prohibitive in low-cost robotic contexts. Neural network quantization serves as a viable solution to these constraints. This technique aims to lessen computational and memory requirements while maintaining performance. By reducing the precision of the DRL network weights and the network input (sensory observations), the deployment size can be compacted to fit within MCU-class devices, while ensuring that inference operates at adequate frequencies. This paper investigates the impact of quantization on DRL policies and presents a quantization-friendly network architecture for the Soft Actor-Critic (SAC) and TD3 algorithms. We propose a streamlined actor network optimized for inference-only deployments and quantization, and integrate a GRU-based encoder into the DRL framework using a custom, quantization-compatible implementation. These changes enable both the actor and the encoder to be quantized to integer precision. We then deploy the quantized policies on a microcontroller-scale device (ESP32-S3) to control a low-cost quadrupedal robot using only proprioception and on-board inference.
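As a rough illustration of reducing the precision of both weights and observations, the sketch below quantizes a single linear layer and its input to INT8 and compares the result against full precision. It assumes symmetric per-tensor quantization with int32 accumulation, which is one common scheme and not necessarily the one used in the paper; the helper names `quantize_symmetric` and `int8_linear` and the layer sizes are hypothetical.

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Symmetric per-tensor quantization so that x is approximately scale * q."""
    qmax = 2 ** (num_bits - 1) - 1                # 127 for INT8
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def int8_linear(q_x, s_x, q_w, s_w, bias):
    """Integer matmul with int32 accumulation, then rescale to float."""
    acc = q_x.astype(np.int32) @ q_w.astype(np.int32).T
    return acc.astype(np.float32) * (s_x * s_w) + bias

# Illustrative layer: 12 observation inputs, 4 action outputs (assumed sizes).
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, size=(4, 12)).astype(np.float32)
b = np.zeros(4, dtype=np.float32)
obs = rng.uniform(-1.0, 1.0, size=12).astype(np.float32)

q_W, s_W = quantize_symmetric(W)      # weights stored as int8 (4x smaller than float32)
q_obs, s_obs = quantize_symmetric(obs)  # observation quantized at inference time

y_fp32 = W @ obs + b
y_int8 = int8_linear(q_obs, s_obs, q_W, s_W, b)
max_err = float(np.max(np.abs(y_fp32 - y_int8)))
print(f"max abs error between float32 and INT8 outputs: {max_err:.4f}")
```

In a real post-training workflow the observation scale would typically be fixed ahead of time from calibration data rather than computed per input, so that the microcontroller never needs the floating-point range of each observation.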