Multimodal Variational DeepMDP: An Efficient Approach for Industrial Assembly in High-Mix, Low-Volume Production
Grzegorz Bartyzel
AI summary
Problem
High-mix, low-volume manufacturing requires robots to rapidly adapt to new components and layouts, but existing reinforcement learning methods lack the transferability and sample efficiency needed for contact-rich assembly tasks.
Approach
The method learns separate latent dynamic representations for each sensor modality (vision, pose, force/torque) and combines them using a weighted generalized Product-of-Experts mechanism to create a unified state representation for reinforcement learning.
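As a rough illustration of the fusion step, the sketch below combines per-modality diagonal Gaussian latents with a weighted generalized Product-of-Experts. It is not the paper's implementation. It assumes each modality encoder outputs a mean and a variance, and that the weights are fixed constants; how MVDeepMDP actually produces its weights is not stated here. The function name `generalized_poe` and all numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple


def generalized_poe(
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse diagonal Gaussian experts N(mu_i, var_i) as prod_i p_i(z)^w_i.

    For Gaussians the product is again Gaussian, with
    precision = sum_i w_i / var_i and
    mean = (sum_i w_i * mu_i / var_i) / precision.
    """
    dim = len(means[0])
    fused_mean: List[float] = []
    fused_var: List[float] = []
    for d in range(dim):
        # Each expert contributes its precision (1 / variance), scaled by its weight.
        precision = sum(w / var[d] for w, var in zip(weights, variances))
        weighted_mean = sum(
            w * mu[d] / var[d] for w, mu, var in zip(weights, means, variances)
        )
        fused_var.append(1.0 / precision)
        fused_mean.append(weighted_mean / precision)
    return fused_mean, fused_var


# Hypothetical 2-D latents for three modalities: vision, pose, force/torque.
means = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
variances = [[1.0, 1.0], [1.0, 0.5], [1.0, 2.0]]
weights = [0.5, 0.25, 0.25]  # assumed fixed; higher weight = more trusted modality

fused_mean, fused_var = generalized_poe(means, variances, weights)
print(fused_mean, fused_var)
```

A modality with a larger weight or a smaller variance pulls the fused mean toward its own estimate, which is the sense in which the mechanism balances modality confidence.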
Key results
- Generalized Product-of-Experts effectively balances modality confidence for better task-relevant feature extraction
- Per-modality dynamic and reward prediction significantly improves policy transferability across unseen parts and layouts
- Independent processing of sensor modalities yields more informative latent states than direct concatenation
- Successfully generalizes to diverse electronic components and 3D-printed blocks under background disturbances
Why it matters
Reduces production downtime and retooling costs by enabling flexible, data-efficient robotic assembly for customized manufacturing lines.
Abstract
Transferability, along with sample efficiency, is a critical factor for a reinforcement learning (RL) agent’s successful application in real-world contact-rich manipulation tasks, such as product assembly. For instance, in the industrial insertion task on high-mix, low-volume (HMLV) production lines, transferability could eliminate the need for machine retooling, thus reducing production line downtime. In our work, we introduce a method called Multimodal Variational DeepMDP (MVDeepMDP) that generalizes to various environmental variations not encountered during training. The key feature of our approach is learning a multimodal latent dynamic representation. We demonstrate the effectiveness of our method on an electronic parts insertion task, which is challenging for RL agents due to the diverse physical properties of the non-standardized components, as well as on the insertion of simple 3D-printed blocks. Furthermore, we evaluate the transferability of MVDeepMDP and analyze the impact of the balancing mechanism of the generalized Product-of-Experts, which is used to combine observable modalities. Finally, we explore the influence of separately processing state modalities of different physical quantities, such as pose and 6D force/torque (F/T) data.