ICRA 2026

Simultaneous Deep Model-Based Reinforcement Learning and State Inference under Partial Observability

William Cong, Josiah Hanna

AI summary

Integrating expectation-maximization with Bayesian state estimation enables deep model-based reinforcement learning to jointly infer hidden states and learn dynamics, outperforming recurrent neural network baselines in partially observable environments.

Model-based reinforcement learning; Partial observability; Expectation-maximization; Bayesian state estimation; Deep dynamics learning; Robot control

Problem

Model-based reinforcement learning is difficult under partial observability because the algorithm must simultaneously infer hidden states from incomplete observations and use those inferences to learn the environment dynamics. This is a chicken-and-egg problem: state inference needs a dynamics model, and learning the dynamics model needs state estimates.

Approach

The authors introduce EMBRL, an expectation-maximization framework that alternates between estimating latent states with particle or extended Kalman filters and smoothers and updating a deep neural network transition model. Training is stabilized by techniques such as decoupled transition noise and evidence-gated optimization.
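
To make the alternation concrete, here is a minimal sketch of expectation-maximization on a toy problem. It is not the authors' implementation: a scalar linear transition x[t+1] = a*x[t] stands in for the neural network transition model, an exact Kalman (RTS) smoother stands in for the extended Kalman or particle smoother, and the noise variances q and r are assumed known. All function names are hypothetical.

```python
import numpy as np

def kalman_smoother(y, a, q, r, m0=0.0, p0=1.0):
    """E-step: smooth x[t+1] = a*x[t] + N(0, q), y[t] = x[t] + N(0, r)."""
    T = len(y)
    mp, pp = np.zeros(T), np.zeros(T)  # predicted mean / variance
    mf, pf = np.zeros(T), np.zeros(T)  # filtered mean / variance
    for t in range(T):
        if t == 0:
            mp[t], pp[t] = m0, p0
        else:
            mp[t] = a * mf[t - 1]
            pp[t] = a * a * pf[t - 1] + q
        k = pp[t] / (pp[t] + r)            # Kalman gain
        mf[t] = mp[t] + k * (y[t] - mp[t])
        pf[t] = (1.0 - k) * pp[t]
    ms, ps = mf.copy(), pf.copy()          # smoothed mean / variance
    cross = np.zeros(T - 1)                # Cov(x[t+1], x[t] | all y)
    for t in range(T - 2, -1, -1):
        j = pf[t] * a / pp[t + 1]          # smoother gain
        ms[t] = mf[t] + j * (ms[t + 1] - mp[t + 1])
        ps[t] = pf[t] + j * j * (ps[t + 1] - pp[t + 1])
        cross[t] = j * ps[t + 1]
    return ms, ps, cross

def m_step(ms, ps, cross):
    """M-step: closed-form update of a from smoothed second moments."""
    num = np.sum(cross + ms[1:] * ms[:-1])   # sum of E[x[t+1] * x[t]]
    den = np.sum(ps[:-1] + ms[:-1] ** 2)     # sum of E[x[t]^2]
    return num / den

def run_em(y, a_init=0.1, q=0.1, r=0.5, iters=150):
    """Alternate state inference (E) and transition-model fitting (M)."""
    a = a_init
    for _ in range(iters):
        ms, ps, cross = kalman_smoother(y, a, q, r)
        a = m_step(ms, ps, cross)
    return a

def simulate(a_true=0.9, q=0.1, r=0.5, T=1500, seed=0):
    """Generate noisy observations of a hidden scalar state."""
    rng = np.random.default_rng(seed)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = a_true * x[t - 1] + rng.normal(0.0, np.sqrt(q))
    return x + rng.normal(0.0, np.sqrt(r), size=T)

if __name__ == "__main__":
    print("estimated a:", run_em(simulate()))
```

In a deep MBRL setting the closed-form M-step would presumably be replaced by gradient steps on a neural network, which is where stabilization techniques of the kind the summary mentions would come in.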

Key results

EMBRL framework for joint state inference and deep dynamics learning under partial observability
Practical stabilization techniques including decoupled transition noise and evidence-gated optimization
Two concrete instantiations using extended Kalman filters/smoothers and particle filters/smoothers
Higher-performing policies than RNN baselines on simulated and real-robot tasks
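
For readers unfamiliar with the particle-filter instantiation listed above, the following is a minimal bootstrap particle filter for a scalar state. It is a generic textbook sketch, not the paper's code: `transition` is a hypothetical callable standing in for a learned transition model, the observation is assumed to be the state plus Gaussian noise, and `trans_std` and `obs_std` are assumed fixed.

```python
import numpy as np

def bootstrap_particle_filter(ys, transition, obs_std, trans_std,
                              n_particles=500, seed=0):
    """Return the filtered posterior mean of the state at each time step."""
    rng = np.random.default_rng(seed)
    # Assumed prior over the state before the first observation: N(0, 1).
    particles = rng.normal(0.0, 1.0, size=n_particles)
    means = []
    for y in ys:
        # Propagate each particle through the transition model plus noise.
        noise = rng.normal(0.0, trans_std, size=n_particles)
        particles = transition(particles) + noise
        # Weight by the Gaussian observation likelihood (computed in log space).
        logw = -0.5 * ((y - particles) / obs_std) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * particles))
        # Multinomial resampling to counter weight degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return np.array(means)
```

Because the filter only calls `transition` on a batch of particles, any model that maps an array of states to an array of next states could be plugged in, which is what makes this family of estimators a natural fit for a learned dynamics model.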

Why it matters

EMBRL is a step toward autonomous robots that learn effective control policies from a limited number of real-world interactions, without requiring full state observability or a priori known dynamics.

Abstract

Model-based reinforcement learning (MBRL) is a promising approach to enabling robots to learn directly from a limited number of real-world interactions. MBRL is notoriously difficult in settings without full state observability because algorithms must simultaneously infer state from incomplete observations and use these inferences to learn environment dynamics. Toward the use of MBRL for autonomous robots, we introduce EMBRL, an expectation-maximization framework that combines classical Bayesian state estimation with deep MBRL to jointly infer states and learn neural network state transition models. This framework takes advantage of the rich theory and practice of state estimation from the field of robotics, while enabling behavior learning without a priori known robot dynamics. Though conceptually straightforward, our instantiation of this framework for deep MBRL reveals several key challenges when using a learned transition model both for state inference and policy learning. We introduce a practical implementation of EMBRL using both particle and extended Kalman filters and smoothers and discuss key design choices necessary for effective implementation. Finally, we evaluate different instantiations of the EMBRL framework on both simulated and real-robot tasks and show that our methods learn higher performing policies compared to strong MBRL baselines using recurrent neural networks.
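
In generic terms, the expectation-maximization alternation described in the abstract can be written as below. This is the standard EM form for a state-space model, not notation taken from the paper: the symbols s (state), o (observation), a (action), and θ (transition-model parameters) are assumptions, and the observation model and initial-state distribution are assumed fixed.

```latex
\begin{aligned}
\text{E-step:}\quad & q_k(s_{0:T}) = p\left(s_{0:T} \mid o_{1:T},\, a_{0:T-1};\, \theta_k\right) \\
\text{M-step:}\quad & \theta_{k+1} = \arg\max_{\theta}\; \mathbb{E}_{q_k}\!\left[\sum_{t=0}^{T-1} \log p_{\theta}\left(s_{t+1} \mid s_t, a_t\right)\right]
\end{aligned}
```

With a nonlinear neural network transition model the E-step posterior is not available in closed form, so a filter or smoother (extended Kalman or particle) would supply an approximation to it.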

Index terms

Reinforcement Learning; Machine Learning for Robot Control; Deep Learning Methods