ICRA 2026

Real-World Robot Control by Deep Active Inference With a Temporally Hierarchical World Model

Kentaro Fujii, Shingo Murata

AI summary

A novel deep active inference framework enables real-world robots to efficiently balance goal-directed and exploratory actions under uncertainty.

Deep Active Inference, World Model, Robot Control, Exploration, Temporal Hierarchy, Real-World Robotics

Problem

Most deep learning-based robot control methods neglect exploration and struggle with environmental uncertainty, while conventional active inference approaches suffer from limited representation capacity and prohibitively high computational costs.

Approach

The authors introduce a framework combining a temporally hierarchical world model that captures multi-timescale dynamics with an action model that compresses action sequences into abstract actions for tractable, low-cost decision-making.
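To make the idea of compressing an action sequence into an abstract action concrete, here is a minimal sketch of the nearest-codebook lookup at the heart of vector quantization. It is not the authors' implementation: the mean-over-time encoder `embed_action_sequence`, the toy `codebook`, and all dimensions are hypothetical stand-ins for learned components.

```python
import math


def embed_action_sequence(actions):
    """Hypothetical stand-in encoder: average each action dimension over time.

    A learned network would normally produce this embedding.
    """
    horizon = len(actions)
    dims = len(actions[0])
    return [sum(step[d] for step in actions) / horizon for d in range(dims)]


def quantize(embedding, codebook):
    """Return (index, code) of the codebook vector nearest to the embedding."""
    best_index = min(
        range(len(codebook)),
        key=lambda k: math.dist(embedding, codebook[k]),
    )
    return best_index, codebook[best_index]


# Toy codebook with three abstract actions (assumed values, for illustration only).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# A short sequence of 2-D low-level actions.
sequence = [[0.9, 0.1], [1.1, -0.1], [1.0, 0.0]]

embedding = embed_action_sequence(sequence)
index, abstract_action = quantize(embedding, codebook)
```

Because every action sequence maps to one of a small, fixed number of codebook entries, a planner only needs to compare that many candidates, which is one plausible reading of why the abstraction lowers decision cost.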

Key results

High success rates across diverse real-world object manipulation tasks
Dynamic switching between goal-directed and exploratory actions under uncertainty
More computationally tractable action selection than conventional active inference
Accurate prediction of future state transitions using learned abstract actions

Why it matters

Provides a scalable, biologically inspired control architecture for deploying adaptive robots in complex, uncertain real-world environments.

Abstract

Robots in uncertain real-world environments must perform both goal-directed and exploratory actions. However, most deep learning-based control methods neglect exploration and struggle under uncertainty. To address this, we adopt deep active inference, a framework that accounts for human goal-directed and exploratory actions. Yet, conventional deep active inference approaches face challenges due to limited environmental representation capacity and high computational cost in action selection. We propose a novel deep active inference framework that consists of a world model, an action model, and an abstract world model. The world model encodes environmental dynamics into hidden state representations at slow and fast timescales. The action model compresses action sequences into abstract actions using vector quantization, and the abstract world model predicts future slow states conditioned on the abstract action, enabling low-cost action selection. We evaluate the framework on object-manipulation tasks with a real-world robot. Results show that it achieves high success rates across diverse manipulation tasks and switches between goal-directed and exploratory actions in uncertain settings, while making action selection computationally tractable. These findings highlight the importance of modeling multiple timescale dynamics and abstracting actions and state transitions.
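The low-cost action selection described in the abstract can be illustrated by scoring each of a finite set of abstract actions and picking the best one. The sketch below is a simplified assumption, not the paper's formulation: it scores a candidate by squared distance to a preferred state (a goal-directed term) minus the summed predictive variance (a crude proxy for the exploratory, information-seeking term). The function names, the variance proxy, and all numbers are hypothetical.

```python
def expected_free_energy(pred_mean, pred_var, preferred, epistemic_weight=1.0):
    """Simplified score: lower is better.

    pragmatic: squared distance between predicted and preferred state.
    epistemic: summed predictive variance, used here as an assumed proxy
               for expected information gain.
    """
    pragmatic = sum((m - p) ** 2 for m, p in zip(pred_mean, preferred))
    epistemic = sum(pred_var)
    return pragmatic - epistemic_weight * epistemic


def select_abstract_action(predictions, preferred, epistemic_weight=1.0):
    """Enumerate all abstract actions and return (best index, all scores).

    predictions[k] is the (mean, variance) that a hypothetical abstract world
    model predicts for the future slow state under abstract action k.
    """
    scores = [
        expected_free_energy(mean, var, preferred, epistemic_weight)
        for mean, var in predictions
    ]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


preferred = [1.0, 0.0]

# Case A: every outcome is predicted confidently.
certain = [
    ([1.0, 0.0], [0.01, 0.01]),
    ([0.5, 0.5], [0.01, 0.01]),
    ([0.0, 0.0], [0.01, 0.01]),
]

# Case B: the outcome of abstract action 2 is highly uncertain.
uncertain = [
    ([1.0, 0.0], [0.01, 0.01]),
    ([0.5, 0.5], [0.01, 0.01]),
    ([0.0, 0.0], [0.6, 0.6]),
]

goal_choice, _ = select_abstract_action(certain, preferred)
explore_choice, _ = select_abstract_action(uncertain, preferred)
```

In this toy setup the confident case selects the action whose prediction matches the preferred state, while the uncertain case selects the high-variance action instead. The cost of selection grows only with the number of abstract actions, since each candidate needs a single prediction rather than a rollout over low-level action sequences.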

Index terms

Cognitive Control Architectures, Learning from Experience, Machine Learning for Robot Control