ICRA 2026

Reward-Free Continual Adaptation for Resilient Space Robots

Andrej Orsula, Miguel A. Olivares-Mendez, Carol Martinez

AI summary

Space robots can achieve rapid initial recovery from severe hardware degradation using only unsupervised rollouts and a pre-trained latent reward landscape, without requiring external reward signals during deployment.

Reward-free learning, Continual adaptation, Space robotics, World models, Hardware degradation, Latent dynamics

Problem

Hardware degradation in space environments can catastrophically break pre-trained control policies. Continual reinforcement learning could adapt them online, but it is hard to deploy because precise reward computation is often infeasible without external tracking systems or privileged simulation states.

Approach

The framework freezes the observation encoder and reward predictor of a pre-trained world model while updating only its transition dynamics through unsupervised environmental rollouts, allowing the agent to adapt its policy using purely synthetic trajectories.
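The dynamics-only update can be illustrated with a deliberately tiny, hypothetical Python example. It assumes a scalar latent space, a fixed linear encoder (`encode`), and a linear transition model `z' = a*z + b*u` standing in for the world model's dynamics; the paper's actual world model, losses, and hyperparameters are not reproduced, and every name here is illustrative. The robot's actuator is assumed to lose half its authority, and only the transition parameters are fitted, by stochastic gradient descent on the latent prediction error, to transitions that carry no reward.

```python
import random

ENC_SCALE = 0.5  # hypothetical frozen encoder weight


def encode(obs: float) -> float:
    """Frozen observation encoder: never updated during deployment."""
    return ENC_SCALE * obs


class LatentDynamics:
    """Toy latent transition z' = a*z + b*u; the only trainable component."""

    def __init__(self, a: float = 1.0, b: float = 1.0) -> None:
        self.a, self.b = a, b  # pre-trained values for the nominal robot

    def predict(self, z: float, u: float) -> float:
        return self.a * z + self.b * u


def degraded_step(obs: float, u: float) -> float:
    """Hypothetical degraded robot: actuator delivers half its nominal effect."""
    return obs + 1.0 * u  # the nominal plant was obs + 2.0 * u


def collect_rollouts(n: int, rng: random.Random) -> list:
    """Unsupervised interaction: stores (obs, action, next_obs), no reward."""
    data = []
    for _ in range(n):
        obs, u = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
        data.append((obs, u, degraded_step(obs, u)))
    return data


def adapt_dynamics(dyn: LatentDynamics, data: list, lr: float = 0.1,
                   epochs: int = 5) -> LatentDynamics:
    """SGD on latent prediction error; the encoder is fixed and no reward is used."""
    for _ in range(epochs):
        for obs, u, next_obs in data:
            z, z_next = encode(obs), encode(next_obs)  # targets from frozen encoder
            err = dyn.predict(z, u) - z_next
            # Gradients of err**2 with respect to a and b only.
            dyn.a -= lr * 2.0 * err * z
            dyn.b -= lr * 2.0 * err * u
    return dyn


dyn = adapt_dynamics(LatentDynamics(), collect_rollouts(500, random.Random(0)))
print(round(dyn.a, 3), round(dyn.b, 3))  # prints "1.0 0.5"
```

In this sketch the fit recovers the weakened actuator gain (`b` drops from 1.0 to 0.5) while the latent coordinates keep their pre-training meaning, which is the rationale for leaving the encoder, and therefore any reward predictor defined on its latent space, untouched.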

Key results

Rapid initial policy recovery across planetary traversal, orbital navigation, and precision assembly tasks
Successful adaptation to severe morphological failures without external reward signals
Late-stage performance decay caused by representation drift in continuously updated dynamics
Validation that pre-trained latent reward landscapes generalize sufficiently for short-term autonomous recovery

Why it matters

Enables long-duration space missions to maintain operational capability after hardware failures without relying on impractical onboard reward computation or extensive retraining.

Abstract

Space robots operate in extreme environments where hardware degradation can critically compromise traditional control strategies. While continual reinforcement learning offers a promising mechanism for online adaptation, it inherently requires access to a reward signal during deployment. However, precise reward computation in space is often infeasible due to the lack of external tracking systems and the overall complexity of the environment. To address the challenge of unobservable rewards, we introduce a reward-free continual learning framework that leverages latent-state world models. By pre-training a model-based agent across diverse simulations, the world model learns a robust predictor of the reward structure within its latent space. Upon deployment to an environment with severe hardware degradation, we freeze the observation encoder and reward predictor to update only the transition dynamics of the world model through unsupervised rollouts. By training the policy entirely on imagined trajectories generated by this updated world model, the agent adapts to altered dynamics without receiving new rewards. We demonstrate our approach across simulated planetary traversal, orbital navigation, and precision assembly tasks subjected to severe morphological failures. The source code is available at github.com/AndrejOrsula/space_robotics_bench.
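Training the policy on imagined trajectories can be sketched under similarly toy assumptions: the adapted latent dynamics are taken as `z' = z + 0.5*u`, the frozen reward predictor as `-z**2`, and the policy as a linear gain `u = -gain * z`. All of these are hypothetical stand-ins, not the paper's models. The policy is improved using only returns computed on imagined latent rollouts scored by the frozen reward predictor; for brevity, the gradient is estimated with central finite differences rather than the actor-critic learning typically used by latent world-model agents.

```python
def reward_head(z: float) -> float:
    """Frozen reward predictor from pre-training (toy: reward peaks at z = 0)."""
    return -z * z


def adapted_dynamics(z: float, u: float) -> float:
    """Hypothetical world-model transition after reward-free adaptation."""
    return 1.0 * z + 0.5 * u


def imagined_return(gain: float, starts: list, horizon: int = 15) -> float:
    """Average return of the policy u = -gain * z over imagined latent rollouts."""
    total = 0.0
    for z in starts:
        for _ in range(horizon):
            z = adapted_dynamics(z, -gain * z)
            total += reward_head(z)  # reward comes from the frozen head, not the robot
    return total / len(starts)


def improve_policy(gain: float, starts: list, lr: float = 0.5,
                   iters: int = 200, eps: float = 1e-4) -> float:
    """Gradient ascent on imagined return using central finite differences."""
    for _ in range(iters):
        grad = (imagined_return(gain + eps, starts)
                - imagined_return(gain - eps, starts)) / (2 * eps)
        gain += lr * grad
    return gain


starts = [-1.0, 0.5, 1.0]  # latent states from which imagination starts
gain = improve_policy(1.0, starts)  # 1.0 is optimal for the nominal z' = z + u
print(round(gain, 3))  # prints "2.0"
```

In this toy model, a gain of 1.0 drives the nominal system to the rewarded state in one step; after degradation, imagination alone pushes the gain toward 2.0, which compensates for the weaker actuator without any new reward from the environment.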

Index terms

Space Robotics and Automation, Continual Learning, Reinforcement Learning