ICRA 2026

MOSAIC: Multi-Objective Optimization from Zero-Shot Language Reasoning in Preference-Based RL

Daniel Marta, Simon Holk, Iolanda Leite


AI summary

Leveraging zero-shot LLMs to extract multiple objectives from human preference prompts significantly improves multi-objective RL performance and prevents objective collapse compared to single-preference methods.
Multi-objective RL; Preference-based learning; Large language models; Human-robot interaction; Reward shaping; Zero-shot reasoning

Problem

Existing preference-based RL methods typically collapse human feedback into a single reward function, ignoring the multi-dimensional nature of human preferences and the causal reasoning behind them, which leads to objective collapse and causal confusion.

Approach

MOSAIC uses zero-shot large language models to parse the natural language prompts that accompany human preferences, extracting distinct objectives and their relative weights. These are used to train an ensemble of reward functions, each targeting a specific objective, which is then optimized via multi-objective reinforcement learning.
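
As a rough illustration of how per-objective weights and rewards might combine into a single preference prediction, the sketch below uses the Bradley-Terry model that is common in preference-based RL. This summary does not state which loss MOSAIC uses, so the model choice, the function name preference_probability, and the input layout are all assumptions made for illustration.

```python
import numpy as np

def preference_probability(returns_a, returns_b, weights):
    """Probability that segment A is preferred over segment B.

    Assumed setup (not taken from the paper):
      returns_a, returns_b: summed predicted reward per objective, shape (K,)
      weights: relative objective weights, shape (K,), e.g. parsed from a prompt
    """
    returns_a = np.asarray(returns_a, dtype=float)
    returns_b = np.asarray(returns_b, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1

    # Scalarize each segment as a weighted sum of its per-objective returns.
    score_a = float(np.dot(weights, returns_a))
    score_b = float(np.dot(weights, returns_b))

    # Bradley-Terry: a logistic function of the score difference.
    return 1.0 / (1.0 + np.exp(-(score_a - score_b)))
```

With equal weights and mirrored returns the two segments are equally likely to be preferred. Shifting weight toward an objective on which segment A scores higher pushes the probability above 0.5.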

Key results

  • Introduces MOSAIC framework for multi-objective preference-based RL
  • Proposes weighted ensemble variance query sampling for informative feedback selection
  • Develops sentiment-based reward regularization to highlight critical trajectory segments
  • Demonstrates superior performance over single-preference baselines across simulated and real human feedback tasks
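
The weighted ensemble variance query sampling named in the list above could look roughly like the sketch below. It scores each candidate query by the disagreement (variance) among ensemble members for every objective, weights those variances by objective importance, and picks the highest-scoring candidates. The exact formulation in the paper is not given in this summary, so the function name select_queries, the array layout, and the scoring rule are assumptions.

```python
import numpy as np

def select_queries(member_probs, weights, num_queries):
    """Pick the candidate queries with the highest weighted ensemble variance.

    Assumed setup (not taken from the paper):
      member_probs: predicted preference probabilities with shape
                    (K objectives, M ensemble members, N candidate queries)
      weights: relative objective weights, shape (K,)
      num_queries: how many candidates to return
    """
    member_probs = np.asarray(member_probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1

    # Disagreement among ensemble members, per objective and candidate: (K, N).
    per_objective_var = member_probs.var(axis=1)

    # Weighted sum over objectives gives one score per candidate: (N,).
    scores = weights @ per_objective_var

    # Indices of the highest-scoring candidates, most informative first.
    top = np.argsort(-scores, kind="stable")[:num_queries]
    return top, scores
```

Under this scheme, candidates on which the reward models for a heavily weighted objective disagree are queried first, so labeling effort goes where the ensemble is least certain about the objectives that matter most.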

Why it matters

Enables robots to accurately learn complex, multi-dimensional goals from natural language feedback, advancing practical human-robot alignment and preference-based control.

Abstract

Preference-based Reinforcement Learning (RL) enables humans to shape complex goals via preference comparisons between sequences of state-action pairs. Most of the existing approaches focus on a singular objective, overlooking the complex causal reasoning that underpins preferences. However, many real-world challenges are multi-dimensional, and individuals can have different reasons behind their preferences. In this work, we rethink preference-based RL from a multi-objective perspective by distilling human preferences into multiple components. We leverage the zero-shot capabilities of large language models (LLMs) to infer preferences and better align various objectives from text prompts. This allows us to train an ensemble of reward functions, each optimizing for a specific objective. We demonstrate that our approach can address a variety of multi-objective control tasks, improving on approaches that consider a single preference per objective. We show the effectiveness of our approach in better shaping reward functions by utilizing real human preferences and prompts. Our code for the benchmarks, along with additional supplementary details, is available at https://sites.google.com/view/multi-pref/.

Index terms

Human Factors and Human-in-the-Loop; Machine Learning for Robot Control; Reinforcement Learning
