ICRA 2026

From Manual to Operation: A Home Appliance Agent

Bo Mao, Troy Huang, Yuming, Jiayang Chai, Huaping Liu, Di Guo

AI summary

A multi-agent AI system enables robots to operate both known and unseen home appliances by dynamically combining manual summaries and web search.
Robotic appliance operation, Multi-agent systems, Vision-language models, Manual-based learning, Zero-shot generalization

Problem

Robots struggle to operate household appliances accurately, particularly when exact user manuals are missing or inaccessible, severely limiting their autonomy in real-world home environments.

Approach

The framework deploys specialized LLM/VLM-powered agents to parse manuals, infer operational logic, and execute physical actions. When manuals are unavailable, it generalizes from category-level manuals or retrieves information via on-demand web search.
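The fallback between knowledge sources described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `acquire_knowledge`, the `Knowledge` record, the dictionary-based manual stores, and the fixed priority order (exact manual, then category-level manual, then web search) are all assumptions made for the example. The paper summary only states that the sources are combined, not how they are ranked.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Knowledge:
    """Hypothetical record of where operating knowledge came from."""
    source: str  # "exact_manual", "category_manual", or "web_search"
    text: str


def acquire_knowledge(
    model: str,
    category: str,
    exact_manuals: dict[str, str],
    category_manuals: dict[str, str],
    web_search: Callable[[str], Optional[str]],
) -> Optional[Knowledge]:
    """Return knowledge from the most specific source available.

    The priority order here is an assumption made for illustration.
    """
    # 1. Prefer the appliance's own manual when one exists.
    if model in exact_manuals:
        return Knowledge("exact_manual", exact_manuals[model])
    # 2. Otherwise fall back to a generalized manual for the category.
    if category in category_manuals:
        return Knowledge("category_manual", category_manuals[category])
    # 3. As a last resort, query the web on demand.
    result = web_search(f"how to operate {category} {model}")
    if result:
        return Knowledge("web_search", result)
    # No source produced anything usable.
    return None
```

Passing `web_search` as a callable keeps the sketch independent of any particular search backend; in practice it would wrap whatever retrieval tool the system uses.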

Key results

  • Multi-agent architecture integrating perception, planning, execution, and reflection agents
  • Novel manual-summarization module enabling zero-shot generalization to unseen appliances
  • Hybrid knowledge acquisition strategy combining generalized manuals and web search
  • Substantially improved success rates over baselines across 1,000+ simulation and real-world tasks
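One way to picture how perception, planning, execution, and reflection agents might interact is a retry loop in which a failed step produces feedback for the next plan. The sketch below is a guess at the general pattern under stated assumptions: `run_task`, its callable parameters, the string-based state and steps, and the `max_attempts` limit are all hypothetical and are not taken from the paper.

```python
from typing import Callable


def run_task(
    perceive: Callable[[], str],
    plan: Callable[[str, str, list[str]], list[str]],
    execute: Callable[[str], bool],
    reflect: Callable[[str, str], str],
    goal: str,
    max_attempts: int = 3,
) -> bool:
    """Perceive, plan, execute, and reflect until the goal succeeds.

    Every agent is a plain callable here (an assumption for brevity).
    """
    feedback: list[str] = []
    for _ in range(max_attempts):
        state = perceive()                    # perception agent
        steps = plan(goal, state, feedback)   # planning agent
        if not steps:
            feedback.append("planner returned no steps")
            continue
        # Execution agent: run steps in order, stop at the first failure.
        failed = next((s for s in steps if not execute(s)), None)
        if failed is None:
            return True
        # Reflection agent: explain the failure to inform the next plan.
        feedback.append(reflect(failed, perceive()))
    return False
```

The accumulated `feedback` list is what lets a later attempt differ from an earlier one; without it the loop would only repeat the same plan.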

Why it matters

Enables autonomous, general-purpose home assistance by allowing robots to reliably operate novel appliances without task-specific training or pre-existing manuals.

Abstract

Operating household appliances by reading and understanding user manuals remains a fundamental and challenging problem in robotics. Recent works leverage large language models (LLMs) and vision-language models (VLMs) to interpret manuals, improving appliance operation success. However, these approaches fail when manuals are unavailable or incomplete. In this paper, we introduce an autonomous assistant for robotic appliance operation, built upon an LLM/VLM-powered multi-agent collaborative framework. Our system can read, comprehend, and summarize manuals, autonomously infer operational logic, and execute actions on appliances with a robotic arm. Importantly, for unseen appliances without manuals, it can acquire operational knowledge from generalized manuals and on-demand web search. Extensive evaluations on over one thousand tasks show that our framework substantially outperforms baselines and achieves robust performance in simulation and real-world experiments.

Index terms

AI-Enabled Robotics, Domestic Robotics, Agent-Based Systems
