From Manual to Operation: A Home Appliance Agent
Bo Mao, Troy Huang, Yuming, Jiayang Chai, Huaping Liu, Di Guo
AI summary
Problem
Robots struggle to operate household appliances accurately, especially when the manual for the specific model is missing or inaccessible, which severely limits their autonomy in real-world homes.
Approach
The framework deploys specialized LLM/VLM-powered agents to parse manuals, infer operational logic, and execute physical actions. When a manual is unavailable, it draws on category-level (generalized) manuals and, when needed, on-demand web search.
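The knowledge-acquisition fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Appliance` fields, the lookup dictionaries, the `web_search` callable, the source-priority order, and the keyword test that decides when to search are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Appliance:
    model: str     # specific model identifier (hypothetical field)
    category: str  # appliance category, e.g. "microwave"


def acquire_knowledge(
    appliance: Appliance,
    operation: str,
    exact_manuals: dict[str, str],
    category_manuals: dict[str, str],
    web_search: Callable[[str], Optional[str]],
) -> list[tuple[str, str]]:
    """Collect (source, text) snippets, most specific source first.

    The ordering and the keyword-based coverage check are illustrative
    assumptions, not the paper's published procedure.
    """
    # 1. A manual for this exact model, if one exists, is used directly.
    if appliance.model in exact_manuals:
        return [("exact_manual", exact_manuals[appliance.model])]

    knowledge: list[tuple[str, str]] = []
    # 2. Otherwise fall back to a generalized, category-level manual.
    generic = category_manuals.get(appliance.category)
    if generic is not None:
        knowledge.append(("generalized_manual", generic))

    # 3. Query the web only on demand: when no category manual exists or
    #    it does not mention the requested operation (a crude keyword test).
    if generic is None or operation.lower() not in generic.lower():
        result = web_search(f"{appliance.model} {appliance.category} {operation}")
        if result:
            knowledge.append(("web_search", result))
    return knowledge
```

In use, asking for an operation the category manual already mentions (for example `"defrost"` against a microwave manual that describes defrosting) returns only the generalized manual and never calls `web_search`; an operation the manual does not cover adds a web-search result alongside it.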
Key results
- Multi-agent architecture integrating perception, planning, execution, and reflection agents
- Novel manual-summarization module enabling zero-shot generalization to unseen appliances
- Hybrid knowledge acquisition strategy combining generalized manuals and web search
- Substantially outperforms baselines on more than 1,000 tasks, with robust performance in both simulation and real-world experiments
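The interplay of the perception, planning, execution, and reflection agents listed above can be pictured as a closed loop. The sketch below is hypothetical: each agent is reduced to a plain Python callable, appliance state is a string, the `Step`, `EpisodeLog`, and `run_task` names are invented for illustration, and "reflection" is simplified to comparing the observed state with the state the planner expected, triggering a replan on mismatch. The paper's actual agents are LLM/VLM-powered, and their interfaces are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    action: str    # e.g. "press:START" (hypothetical action encoding)
    expected: str  # appliance state the planner expects afterwards


@dataclass
class EpisodeLog:
    executed: list[str] = field(default_factory=list)
    replans: int = 0


def run_task(
    perceive: Callable[[], str],
    plan: Callable[[str, str], list[Step]],
    execute: Callable[[str], None],
    goal: str,
    max_replans: int = 3,
) -> tuple[bool, EpisodeLog]:
    """Perceive, plan, and execute, checking each action's outcome and
    replanning on a mismatch, up to `max_replans` times."""
    log = EpisodeLog()
    while log.replans <= max_replans:
        state = perceive()                 # stand-in for the perception agent
        steps = plan(goal, state)          # stand-in for the planning agent
        for step in steps:
            execute(step.action)           # stand-in for the execution agent
            log.executed.append(step.action)
            # Stand-in for the reflection agent: compare observed vs. expected.
            if perceive() != step.expected:
                log.replans += 1
                break                      # abandon this plan and replan
        else:
            # Every step matched its expectation; check the final goal.
            return perceive() == goal, log
    return False, log
```

Bounding the loop with `max_replans` is a design choice in this sketch: a planner that keeps producing the same wrong action sequence ends with a reported failure instead of retrying forever.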
Why it matters
Enables autonomous, general-purpose home assistance by allowing robots to reliably operate novel appliances without task-specific training or a manual for that specific appliance.
Abstract
Operating household appliances by reading and understanding user manuals remains a fundamental and challenging problem in robotics. Recent works leverage large language models (LLMs) and vision-language models (VLMs) to interpret manuals, improving appliance operation success. However, these approaches fail when manuals are unavailable or incomplete. In this paper, we introduce an autonomous assistant for robotic appliance operation, built upon an LLM/VLM-powered multi-agent collaborative framework. Our system can read, comprehend, and summarize manuals, autonomously infer operational logic, and execute actions on appliances with a robotic arm. Importantly, for unseen appliances without manuals, it can acquire operational knowledge from generalized manuals and on-demand web search. Extensive evaluations on over one thousand tasks show that our framework substantially outperforms baselines and achieves robust performance in simulation and real-world experiments.