ICRA 2026

ILeSiA: Interactive Learning of Robot Situational Awareness from Camera Input

Petr Vanc, Giovanni Franzese, Jan Kristof Behrens, Cosimo Della Santina, Karla Stepanova, Jens Kober, Robert Babuska

AI summary

A camera-driven Gaussian Process model continuously estimates real-time risk scores, enabling robots to pause execution and learn from sparse human feedback to detect both known and novel faults.

Robot situational awareness, Interactive learning, Gaussian process regression, Vision-based fault detection, Learning from demonstration, Human-in-the-loop robotics

Problem

Learning from demonstration struggles when robots encounter unforeseen environmental changes or interactions during task execution, as demonstrations only cover limited, successful scenarios. Robots lack the situational awareness to distinguish harmless variations from potentially harmful deviations that could lead to task failure.

Approach

The system encodes live camera frames into a low-dimensional latent space using an autoencoder, then feeds these features into a Gaussian Process regression model to output a continuous risk score. When the score exceeds a threshold, execution pauses, allowing a human supervisor to label the situation as safe or risky, which incrementally updates the model.
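The loop described above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions, not the authors' implementation: encode is a placeholder for the trained autoencoder, RiskGP is a hypothetical minimal GP regressor with an RBF kernel, and the kernel length scale, noise level, threshold, and the prior mean of 0.5 (chosen here so that latents far from all labeled data revert to an intermediate score) are illustrative choices that the summary does not specify.

```python
import math

def encode(frame, dim=2):
    """Placeholder encoder: average `dim` equal chunks of a flat pixel list.
    A trained autoencoder would go here (assumption, not the paper's network)."""
    n = len(frame) // dim
    return [sum(frame[i * n:(i + 1) * n]) / n for i in range(dim)]

def rbf(a, b, length_scale):
    """Squared-exponential kernel between two latent vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

class RiskGP:
    """Hypothetical minimal GP regressor mapping latents to a risk in [0, 1]."""

    def __init__(self, noise=1e-2, length_scale=0.3, prior_mean=0.5):
        self.noise = noise
        self.length_scale = length_scale
        self.prior_mean = prior_mean  # assumed value, see lead-in
        self.Z, self.y, self.alpha = [], [], []

    def add(self, z, label):
        """Store one labeled latent (0.0 = safe, 1.0 = risky) and refit."""
        self.Z.append(list(z))
        self.y.append(float(label))
        n = len(self.Z)
        K = [[rbf(self.Z[i], self.Z[j], self.length_scale)
              + (self.noise if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        self.alpha = solve(K, [y - self.prior_mean for y in self.y])

    def risk(self, z):
        """GP posterior mean at z, clipped to the [0, 1] risk range."""
        mean = self.prior_mean + sum(
            a * rbf(z, zi, self.length_scale)
            for a, zi in zip(self.alpha, self.Z))
        return min(1.0, max(0.0, mean))

def step(model, frame, threshold, ask_human):
    """Score one frame; above the threshold, pause and store the human label."""
    z = encode(frame)
    risk = model.risk(z)
    paused = risk > threshold
    if paused:
        model.add(z, ask_human(frame))
    return risk, paused

model = RiskGP()
model.add(encode([0.1, 0.1, 0.2, 0.2]), 0.0)  # demonstration frame, safe
model.add(encode([0.9, 0.9, 0.8, 0.8]), 1.0)  # single example of a fault

frames = {"safe": [0.1, 0.1, 0.2, 0.2],
          "known fault": [0.9, 0.9, 0.8, 0.8],
          "unseen scene": [0.1, 0.1, 0.9, 0.9]}
for name, frame in frames.items():
    # The lambda stands in for the supervisor and always answers "risky".
    risk, paused = step(model, frame, threshold=0.3, ask_human=lambda f: 1.0)
    print(name, round(risk, 2), "paused" if paused else "continue")
```

Refitting on every new label is affordable here because human feedback is sparse, so the training set stays small. A practical system would swap in a learned encoder and a GP library rather than the hand-written solver.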

Key results

Reliably detects both known and novel faults using only a single example per new fault
Outperforms standard multi-layer perceptrons in identifying out-of-distribution scenarios
Enables real-time, vision-based risk assessment with continuous online model updates via sparse human feedback
Successfully validated on a Franka Panda manipulator performing peg-and-door manipulation tasks

Why it matters

It enables rapid deployment of collaborative robots with proactive, vision-based safety that adapts to new environments with minimal human intervention.

Abstract

Learning from demonstration is a promising approach for teaching robots new skills. However, a central challenge in the execution of acquired skills is the ability to recognize faults and prevent failures. This is essential because demonstrations typically cover only a limited set of scenarios and often only the successful ones. During task execution, unforeseen situations may arise, such as changes in the robot’s environment or interaction with human operators. To recognize such situations, this paper focuses on teaching the robot situational awareness by using a camera input and labeling frames as safe or risky. We train a Gaussian Process (GP) regression model fed by a low-dimensional latent space representation of the input images. The model outputs a continuous risk score ranging from zero to one, quantifying the degree of risk at each timestep. This allows for pausing task execution in unsafe situations and directly adding new training data, labeled by the human user. Our experiments on a robotic manipulator show that the proposed method can reliably detect both known and novel faults using only a single example for each new fault. In contrast, a standard multi-layer perceptron (MLP) performs well only on faults it has encountered during training. Our method enables the next generation of cobots to be rapidly deployed with easy-to-set-up, vision-based risk assessment, proactively safeguarding humans and detecting misaligned parts or missing objects before failures occur. We provide all the code and data required to reproduce our experiments at imitrob.ciirc.cvut.cz/publications/ilesia.

Index terms

Learning from Demonstration, Safety in HRI, Perception for Grasping and Manipulation