ICRA 2026

Detection of Texting While Walking in Occluded Environment Using Variational Autoencoder for Safe Mobile Robot Navigation

Hayato Terao, Jiaxu Wu, Qi An, Atsushi Yamashita

AI summary

A VAE-guided framework reliably detects pedestrians texting while walking under various occlusion patterns, enabling safer autonomous robot navigation.

Pedestrian Activity Recognition, Texting While Walking, Variational Autoencoder, Robot Navigation, Occlusion Handling, Computer Vision

Problem

Autonomous mobile robots struggle to navigate safely around distracted pedestrians, particularly when the pedestrians' bodies are partially hidden by obstacles or crowds, which increases the risk of collision.

Approach

The method extracts sequential full-body keypoints from video, uses a pre-trained variational autoencoder to reconstruct occluded data in a latent space, and classifies pedestrian activities into normal walking, texting, or other categories.
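The three stages described above (keypoint sequence, latent encoding, three-way classification) can be sketched roughly as follows. This is only an illustration of the data flow, not the authors' implementation: the encoder here is a toy averaging function standing in for a trained VAE encoder, the nearest-centroid classifier is an assumption, and all function names and the latent size are hypothetical.

```python
import math
import random

# The three activity categories named in the summary.
LABELS = ["normal walking", "texting while walking", "other"]


def flatten_sequence(frames):
    """Flatten a sequence of frames, each a list of (x, y) keypoints, into one vector."""
    return [coord for frame in frames for point in frame for coord in point]


def encode(vector, latent_dim=2):
    """Toy stand-in for a VAE encoder (hypothetical).

    Returns a (mean, log_variance) pair per latent dimension. A real encoder
    would be a trained neural network; this one averages strided slices.
    """
    mean = []
    for d in range(latent_dim):
        part = vector[d::latent_dim]
        mean.append(sum(part) / len(part))
    log_var = [-4.0] * latent_dim  # small fixed variance, for illustration only
    return mean, log_var


def reparameterize(mean, log_var, rng):
    """Sample z = mean + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def classify(z, centroids):
    """Assign the label whose centroid is closest to z (assumed classifier)."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(z, centroids[label]))
```

In use, a sequence would pass through `flatten_sequence`, then `encode`, and the resulting latent mean (or a sample from `reparameterize`) would be handed to `classify` together with one centroid per entry in `LABELS`.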

Key results

Novel VAE-based framework robustly handles both temporary and permanent body keypoint occlusions
Creation of two datasets covering controlled and real-world environments with diverse occlusion and posture scenarios
Ablation studies confirm reliable detection across occlusion patterns while highlighting challenges with visually similar postures
Real-world mobile robot testing validates practical applicability for collision avoidance in crowded spaces
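To make the distinction between temporary and permanent keypoint occlusion concrete, the sketch below masks joints in a keypoint sequence. It assumes that "permanent" means a joint is hidden in every frame and "temporary" means it is hidden only for a span of frames; the summary does not define the terms, and the function names and the use of `None` for a missing keypoint are hypothetical.

```python
def occlude_permanent(frames, hidden_joints):
    """Mark the given joint indices as missing (None) in every frame."""
    return [
        [None if j in hidden_joints else point for j, point in enumerate(frame)]
        for frame in frames
    ]


def occlude_temporary(frames, hidden_joints, start, end):
    """Mark the given joints as missing only in frames with index in [start, end)."""
    return [
        [
            None if (start <= t < end and j in hidden_joints) else point
            for j, point in enumerate(frame)
        ]
        for t, frame in enumerate(frames)
    ]


def visible_ratio(frames):
    """Fraction of keypoints across the whole sequence that are still visible."""
    total = sum(len(frame) for frame in frames)
    visible = sum(1 for frame in frames for point in frame if point is not None)
    return visible / total
```

Masks of this kind could be applied to clean sequences to probe how a detector behaves as `visible_ratio` drops, which is in the spirit of the ablation over occlusion patterns mentioned above.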

Why it matters

Improves the safety and reliability of autonomous mobile robots in public spaces by enabling them to anticipate and navigate around distracted pedestrians.

Abstract

As autonomous mobile robots begin to populate public spaces, it is becoming increasingly important for robots to accurately distinguish pedestrians and navigate safely to avoid collisions. Texting while walking is a common but hazardous behavior among pedestrians that poses significant challenges for robot navigation systems. While several studies have addressed the detection of text walkers, many have overlooked the impact of occlusions, a very common phenomenon where parts of pedestrians are obscured from the sensor’s view. This study proposes a machine learning method that distinguishes text walkers from other pedestrians in video data. The proposed method processes each video frame to extract body keypoints, encodes the keypoints into a latent space, and classifies pedestrian activities into three categories: normal walking, texting while walking, and other activities. A variational autoencoder is incorporated to enhance the system’s robustness under various occlusion scenarios. Performance tests in real-world environments identified potential areas for improvement, particularly in distinguishing pedestrian activities with similar body postures. However, ablation studies demonstrated that the proposed system performs reliably across different occlusion scenarios.
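For readers unfamiliar with variational autoencoders, the standard training objective (the evidence lower bound) is shown below. The abstract does not state the exact loss used in the paper, so this is the textbook form only; the symbols are assumptions introduced here: x is the keypoint input, z the latent code, q_φ the encoder, p_θ the decoder, and p(z) a standard normal prior.

```latex
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)
```

The first term rewards accurate reconstruction of the input from the latent code, and the second keeps the encoded distribution close to the prior.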

Index terms

Robot Safety; Object Detection, Segmentation and Categorization; Autonomous Vehicle Navigation