ICRA 2026

Legs Over Arms: On the Predictive Value of Lower-Body Pose for Human Trajectory Prediction from Egocentric Robot Perception

Nhat Le, Daeun Song, Xuesu Xiao

AI summary

Lower-body 3D skeletal keypoints provide the strongest predictive signal for human trajectory forecasting, significantly outperforming full-body or upper-body features.

Human trajectory prediction, Egocentric perception, Skeletal keypoints, Social robot navigation, 360° cameras, Biomechanical cues

Problem

Current social robot navigation models often abstract humans into point masses, ignoring rich egocentric visual cues like body pose and gait that are critical for safe, short-term trajectory prediction in crowded environments.

Approach

The authors systematically evaluate the predictive utility of various 2D and 3D skeletal keypoints and derived biomechanical cues as inputs to a trajectory prediction model, testing them on both a public egocentric dataset and a new 360° panoramic social navigation dataset.
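As a rough illustration of what a lower-body keypoint input could look like, the sketch below builds a per-timestep feature vector from a person's position and a subset of 3D joints. The 17-joint COCO ordering (hips 11-12, knees 13-14, ankles 15-16), the function name `lower_body_features`, and the flat concatenation are all assumptions made for this example; the summary does not state the skeleton layout or feature encoding the authors use.

```python
# Assumed COCO-style 17-joint ordering; the paper's actual layout may differ.
LOWER_BODY = (11, 12, 13, 14, 15, 16)  # left/right hip, knee, ankle

def lower_body_features(position, keypoints):
    """Concatenate a 2D ground position with lower-body 3D keypoints.

    position  -- (x, y) of the person in the robot frame
    keypoints -- sequence of 17 (x, y, z) joints in the assumed ordering
    Returns a flat list of 2 + 6 * 3 = 20 floats.
    """
    if len(keypoints) != 17:
        raise ValueError("expected 17 keypoints in the assumed COCO ordering")
    features = list(position)
    for index in LOWER_BODY:
        features.extend(keypoints[index])  # append x, y, z of each leg joint
    return features

# Hypothetical usage: a dummy skeleton whose joint i sits at (i, i, i).
skeleton = [(float(i), float(i), float(i)) for i in range(17)]
vector = lower_body_features((1.0, 2.0), skeleton)
```

Restricting the index tuple to the upper-body joints, or using all 17, would give the other configurations the study compares against.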

Key results

Lower-body 3D keypoints reduce Average Displacement Error by 13% over baseline
Augmenting keypoints with biomechanical cues yields an additional 1–4% accuracy gain
2D keypoints from distorted 360° panoramic images still improve prediction by 7%
Lower-body features consistently outperform full-body and upper-body configurations
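For readers unfamiliar with the metric, Average Displacement Error is conventionally the mean Euclidean distance between predicted and ground-truth positions over the prediction horizon. The sketch below computes it for a single trajectory and shows how a relative reduction such as the reported 13% would be derived. The function names and the toy numbers are illustrative assumptions, not values or code from the paper.

```python
import math

def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matching timesteps of two trajectories."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)

def relative_reduction(baseline_error, new_error):
    """Fractional error reduction relative to a baseline (0.13 means 13%)."""
    return (baseline_error - new_error) / baseline_error

# Toy example: the prediction is exact at step 0 and off by 1 m at step 1.
ade = average_displacement_error([(0.0, 0.0), (1.0, 0.0)],
                                 [(0.0, 0.0), (1.0, 1.0)])
```

In a multi-agent setting the same quantity is typically averaged over all tracked people as well as over timesteps.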

Why it matters

These findings provide actionable guidelines for robot designers to prioritize lower-body pose tracking and 360° camera placement for safer, socially compliant navigation.

Abstract

Predicting human trajectories is crucial for social robot navigation in crowded environments. While most existing approaches treat humans as point masses, we present a study on multi-agent trajectory prediction that leverages different human skeletal features for improved forecast accuracy. In particular, we systematically evaluate the predictive utility of 2D and 3D skeletal keypoints and derived biomechanical cues as additional inputs. Through a comprehensive study on the JRDB dataset and a new dataset for social navigation with 360° panoramic videos, we find that focusing on lower-body 3D keypoints yields a 13% reduction in Average Displacement Error, and that augmenting 3D keypoint inputs with corresponding biomechanical cues provides a further 1-4% improvement. Notably, the performance gain persists when using 2D keypoint inputs extracted from equirectangular panoramic images, indicating that monocular surround vision can capture informative cues for motion forecasting. Our finding that robots can forecast human movement efficiently by watching people's legs provides actionable insights for designing sensing capabilities for social robot navigation.

Index terms

Gesture, Posture and Facial Expressions; Human Detection and Tracking; Human-Centered Robotics