Sim-to-Real Domain Shift in Online Action Detection
Constantin Patsch, Wael Torjmene, Marsil Zakour, Yuankai Wu, Driton Salihu, Eckehard Steinbach
Abstract
Human reasoning comprises the ability to under- stand and reason about the current action solely based on past information. To provide effective assistance in an eldercare or household environment an assistive robot or intelligent assistive system has to assess human actions correctly. Based on this presumption, the task of online action detection determines the current action solely based on the past without access to future information. During inference, the performance of the model is largely impacted by the attributes of the underlying training dataset. However, as high costs and ethical concerns are associated with the real-world data collection process, synthetically created data provides a way to mitigate these problems while providing additional data for the training process of the underlying action detection model to improve performance. Due to the inherent domain shift between the synthetic and real data, we introduce a new egocentric dataset called Human Kitchen Interactions (HKI) to investigate the sim- to-real gap. Our dataset contains in total 100 synthetic and real videos in which 21 different actions are executed in a kitchen environment. The synthetic data is acquired in an egocentric virtual reality (VR) setup while capturing the virtual environment in a game engine. We evaluate state-of-the-art online action detection models on our dataset and provide insights into sim-to-real domain shift. Upon acceptance, we will release our dataset and the corresponding features at https://c-patsch.github.io/HKI/.