IROS 2024

A Robotic-Centric Paradigm for 3D Human Tracking under Complex Environments Using Multi-Modal Adaptation

Shuo Xin, Zhen Zhang, Liang Liu, Xiaojun Hou, Deye Zhu, Mengmeng Wang, Yong Liu

Abstract

The goal of this paper is to establish a feasible tracking paradigm that makes 3D human trackers applicable on robot platforms and enables more high-level tasks. To date, two fundamental problems have not been adequately addressed. One is keeping the computational cost low enough for robotic deployment, and the other is accuracy that is easily disturbed and varies greatly in complex real environments. In this paper, we propose a robotic-centric tracking paradigm called MATNet that directly matches LiDAR point clouds and RGB videos through end-to-end learning. To improve the low accuracy of human tracking under disturbance, we propose a coarse-to-fine Transformer with target-aware augmentation that fuses RGB videos and point clouds through a pyramid encoding and decoding strategy. To better meet the real-time requirements of actual robot deployment, we introduce parameter-efficient adaptation tuning, which greatly shortens the model's training time. Furthermore, we propose a five-step Anti-shake Refinement strategy and add human prior values to overcome the strong shaking of the robot platform. Extensive experiments confirm that MATNet significantly outperforms the previous state of the art on both open-source datasets and large-scale robotic datasets.
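
The abstract does not describe how the parameter-efficient adaptation tuning is implemented. As a rough illustration of the general idea only, the sketch below freezes a stand-in backbone layer and attaches a small bottleneck adapter, then reports what fraction of the parameters would be trainable. Every name and size here (`FrozenLayer`, `BottleneckAdapter`, `dim = 64`, `bottleneck = 4`) is a hypothetical choice for this example and is not taken from MATNet.

```python
import random


def make_matrix(rows, cols, rng, scale=0.02):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class FrozenLayer:
    """Hypothetical stand-in for a pretrained backbone layer (never updated)."""

    def __init__(self, dim, rng):
        self.weight = make_matrix(dim, dim, rng)

    def forward(self, x):
        return matvec(self.weight, x)

    def num_params(self):
        return len(self.weight) * len(self.weight[0])


class BottleneckAdapter:
    """Small trainable module: down-project, ReLU, up-project, residual add."""

    def __init__(self, dim, bottleneck, rng):
        self.down = make_matrix(bottleneck, dim, rng)
        # Zero-initialised up-projection, so the adapter starts as an identity map.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def forward(self, x):
        hidden = [max(0.0, h) for h in matvec(self.down, x)]
        delta = matvec(self.up, hidden)
        return [xi + di for xi, di in zip(x, delta)]

    def num_params(self):
        return (len(self.down) * len(self.down[0])
                + len(self.up) * len(self.up[0]))


rng = random.Random(0)
dim, bottleneck = 64, 4  # illustrative sizes, not values from the paper
backbone = FrozenLayer(dim, rng)
adapter = BottleneckAdapter(dim, bottleneck, rng)

features = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
backbone_out = backbone.forward(features)
adapted_out = adapter.forward(backbone_out)

# Only the adapter weights would receive gradient updates during tuning.
trainable = adapter.num_params()
total = trainable + backbone.num_params()
trainable_fraction = trainable / total
print(f"trainable parameters: {trainable} of {total}")
```

With these illustrative sizes the adapter accounts for 512 of 4,608 parameters. Because far fewer weights are updated, tuning of this kind typically costs less than full fine-tuning, which is consistent with the shorter training time the abstract reports.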

Index terms

Visual Tracking; Multi-Modal Perception for HRI; Human-Centered Robotics