X-MOS: A Heterogeneous Cross-LiDAR Generalization Framework for Moving Object Segmentation
Minjae Lee, Ilhwan Ha, Sang-Min Choi, Gun-Woo Kim, Suwon Lee
AI summary
Problem
Deep learning models for moving object segmentation degrade significantly when deployed on different LiDAR sensors due to domain shifts caused by varying hardware specifications. Naively training on combined heterogeneous data biases models toward high-density sensors while failing on sparse ones.
Approach
The framework generates sensor-specific expert teacher models and uses the sensor type as privileged information to selectively activate the appropriate teacher during training. This multi-teacher knowledge distillation strategy guides a single student model to learn unbiased, cross-sensor generalizable features.
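The sensor-gated teacher selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the summary does not state the loss, so a standard temperature-scaled distillation loss (cross-entropy on labels plus KL divergence to the selected teacher) is assumed. The function names, the sensor keys `"velodyne16"` and `"dense_sensor"`, and the `alpha` and `temperature` values are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one point's class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def sensor_gated_distillation_loss(student_logits, teacher_logits_by_sensor,
                                   sensor_type, labels,
                                   alpha=0.5, temperature=2.0):
    """Average per-point loss for one scan.

    sensor_type is the privileged information: it is used only to pick
    which teacher supervises this scan, and is never fed to the student.
    The loss form (assumed, not taken from the source) is
    (1 - alpha) * cross-entropy + alpha * T^2 * KL(teacher || student).
    """
    # Activate only the expert trained on this scan's sensor.
    teacher_logits = teacher_logits_by_sensor[sensor_type]
    total = 0.0
    for s, t, y in zip(student_logits, teacher_logits, labels):
        hard = -math.log(softmax(s)[y] + 1e-12)
        soft = kl_divergence(softmax(t, temperature), softmax(s, temperature))
        total += (1.0 - alpha) * hard + alpha * (temperature ** 2) * soft
    return total / len(labels)

# Toy scan with two points and two classes (0 = static, 1 = moving).
student = [[2.0, 0.0], [0.0, 2.0]]
labels = [0, 1]
teachers = {
    "velodyne16": [[2.0, 0.0], [0.0, 2.0]],    # agrees with the student
    "dense_sensor": [[0.0, 2.0], [2.0, 0.0]],  # disagrees with the student
}
loss_matched = sensor_gated_distillation_loss(student, teachers, "velodyne16", labels)
loss_other = sensor_gated_distillation_loss(student, teachers, "dense_sensor", labels)
```

In this toy setup the loss depends on which teacher the sensor type activates, which is the point of the gating: each scan receives a learning signal only from the expert trained on its own sensor, and the student needs no sensor label at inference time.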
Key results
- Mitigates training bias toward dense sensors, achieving balanced performance across diverse LiDAR hardware
- Reaches an overall test mIoU of 0.717 on the HeLiMOS dataset, surpassing naive training and individual experts
- More than doubles segmentation accuracy on the challenging 16-channel Velodyne sensor
- Demonstrates strong zero-shot generalization to unseen datasets with similar sensor types
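For readers unfamiliar with the metric, the sketch below shows one common way to compute mIoU from per-point predictions. It assumes mIoU is the unweighted mean of per-class intersection-over-union over a static and a moving class; the summary does not define the exact protocol, so the class set and the function name `mean_iou` are illustrative assumptions.

```python
def mean_iou(pred, target, num_classes=2):
    """Unweighted mean of per-class IoU = TP / (TP + FP + FN).

    Classes absent from both prediction and ground truth are skipped
    so they do not distort the average.
    """
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        denom = tp + fp + fn
        if denom == 0:
            continue
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0

# Four points, 0 = static, 1 = moving; one moving point is missed.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target)  # IoU(static) = 1/2, IoU(moving) = 2/3
```

Here the score is (1/2 + 2/3) / 2 = 7/12, roughly 0.583. A single missed moving point lowers the IoU of both classes, which is why sparse sensors with few points per object are penalized heavily.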
Why it matters
Enables practical, hardware-agnostic perception systems for autonomous vehicles by eliminating the need for sensor-specific model retraining.
Abstract
Moving object segmentation (MOS) is foundational for autonomous vehicle safety. However, the increasing diversity of LiDAR sensors creates a significant domain shift problem, causing models trained on one sensor to perform poorly when deployed on another. A naive approach of training on combined data from heterogeneous sensors leads to a biased model that favors high-density sensors while failing on sparse, low-resolution sensors. To address this issue, we propose X-MOS, a novel generalization framework based on multi-teacher knowledge distillation. X-MOS generates sensor-specific expert teacher models and employs a sensor-aware knowledge distillation strategy. This strategy uses the sensor type as privileged information to activate the most appropriate teacher at each training step, providing unambiguous learning signals to a single student model. Extensive experiments on the HeLiMOS dataset, which comprises four different LiDAR sensors, demonstrate the effectiveness of our framework. X-MOS mitigates training bias and achieves an overall test mIoU of 0.717, outperforming both naive training and the best individual expert teacher. Notably, it more than doubles the performance on the most challenging low-channel sensor. Furthermore, our model exhibits strong zero-shot generalization to unseen datasets with similar sensor types. This work provides a robust and scalable methodology for achieving cross-sensor generalization, which is foundational for more practical and adaptable perception systems in autonomous driving.