Annotation Free Spacecraft Detection and Segmentation Using Vision Language Models
Samet Hicsonmez, Jose Sosa, Dan Pineau, Inder Pal Singh, Arunkumar Rathinam, Abd El Rahman Shabayek, Djamila Aouada
AI summary
Problem
Manual annotation for spacecraft detection is costly and error-prone due to poor visibility and complex backgrounds, while models trained on synthetic data suffer from domain gaps.
Approach
The pipeline automatically generates pseudo-labels from unlabeled real images using a pre-trained Vision Language Model, refines them with test-time augmentation and weighted box fusion, and distills them into a compact student model via iterative knowledge distillation.
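The box-fusion step in this pipeline can be illustrated with a minimal sketch. It is a simplified, single-class variant of weighted box fusion and not the authors' implementation. It assumes the boxes predicted on each augmented view have already been mapped back to the original image coordinates and that all scores are positive. The function names, the first-match clustering rule, the mean-confidence rule, and the `iou_thr` default are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in original image coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_mean_box(boxes: Sequence[Box], weights: Sequence[float]) -> Box:
    """Confidence-weighted average of box coordinates (weights assumed positive)."""
    total = sum(weights)
    coords = [sum(w * b[k] for b, w in zip(boxes, weights)) / total for k in range(4)]
    return (coords[0], coords[1], coords[2], coords[3])


def fuse_boxes(
    boxes: Sequence[Box], scores: Sequence[float], iou_thr: float = 0.55
) -> List[Tuple[Box, float]]:
    """Simplified fusion: cluster overlapping boxes, then average them by confidence.

    Assumption: a box joins the first cluster whose current fused box overlaps it
    by at least iou_thr; the full algorithm has further details omitted here.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    clusters: List[List[int]] = []  # member indices per cluster
    fused: List[Box] = []           # current fused box per cluster
    for i in order:
        for c, fused_box in enumerate(fused):
            if iou(boxes[i], fused_box) >= iou_thr:
                clusters[c].append(i)
                fused[c] = weighted_mean_box(
                    [boxes[j] for j in clusters[c]], [scores[j] for j in clusters[c]]
                )
                break
        else:
            # No sufficiently overlapping cluster: start a new one.
            clusters.append([i])
            fused.append(boxes[i])
    # Cluster confidence is taken as the mean member score (an illustrative choice).
    return [
        (box, sum(scores[j] for j in members) / len(members))
        for members, box in zip(clusters, fused)
    ]
```

Calling `fuse_boxes` on the boxes collected from all augmented views of one image yields one fused box per cluster. Unlike non-maximum suppression, which keeps only the highest-scoring box, every overlapping prediction contributes to the final coordinates.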
Key results
- Segmentation average precision (AP) gains of up to 10 points over direct zero-shot VLM inference on SPARK-2024, SPEED+, and TANGO
- Eliminates reliance on extensive manual labeling for training
- Produces lightweight, real-time capable models for in-orbit deployment
- First framework to fully exploit VLM zero-shot capabilities for spacecraft segmentation
Why it matters
Provides a scalable, annotation-free solution for rapid deployment of robust spacecraft tracking and debris monitoring systems in space situational awareness.
Abstract
Vision Language Models (VLMs) have demonstrated remarkable performance in open-world zero-shot visual recognition. However, their potential in space-related applications remains largely unexplored. In the space domain, accurate manual annotation is particularly challenging due to factors such as low visibility, illumination variations, and object blending with planetary backgrounds. Developing methods that can detect and segment spacecraft and orbital targets without requiring extensive manual labeling is therefore of critical importance. In this work, we propose an annotation-free detection and segmentation pipeline for space targets using VLMs. Our approach begins by automatically generating pseudo-labels for a small subset of unlabeled real data with a pre-trained VLM. These pseudo-labels are then leveraged in a teacher-student label distillation framework to train lightweight models. Despite the inherent noise in the pseudo-labels, the distillation process leads to substantial performance gains over direct zero-shot VLM inference. Experimental evaluations on the SPARK-2024, SPEED+, and TANGO datasets on segmentation tasks demonstrate consistent improvements in average precision (AP) by up to 10 points. Code and models are available at https://github.com/giddyyupp/annotation-free-spacecraft-segmentation.