HINT-3D: Human-In-The-Loop Interactive Test-Time Adaptation for 3D Segmentation
Odei Jamaleddine, Imad Elhajj, Daniel Asmar
AI summary
Problem
Pretrained 3D segmenters degrade under deployment shift, and existing interactive methods freeze weights or require offline retraining, preventing real-time adaptation. The paper asks whether a model can safely learn from a few human clicks at test time and retain those gains across future scenes.
Approach
HINT-3D converts user clicks into region masks via PointSAM, then performs stability-aware, head-only updates during inference. Updated weights are persisted across scenes for cumulative learning, with uncertainty gating to prevent drift on unreliable regions.
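The update loop described above can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the head is a single linear layer on frozen backbone features, that the click-derived mask arrives as a list of point indices with one corrected label, and that "uncertainty gating" means skipping points whose normalized predictive entropy exceeds a threshold. The names `gated_head_step`, `save_head`, `load_head`, and `max_entropy`, and the JSON persistence format, are hypothetical.

```python
import json
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    # Shannon entropy divided by log(num_classes): 0 = certain, 1 = uniform.
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def head_logits(W, b, feat):
    # Linear head on a frozen backbone feature: logits[k] = W[k] . feat + b[k].
    return [sum(w * x for w, x in zip(row, feat)) + bk for row, bk in zip(W, b)]

def gated_head_step(W, b, feats, mask_idx, label, lr=0.5, max_entropy=0.9):
    """One cross-entropy gradient step on the head only (backbone untouched).

    ASSUMPTION: points inside the mask are skipped when their normalized
    entropy exceeds max_entropy. Returns (new_W, new_b, n_used).
    """
    used = [i for i in mask_idx
            if normalized_entropy(softmax(head_logits(W, b, feats[i]))) <= max_entropy]
    if not used:
        return W, b, 0
    grad_W = [[0.0] * len(W[0]) for _ in W]
    grad_b = [0.0] * len(b)
    for i in used:
        probs = softmax(head_logits(W, b, feats[i]))
        for k, p in enumerate(probs):
            err = p - (1.0 if k == label else 0.0)  # d(cross-entropy)/d(logit k)
            grad_b[k] += err
            for d, x in enumerate(feats[i]):
                grad_W[k][d] += err * x
    n = len(used)
    new_W = [[w - lr * g / n for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(W, grad_W)]
    new_b = [bk - lr * g / n for bk, g in zip(b, grad_b)]
    return new_W, new_b, n

def save_head(W, b):
    # Hypothetical persistence format: the next scene starts from these weights.
    return json.dumps({"W": W, "b": b})

def load_head(blob):
    state = json.loads(blob)
    return state["W"], state["b"]

# Toy scene: four points with 2-D features, two classes.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
W, b = [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]
before = softmax(head_logits(W, b, feats[0]))[1]
# The user marks points 0, 1, 3 as class 1; point 3 is maximally uncertain and gated out.
W2, b2, n_used = gated_head_step(W, b, feats, mask_idx=[0, 1, 3], label=1)
after = softmax(head_logits(W2, b2, feats[0]))[1]
blob = save_head(W2, b2)
```

In this toy run the probability of the corrected class at point 0 rises after one step, and `load_head(blob)` restores the updated head for the next scene. A real system would update a network head through an autodiff framework. The gating rule here is only one plausible reading of the summary.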
Key results
- Strong effort-accuracy gains within scenes using only a few corrective clicks
- Consistent zero-click accuracy improvements across subsequent scenes via weight persistence
- Reduced Expected Calibration Error (ECE) and maintained low latency with head-only updates
- Model-agnostic validation across KPConv, RandLA-Net, and Point Transformer v1 backbones
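For readers unfamiliar with the calibration metric named in the list above, the following is a minimal sketch of the standard binned ECE: predictions are grouped by confidence, and the gap between mean confidence and accuracy in each bin is averaged with weights proportional to bin size. The bin count (10) and the equal-width binning are common defaults assumed here. The paper's exact evaluation protocol is not stated in this section.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / n) * |accuracy - mean confidence|.

    confidences: max softmax probability per point, each in [0, 1].
    correct: 1 if the predicted class matches the label, else 0.
    """
    n = len(confidences)
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hits = [0] * n_bins
    for c, ok in zip(confidences, correct):
        # Equal-width bins [k/n_bins, (k+1)/n_bins); confidence 1.0 goes in the last bin.
        k = min(int(c * n_bins), n_bins - 1)
        counts[k] += 1
        conf_sums[k] += c
        hits[k] += ok
    ece = 0.0
    for count, conf_sum, hit in zip(counts, conf_sums, hits):
        if count:
            ece += (count / n) * abs(hit / count - conf_sum / count)
    return ece

# Four points predicted at 0.9 confidence, three of them correct:
# accuracy 0.75 versus confidence 0.9 gives a gap of 0.15.
ece_overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

A lower value means confidence tracks accuracy more closely. It does not by itself say the model is more accurate.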
Why it matters
HINT-3D enables real-time, persistent correction of 3D segmentation models in dynamic settings such as AR/VR and robotics, without costly retraining.
Abstract
We present HINT-3D, a human-in-the-loop test-time adaptation framework for 3D semantic segmentation. A few corrective clicks are converted into region masks by a promptable 3D interface (PointSAM). These masks supervise stability-aware updates to a pretrained backbone at inference time. We persist the updates so that later scenes start from improved weights, enabling cumulative learning. The wrapper is backbone-agnostic: it requires only logits, a mask-to-index bridge, and access to a small trainable parameter set; we instantiate it on KPConv, RandLA-Net, and Point Transformer v1. On S3DIS Area-5, HINT-3D delivers strong effort-accuracy gains within a scene, consistent zero-click improvements across scenes, and reduced Expected Calibration Error (ECE), while maintaining responsiveness with head-only updates and uncertainty-gated training. We report mIoU versus saved masks, cross-scene transfer, ECE, latency, and class-specific corrections on common indoor failure modes.
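The headline metric in the abstract, mIoU, can be sketched as follows for per-point labels. This is the usual definition (per-class intersection over union, averaged over classes). Skipping classes that appear in neither the prediction nor the ground truth is an assumption made here, since conventions differ and the abstract does not specify one.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # ASSUMPTION: classes absent from both are ignored
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Four points, three possible classes; class 2 never appears and is skipped.
# Class 0 IoU = 1/2, class 1 IoU = 2/3.
miou = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=3)
```

Evaluating this after each saved mask would give an effort-accuracy curve of the kind the abstract refers to as "mIoU versus saved masks".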