ICRA 2026

HINT-3D: Human-In-The-Loop Interactive Test-Time Adaptation for 3D Segmentation

Odei Jamaleddine, Imad Elhajj, Daniel Asmar

AI summary

A few corrective clicks during inference enable a 3D segmenter to cumulatively learn across scenes, significantly boosting accuracy and calibration without catastrophic forgetting.

interactive segmentation, test-time adaptation, 3D point clouds, human-in-the-loop, cumulative learning, promptable segmentation

Problem

Pretrained 3D segmenters degrade under deployment shift, and existing interactive methods freeze weights or require offline retraining, preventing real-time adaptation. The paper asks whether a model can safely learn from a few human clicks at test time and retain those gains across future scenes.

Approach

HINT-3D converts user clicks into region masks via PointSAM, then performs stability-aware, head-only updates during inference. Updated weights are persisted across scenes for cumulative learning, with uncertainty gating to prevent drift on unreliable regions.
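The update described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear classification head on frozen backbone features, plain cross-entropy with one SGD step, and an entropy threshold as the uncertainty gate. The names `head_only_step`, `head_logits`, and `max_entropy` are hypothetical, and the paper's actual stability-aware rule and gating criterion may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def head_logits(W, b, feat):
    """Logits of a linear head: one weight row and one bias per class."""
    return [sum(w * f for w, f in zip(row, feat)) + bias
            for row, bias in zip(W, b)]


def head_only_step(W, b, feats, mask_idx, label, lr=0.5, max_entropy=None):
    """One SGD step on the head only; backbone features stay frozen.

    feats    : per-point features from the frozen backbone
    mask_idx : indices of points inside the click-derived region mask
    label    : class the user assigned to that mask
    Assumption: points whose predictive entropy exceeds max_entropy are
    skipped. This gate is illustrative, not the paper's exact rule.
    Returns new (W, b) and the number of points that passed the gate.
    """
    n_cls, dim = len(W), len(W[0])
    grad_W = [[0.0] * dim for _ in range(n_cls)]
    grad_b = [0.0] * n_cls
    used = 0
    for i in mask_idx:
        probs = softmax(head_logits(W, b, feats[i]))
        if max_entropy is not None and entropy(probs) > max_entropy:
            continue  # gated out: too uncertain to train on
        used += 1
        for c in range(n_cls):
            # Cross-entropy gradient w.r.t. logit c: p_c - one_hot_c
            err = probs[c] - (1.0 if c == label else 0.0)
            grad_b[c] += err
            for d in range(dim):
                grad_W[c][d] += err * feats[i][d]
    if used == 0:
        return W, b, 0  # nothing passed the gate; weights unchanged
    new_W = [[W[c][d] - lr * grad_W[c][d] / used for d in range(dim)]
             for c in range(n_cls)]
    new_b = [b[c] - lr * grad_b[c] / used for c in range(n_cls)]
    return new_W, new_b, used
```

In this sketch, persistence across scenes amounts to passing the returned `new_W` and `new_b` into the next scene's first call rather than reloading the pretrained head.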

Key results

Strong effort-accuracy gains within scenes using only a few corrective clicks
Consistent zero-click accuracy improvements across subsequent scenes via weight persistence
Reduced Expected Calibration Error (ECE) and maintained low latency with head-only updates
Model-agnostic validation across KPConv, RandLA-Net, and Point Transformer v1 backbones
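For readers unfamiliar with the calibration metric listed above, here is a small sketch of Expected Calibration Error using the common equal-width binning definition: the sample-weighted average of |accuracy − mean confidence| per bin. The function name, the default of 10 bins, and the half-open bin edges are assumptions; the paper's exact binning is not stated here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1].

    confidences : predicted probability of the chosen class, per point
    correct     : 1 (or True) if the prediction was right, else 0
    Bins are [k/n_bins, (k+1)/n_bins), with 1.0 placed in the last bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    n = len(confidences)
    if n == 0:
        return 0.0
    conf_sum = [0.0] * n_bins
    acc_sum = [0.0] * n_bins
    count = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        k = min(int(conf * n_bins), n_bins - 1)
        conf_sum[k] += conf
        acc_sum[k] += 1.0 if ok else 0.0
        count[k] += 1
    ece = 0.0
    for k in range(n_bins):
        if count[k] == 0:
            continue  # empty bins contribute nothing
        gap = abs(acc_sum[k] / count[k] - conf_sum[k] / count[k])
        ece += (count[k] / n) * gap
    return ece
```

A model that predicts with confidence 0.9 but is right only 75% of the time has a gap of 0.15 in that bin; a lower ECE means confidence tracks accuracy more closely.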

Why it matters

Enables real-time, persistent correction of 3D segmentation models in dynamic environments like AR/VR and robotics without costly retraining.

Abstract

We present HINT-3D, a human-in-the-loop test-time adaptation framework for 3D semantic segmentation. A few corrective clicks are converted into region masks by a promptable 3D interface (PointSAM). These masks supervise stability-aware updates to a pretrained backbone at inference. We persist the updates so later scenes start from improved weights, enabling cumulative learning. The wrapper is backbone-agnostic: it requires only logits, a mask-to-index bridge, plus access to a small trainable parameter set; we instantiate it on KPConv, RandLA-Net, and Point Transformer v1. On S3DIS Area-5, HINT-3D delivers strong effort-accuracy gains within a scene, consistent zero-click improvements across scenes, and reduced Expected Calibration Error (ECE), while maintaining responsiveness with head-only updates and uncertainty-gated training. We report mIoU versus saved masks, cross-scene transfer, ECE, latency, and class-specific corrections on common indoor failure modes.
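As a reference for the mIoU figures mentioned in the abstract, the sketch below computes mean intersection-over-union from per-point predictions and labels. It assumes the common convention of averaging only over classes that appear in either the predictions or the ground truth; the function name and that convention are assumptions, not details taken from the paper.

```python
def mean_iou(preds, labels, n_classes):
    """Mean IoU over classes present in predictions or ground truth.

    preds, labels : per-point integer class ids in [0, n_classes)
    Classes absent from both are skipped rather than counted as 0 or 1.
    """
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have equal length")
    inter = [0] * n_classes
    union = [0] * n_classes
    for p, t in zip(preds, labels):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [inter[c] / union[c] for c in range(n_classes) if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

Evaluating this after each saved mask would give the kind of effort-versus-accuracy curve the abstract refers to.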

Index terms

Object Detection, Segmentation and Categorization