Automated Genomic Interpretation Via Concept Bottleneck Models for Medical Robotics
Zijun Li, Jinchang Zhang, Zhang Ming, Guoyu Lu
AI summary
Problem
Medical robotics and clinical automation currently rely on black-box deep learning models that lack transparency and actionable decision pathways for genomic data. Existing interpretable models for DNA sequences are rarely integrated into automated pipelines or linked to clinically relevant recommendations.
Approach
Raw DNA sequences are converted into 2D images via Chaos Game Representation and processed through a CNN constrained by a Concept Bottleneck Model. Predictions are forced through biologically meaningful concepts and regularized with fidelity supervision, prior alignment, and uncertainty calibration before being translated into cost-aware clinical recommendations.
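As a rough illustration of the first step, the sketch below maps a DNA string to a small 2D count grid using the standard Chaos Game Representation construction. The corner assignment, the grid size, the choice to skip ambiguous bases, and the function names `cgr_points` and `cgr_image` are assumptions made for this example; the paper's actual preprocessing may differ.

```python
# Minimal Chaos Game Representation (CGR) sketch in pure Python.
# Assumption: corners follow the common A/C/G/T unit-square layout.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory: each base moves the point halfway to its corner."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # assumption: ambiguous symbols such as N are skipped
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


def cgr_image(seq, size=8):
    """Bin the CGR trajectory into a size x size grid of visit counts."""
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid
```

A grid like this could then be fed to a CNN as a single-channel image; normalizing the counts (for example by sequence length) is a common choice but is not shown here.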
Key results
- State-of-the-art HIV subtype classification accuracy on in-house and LANL datasets
- High-fidelity concept predictions aligned with biological priors such as GC content and k-mer frequency
- Cost-aware recommendation layer that reduces unnecessary retests and optimizes clinical utility
- End-to-end automated pipeline bridging interpretable genomic modeling with robotic decision support
Why it matters
Provides a transparent, auditable, and clinically actionable foundation for integrating genomic analysis into medical robotics and automated diagnostic workflows.
Abstract
We propose an automated genomic interpretation module that transforms raw DNA sequences into actionable, interpretable decisions suitable for integration into medical automation and robotic systems. Our framework combines Chaos Game Representation (CGR) with a Concept Bottleneck Model (CBM), forcing predictions to flow through biologically meaningful concepts such as GC content, CpG density, and k-mer motifs. To enhance reliability, we incorporate concept fidelity supervision, prior-consistency alignment, KL distribution matching, and uncertainty calibration. Beyond accurate classification of HIV subtypes across both in-house and LANL datasets, our module delivers interpretable evidence that can be directly validated against biological priors. A cost-aware recommendation layer further translates predictive outputs into decision policies that balance accuracy, calibration, and clinical utility, reducing unnecessary retests and improving efficiency. Extensive experiments demonstrate that the proposed system achieves state-of-the-art classification performance, superior concept prediction fidelity, and more favorable cost–benefit trade-offs compared to existing baselines. By bridging the gap between interpretable genomic modeling and automated decision-making, this work establishes a reliable foundation for robotic and clinical automation in genomic medicine.
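The concepts named in the abstract can be computed directly from a sequence, which is what makes them checkable against biological priors. The sketch below is one plausible way to derive such reference values. The exact definitions (for instance, CpG density as the fraction of adjacent base pairs that read "CG") and the function names are assumptions for illustration, not the authors' specification.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def cpg_density(seq):
    """Fraction of adjacent base pairs equal to 'CG' (assumed definition)."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    hits = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return hits / (len(seq) - 1)


def kmer_frequencies(seq, k=3):
    """Relative frequency of each overlapping k-mer in the sequence."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}
```

Values like these could serve as supervision targets or consistency checks for a concept layer; how the paper weights or normalizes them is not stated in the abstract.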