- Gesierich, Benno;
- Sander, Laura;
- Pirpamer, Lukas;
- Meier, Dominik;
- Ruberte, Esther;
- Amann, Michael;
- Sinnecker, Tim;
- Huck, Antal;
- de Leeuw, Frank-Erik;
- Maillard, Pauline;
- Moy, Sue;
- Helmer, Karl;
- Levin, Johannes;
- Höglinger, Günter;
- Kühne, Michael;
- Bonati, Leo;
- Kuhle, Jens;
- Cattin, Philippe;
- Granziera, Cristina;
- Schlaeger, Regina;
- Duering, Marco
Disorders of the central nervous system, including neurodegenerative diseases, frequently affect the brainstem and can present with focal atrophy. This study aimed to (1) optimize deep learning-based brainstem segmentation for a wide range of pathologies and T1-weighted image acquisition parameters, (2) conduct a systematic technical and clinical validation, (3) improve segmentation quality in the presence of brainstem lesions, and (4) make an optimized brainstem segmentation tool available for public use. An intentionally heterogeneous ground truth dataset (n = 257) was used to train deep learning models based on multi-dimensional gated recurrent units (MD-GRU) or the nnU-Net method. Segmentation performance was evaluated against ground truth labels, and FreeSurfer served as the benchmark in the subsequent validation. Technical validation was conducted on unseen data from patients with cerebral small vessel disease and comprised scan-rescan repeatability (n = 46) and inter-scanner reproducibility (n = 20, 3 different scanners). Clinical validation was performed on unseen 1-year follow-up data from 16 patients with multiple system atrophy by evaluating the annual percentage volume change. Two lesion filling algorithms were investigated to improve segmentation performance in 23 patients with multiple sclerosis. The MD-GRU and nnU-Net models demonstrated very good segmentation performance (median Dice coefficients ≥ 0.95 each) and outperformed a previously published model trained on a narrower dataset. Scan-rescan repeatability and inter-scanner reproducibility yielded similar Bland-Altman-derived limits of agreement for longitudinal FreeSurfer (total brainstem volume repeatability/reproducibility 0.68/1.85), MD-GRU (0.72/1.46), and nnU-Net (0.48/1.52). All methods showed comparable performance in detecting atrophy in the total brainstem (atrophy detected in 100% of patients) and its substructures.
In patients with multiple sclerosis, lesion filling further improved the accuracy of brainstem segmentation. We enhanced and systematically validated two fully automated deep learning brainstem segmentation methods and released them publicly. This enables a broader evaluation of brainstem volume as a candidate biomarker for neurodegeneration.
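
For readers unfamiliar with the three metrics named above (Dice coefficient, Bland-Altman limits of agreement, annual percentage volume change), the following is a minimal sketch of their textbook definitions. It is not the study's code: the function names are hypothetical, masks are assumed to be flat sequences of 0/1, the limits of agreement are computed on raw paired differences with a 1.96 multiplier, and the volume change is assumed to be a simple linear rate relative to baseline. The study may have used different conventions, for example percentage differences.

```python
from statistics import mean, stdev


def dice_coefficient(pred, truth):
    """Dice overlap of two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def limits_of_agreement(first, second):
    """Bland-Altman bias and 95% limits of agreement for paired values."""
    if len(first) != len(second):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD; requires at least two pairs
    return bias, bias - spread, bias + spread


def annual_percentage_change(baseline, follow_up, years=1.0):
    """Volume change in percent of baseline, scaled linearly to one year."""
    return (follow_up - baseline) / baseline * 100.0 / years
```

Under these assumptions, `dice_coefficient` would compare a predicted brainstem mask with its ground truth label, `limits_of_agreement` would take paired scan-rescan (or cross-scanner) volumes, and `annual_percentage_change` would take baseline and follow-up volumes of one patient.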