Purpose
To report an image analysis pipeline, DDLSNet, consisting of a rim segmentation (RimNet) branch and a disc size classification (DiscNet) branch to automate estimation of the disc damage likelihood scale (DDLS).
Design
Retrospective observational.
Participants
RimNet and DiscNet were developed with 1208 and 11 536 optic disc photographs (ODPs), respectively. DDLSNet performance was evaluated on 120 ODPs from the RimNet test set, for which the DDLS scores were graded by clinicians. Reproducibility was evaluated on a group of 781 eyes, each with 2 ODPs taken within 4 years of each other.
Methods
Disc damage likelihood scale calculation requires estimation of optic disc size, provided by DiscNet (VGG19 network), and the minimum rim-to-disc ratio (mRDR) or absent rim width (ARW), provided by RimNet (InceptionV3/LinkNet segmentation model). To build RimNet's dataset, glaucoma specialists marked optic disc rim and cup boundaries on ODPs. The "ground truth" mRDR or ARW was calculated. For DiscNet's dataset, corresponding OCT images provided "ground truth" disc size. Optic disc photographs were split into 80/10/10 for training, validation, and testing, respectively, for RimNet and DiscNet. DDLSNet estimation was tested against manual grading of DDLS by clinicians with the average score used as "ground truth." Reproducibility of DDLSNet grading was evaluated by repeating DDLS estimation on a dataset of nonprogressing paired ODPs taken at separate times.
Main outcome measures
The main outcome measure was a weighted kappa score between clinicians and the DDLSNet pipeline with agreement defined as ± 1 DDLS score difference.
Results
RimNet achieved an mRDR mean absolute error (MAE) of 0.04 (± 0.03) and an ARW MAE of 48.9 (± 35.9) degrees when compared to clinician segmentations. DiscNet achieved 73% (95% confidence interval [CI]: 70%, 75%) classification accuracy. DDLSNet achieved an average weighted kappa agreement of 0.54 (95% CI: 0.40, 0.68) compared to clinicians. Average interclinician agreement was 0.52 (95% CI: 0.49, 0.56). Reproducibility testing demonstrated that 96% of ODP pairs had a difference of ≤ 1 DDLS score.
Conclusions
DDLSNet achieved moderate agreement with clinicians for DDLS grading. This novel approach illustrates the feasibility of automated ODP grading for assessing glaucoma severity. Further improvements may be achieved by increasing the sample size of incomplete rims, expanding the hyperparameter search, and increasing the agreement of clinicians grading ODPs.
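
The weighted kappa used as the main outcome measure can be illustrated with a minimal sketch. The abstract does not give the exact weighting scheme, so the version below assumes one plausible reading: two DDLS scores that differ by at most 1 count as agreement (disagreement weight 0), and any larger difference counts as full disagreement (weight 1). The function name, the `tolerance` parameter, and the example scores are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories, tolerance=1):
    """Cohen's weighted kappa with a 0/1 disagreement weight.

    Assumption (not confirmed by the source): scores differing by at most
    `tolerance` are treated as agreement; larger gaps are full disagreement.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of score pairs
    marg_a = Counter(rater_a)                  # marginal counts, rater A
    marg_b = Counter(rater_b)                  # marginal counts, rater B

    def weight(i, j):
        return 0.0 if abs(i - j) <= tolerance else 1.0

    # Weighted disagreement actually observed between the two raters.
    obs_dis = sum(
        weight(i, j) * observed[(i, j)] / n
        for i in categories for j in categories
    )
    # Weighted disagreement expected by chance from the marginals.
    exp_dis = sum(
        weight(i, j) * marg_a[i] * marg_b[j] / (n * n)
        for i in categories for j in categories
    )
    if exp_dis == 0:
        raise ValueError("kappa undefined: expected disagreement is zero")
    return 1.0 - obs_dis / exp_dis


# Hypothetical DDLS scores (1-10) for ten photographs; illustrative only.
clinician_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pipeline_scores = [1, 3, 3, 5, 5, 6, 8, 8, 9, 10]
kappa = weighted_kappa(clinician_scores, pipeline_scores, range(1, 11))
```

Setting `tolerance=0` reduces this to unweighted Cohen's kappa. Linear or quadratic weights are common alternatives if the ± 1 criterion was applied differently in the study.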