- Odisho, Anobel Y;
- Bridge, Mark;
- Webb, Mitchell;
- Ameli, Niloufar;
- Eapen, Renu S;
- Stauf, Frank;
- Cowan, Janet E;
- Washington, Samuel L;
- Herlemann, Annika;
- Carroll, Peter R;
- Cooperberg, Matthew R
Purpose
Cancer pathology findings are critical for many aspects of care but are often locked away as unstructured free text. Our objective was to develop a natural language processing (NLP) system to extract prostate pathology details from postoperative pathology reports, together with a parallel structured data entry process for use by urologists during routine documentation and clinical care, and to compare the accuracy of each approach with manual abstraction as well as the concordance between the NLP and clinician-entered approaches.
Materials and methods
From February 2016, clinicians used note templates with custom structured data elements (SDEs) during routine clinical care for men with prostate cancer. We also developed an NLP algorithm to parse radical prostatectomy pathology reports and extract structured data. We compared accuracy of clinician-entered SDEs and NLP-parsed data to manual abstraction as a gold standard and compared concordance (Cohen's κ) between approaches assuming no gold standard.
Results
There were 523 patients with NLP-extracted data, 319 with SDE data, and 555 with manually abstracted data. For Gleason scores, NLP and clinician SDE accuracy was 95.6% and 95.8%, respectively, compared with manual abstraction, with concordance of 0.93 (95% CI, 0.89 to 0.98). For margin status, extracapsular extension, seminal vesicle invasion, stage, and lymph node status, NLP accuracy was 94.8% to 100%, SDE accuracy was 87.7% to 100%, and concordance between NLP and SDE ranged from 0.92 to 1.0.
Conclusion
We show that a real-world deployment of an NLP algorithm to extract pathology data and structured data entry by clinicians during routine clinical care in a busy clinical practice can generate accurate data when compared with manual abstraction for some, but not all, components of a prostate pathology report.
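As an illustration of the two metrics named in the Materials and methods, the following is a minimal sketch of how accuracy against a gold standard and Cohen's κ between two approaches could be computed. It is not the authors' code: the function names and the toy margin-status labels (`manual`, `nlp`, `sde`) are hypothetical and do not reproduce any data from the study, and the sketch does not compute confidence intervals.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Fraction of cases where the predicted label matches the gold standard."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of cases where both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal label counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels for surgical margin status (not study data).
manual = ["pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg"]  # gold standard
nlp = ["pos", "neg", "neg", "pos", "neg", "pos", "pos", "neg"]     # NLP-parsed
sde = ["pos", "neg", "neg", "neg", "neg", "pos", "pos", "neg"]     # clinician-entered

print(f"NLP accuracy vs manual: {accuracy(nlp, manual):.3f}")
print(f"SDE accuracy vs manual: {accuracy(sde, manual):.3f}")
print(f"NLP vs SDE kappa:       {cohens_kappa(nlp, sde):.3f}")
```

Accuracy requires treating one source as correct, whereas κ treats both sources symmetrically, which is why it suits the comparison between NLP and SDE data made without assuming a gold standard.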