Qualifying Certainty in Radiology Reports through Deep Learning-Based Natural Language Processing
Departments of Population and Quantitative Health Sciences; Department of Radiology
Artificial Intelligence and Robotics | Diagnosis | Radiology
BACKGROUND AND PURPOSE: Communication gaps exist between radiologists and referring physicians in conveying diagnostic certainty. We aimed to explore deep learning-based bidirectional contextual language models for automatically assessing the diagnostic certainty expressed in radiology reports, with the goal of improving the precision of communication.
MATERIALS AND METHODS: We randomly sampled 594 head MR imaging reports from an academic medical center. We asked 3 board-certified radiologists to read sentences from the Impression section and assign each sentence 1 of the 4 certainty categories: "Non-Definitive," "Definitive-Mild," "Definitive-Strong," "Other." Using the annotated 2352 sentences, we developed and validated a natural language-processing system based on the state-of-the-art bidirectional encoder representations from transformers (BERT), which can capture contextual uncertainty semantics beyond the lexicon level. Finally, we evaluated 3 BERT variant models and reported standard metrics including sensitivity, specificity, and area under the curve.
RESULTS: A kappa score of 0.74 was achieved for interannotator agreement on uncertainty interpretations among the 3 radiologists. Of the 3 BERT variant models, the biomedical variant (BioBERT) achieved the best macro-average area under the curve of 0.931 (compared with 0.928 for the BERT-base and 0.925 for the clinical variant [ClinicalBERT]) on the validation data. All 3 models yielded high macro-average specificity (93.13%-93.65%), while the BERT-base obtained the highest macro-average sensitivity of 79.46% (compared with 79.08% for BioBERT and 78.52% for ClinicalBERT). The BioBERT model generalized well to the held-out test data, with a macro-average sensitivity of 77.29%, specificity of 92.89%, and area under the curve of 0.93.
CONCLUSIONS: A deep transfer learning model can be developed to reliably assess the level of uncertainty communicated in a radiology report.
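As an illustration of the macro-average sensitivity and specificity reported in the Results, the following is a minimal sketch in Python. It assumes a one-vs-rest treatment of each of the 4 certainty categories and an unweighted mean across categories; the abstract does not state how the averages were computed, so this may differ from the authors' procedure. The function names, the zero-division handling, and the toy labels are hypothetical and are not study data.

```python
from typing import List, Sequence, Tuple

# The 4 certainty categories named in the abstract.
LABELS = ["Non-Definitive", "Definitive-Mild", "Definitive-Strong", "Other"]


def one_vs_rest_counts(
    y_true: Sequence[str], y_pred: Sequence[str], label: str
) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn), treating `label` as positive and all others as negative."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == label and pred == label:
            tp += 1
        elif truth == label:
            fn += 1
        elif pred == label:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def macro_sensitivity_specificity(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str] = LABELS
) -> Tuple[float, float]:
    """Unweighted mean of per-class sensitivity and specificity (assumed definition)."""
    sensitivities: List[float] = []
    specificities: List[float] = []
    for label in labels:
        tp, fn, fp, tn = one_vs_rest_counts(y_true, y_pred, label)
        # Assumption of this sketch: a class with no positives (or no negatives) scores 0.0.
        sensitivities.append(tp / (tp + fn) if (tp + fn) else 0.0)
        specificities.append(tn / (tn + fp) if (tn + fp) else 0.0)
    return sum(sensitivities) / len(labels), sum(specificities) / len(labels)


# Toy example with made-up labels (not data from the study).
toy_true = ["Non-Definitive", "Non-Definitive", "Definitive-Mild", "Definitive-Mild",
            "Definitive-Strong", "Definitive-Strong", "Other", "Other"]
toy_pred = ["Non-Definitive", "Definitive-Mild", "Definitive-Mild", "Definitive-Mild",
            "Definitive-Strong", "Definitive-Strong", "Other", "Non-Definitive"]

macro_sens, macro_spec = macro_sensitivity_specificity(toy_true, toy_pred)
print(f"macro sensitivity = {macro_sens:.4f}, macro specificity = {macro_spec:.4f}")
```

Because each category contributes equally to a macro-average, a rare category with poor sensitivity lowers the overall figure as much as a common one would. That is one plausible reason the macro-average sensitivity can sit well below the macro-average specificity in a 4-class task.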
DOI of Published Version: 10.3174/ajnr.A7241
Liu F, Zhou P, Baccei SJ, Masciocchi MJ, Amornsiripanitch N, Kiefe CI, Rosen MP. Qualifying Certainty in Radiology Reports through Deep Learning-Based Natural Language Processing. AJNR Am J Neuroradiol. 2021 Oct;42(10):1755-1761. doi: 10.3174/ajnr.A7241. Epub 2021 Aug 19. PMID: 34413062.
AJNR. American journal of neuroradiology
Liu F, Zhou P, Baccei SJ, Masciocchi MJ, Amornsiripanitch N, Kiefe CI, Rosen MP. (2021). Qualifying Certainty in Radiology Reports through Deep Learning-Based Natural Language Processing. Population and Quantitative Health Sciences Publications. https://doi.org/10.3174/ajnr.A7241. Retrieved from https://escholarship.umassmed.edu/qhs_pp/1407