UMMS Affiliation

Department of Population and Quantitative Health Sciences; Department of Medicine

Publication Date


Document Type



Artificial Intelligence and Robotics | Diagnosis | Health Information Technology | Nutritional and Metabolic Diseases


BACKGROUND: Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety.

OBJECTIVE: We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients' secure messages.

METHODS: An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data.

RESULTS: The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity 0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect.

CONCLUSIONS: Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has a great potential to facilitate early detection and treatment of hypoglycemia.


adverse event detection, drug-related side effects and adverse reactions, hypoglycemia, imbalanced data, natural language processing, secure messaging, supervised machine learning

Rights and Permissions

Copyright © Jinying Chen, John Lalor, Weisong Liu, Emily Druhl, Edgard Granillo, Varsha G Vimalananda, Hong Yu. Originally published in the Journal of Medical Internet Research (, 11.03.2019. This is an open-access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on, as well as this copyright and license information must be included.

DOI of Published Version



J Med Internet Res. 2019 Mar 11;21(3):e11990. doi: 10.2196/11990. Link to article on publisher's site

Journal/Book/Conference Title

Journal of medical Internet research

Related Resources

Link to Article in PubMed

PubMed ID


Creative Commons License

Creative Commons Attribution 4.0 License
This work is licensed under a Creative Commons Attribution 4.0 License.