11.04 Identification of Postoperative Complications Using Electronic Health Records and Machine Learning

K. Colborn1, M. Bronsert4,5, A. B. Singh5, K. Hammermeister3,4,5, W. G. Henderson1,4,5, R. A. Meguid2,4,5
1 University of Colorado Denver, Biostatistics and Informatics, Aurora, CO, USA
2 University of Colorado Denver, Surgery, Aurora, CO, USA
3 University of Colorado Denver, Cardiology, Aurora, CO, USA
4 University of Colorado Denver, Adult and Child Consortium for Health Outcomes Research and Delivery Science, Aurora, CO, USA
5 University of Colorado Denver, Surgical Outcomes and Applied Research, Aurora, CO, USA

Introduction: Population-level ascertainment of postoperative complications is time-consuming and expensive because it often requires manual chart review. Using the American College of Surgeons National Surgical Quality Improvement Program (NSQIP) complication status of patients who underwent an operation at the University of Colorado Hospital as the reference standard, we sought to develop an algorithm that identifies patients with one or more complications from electronic health record (EHR) data using machine learning methods.

Methods: Data were split into a training set (operations performed in 2013-2015) and a test set (operations performed in 2016). A binomial generalized linear model with an elastic-net penalty was used both to fit the model and to select variables; elastic-net penalized regression was chosen because it handles high-dimensional data and correlated covariates well. Predictors included International Classification of Diseases codes (ICD-9 and ICD-10), Current Procedural Terminology (CPT) codes, medications, and the CPT-specific complication event rate (the complication rate for a given CPT code, estimated from the national NSQIP dataset of >5 million patients). Youden's J statistic was used to determine the optimal classification threshold.
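The abstract does not name the software used, so the following is only a hedged illustration of the two techniques named above, not the authors' implementation. It fits a binomial model (assuming the canonical logit link) with an elastic-net penalty by proximal gradient descent in plain Python, using a glmnet-style objective in which `lam` sets the overall penalty strength and `alpha` mixes the L1 (lasso) and L2 (ridge) parts. The L1 part is what performs variable selection: soft-thresholding sets weakly supported coefficients to exactly zero. The `youden_threshold` helper then picks the cutoff maximizing J = sensitivity + specificity - 1. All function names, the toy data, and the values of `lam`, `alpha`, and the step size are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(z: float, gamma: float) -> float:
    """Proximal operator of gamma * |b|: shrink z toward 0, returning exactly 0 on [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float,
    alpha: float,
    step: float = 0.5,
    n_iter: int = 500,
) -> Tuple[float, List[float]]:
    """Minimize mean log-loss + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2)
    by proximal gradient descent (ISTA). The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals of the current fit: predicted probability minus observed label.
        resid = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, row))) - yi
                 for row, yi in zip(X, y)]
        g0 = sum(resid) / n
        # Gradient of the smooth part: log-loss plus the ridge (L2) term.
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n + lam * (1 - alpha) * b[j]
                for j in range(p)]
        b0 -= step * g0
        # The L1 term is handled by soft-thresholding, which zeroes weak coefficients.
        b = [soft_threshold(b[j] - step * grad[j], step * lam * alpha) for j in range(p)]
    return b0, b


def youden_threshold(y: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1,
    classifying score >= threshold as positive. Ties keep the lowest threshold."""
    pos = sum(y)
    neg = len(y) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    thresholds = sorted(set(scores))
    best_t, best_j = thresholds[0], -math.inf
    for t in thresholds:
        tp = sum(1 for yi, s in zip(y, scores) if yi == 1 and s >= t)
        tn = sum(1 for yi, s in zip(y, scores) if yi == 0 and s < t)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


if __name__ == "__main__":
    # Hypothetical toy data: column 0 is informative, column 1 is balanced noise.
    cells = [((1, 1), 4), ((1, 0), 1), ((0, 1), 1), ((0, 0), 4)]  # ((x0, label), count)
    X: List[List[float]] = []
    y: List[int] = []
    for (x0, label), count in cells:
        for x1 in (0, 1):
            X += [[x0, x1] for _ in range(count)]
            y += [label] * count
    b0, b = fit_elastic_net_logistic(X, y, lam=0.2, alpha=0.5)
    scores = [sigmoid(b0 + b[0] * r[0] + b[1] * r[1]) for r in X]
    print("coefficients:", b)
    print("Youden threshold and J:", youden_threshold(y, scores))
```

On the toy data the noise column's coefficient stays at exactly zero while the informative one is retained, which is the selection behavior the L1 part provides. In a real analysis `lam` (and often `alpha`) would typically be tuned, for example by cross-validation; the abstract does not state how the penalty was chosen.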

Results: Of 6,840 patients, 922 (13.5%) had at least one of the 18 complications tracked by NSQIP. Of the 838 candidate variables initially included in the model, 117 had nonzero coefficients: 30 ICD-9/-10 codes, 53 CPT codes, 33 medications, and the CPT-specific complication event rate. Using a decision threshold of 0.12, the model achieved 86% specificity, 79% sensitivity, 96% negative predictive value, and 46% positive predictive value; the area under the receiver operating characteristic curve was 0.90.
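The four threshold-dependent metrics are ratios of confusion-matrix counts, and PPV and NPV additionally depend on how common complications are. The minimal Python sketch below shows both relationships; the function names are hypothetical, and plugging in the reported sensitivity and specificity assumes the 2016 test set has roughly the overall 13.5% prevalence, which the abstract does not report. Under that assumption the implied PPV is about 47% and NPV about 96%, close to the reported 46% and 96% given rounding of the inputs. The calculation also explains why PPV is modest despite an AUC of 0.90: with 86.5% of patients free of complications, a 14% false-positive rate among them produces slightly more false positives (about 0.121 per patient) than true positives (about 0.106 per patient).

```python
from typing import Dict, Tuple


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-dependent metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def predictive_values(sens: float, spec: float, prev: float) -> Tuple[float, float]:
    """PPV and NPV implied by sensitivity, specificity, and prevalence (Bayes' rule)."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv


if __name__ == "__main__":
    # Assumption: test-set prevalence is close to the overall 922 / 6,840 (13.5%).
    ppv, npv = predictive_values(sens=0.79, spec=0.86, prev=922 / 6840)
    print(f"PPV ~ {ppv:.3f}, NPV ~ {npv:.3f}")  # prints "PPV ~ 0.468, NPV ~ 0.963"
```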

Conclusion: Using machine learning and NSQIP outcomes data, we found that a model with 117 EHR-derived predictors identified postoperative complications with good discrimination (AUC 0.90) at our institution. This model can be used to scale up complication surveillance beyond the limited NSQIP sample, at individual hospitals or across entire health systems, or to estimate the impact of large-scale interventions on postoperative complication rates.