53.06 Machine Learning and Cloud Computing for Enhanced Surgical Risk Prediction

Z. C. Dietch1, K. C. Lichtendahl2, Y. Grushka-Cockayne2, J. M. Will2, R. S. Jones1, R. G. Sawyer1. 1University of Virginia, Department of Surgery, Charlottesville, VA, USA; 2University of Virginia, Darden Graduate School of Business, Charlottesville, VA, USA

Introduction: Accurate risk prediction enhances surgical care by informing perioperative planning, informed consent, patient selection, and outcome measurement. Advances in machine learning, computing, and data availability have led to superior predictions in many disciplines by automating variable selection in the context of vast amounts of data. Based on the wisdom-of-crowds literature, we hypothesized that an ensemble, or average, of machine-learning prediction models would outperform mortality predictions issued by the American College of Surgeons National Surgical Quality Improvement Program (ACSNSQIP).

Methods: Preoperative characteristics and mortality outcomes from the ACSNSQIP Participant Use Files (PUF) 2005-2013 were used to form mortality predictions. The data were split into training and testing sets. The free, open-source R statistical software was run on Microsoft’s Azure cloud environment to parallelize variable selection and mortality prediction. Two machine-learning algorithms—regularized logistic regression and random forest—were fit to the training set to generate probabilistic mortality predictions for each observation in the testing set. The two models’ predictions were averaged to form an ensemble prediction. The primary outcome was the accuracy of the ensemble’s mortality predictions compared with those of the ACSNSQIP model. Accuracy was evaluated with the Brier score, the mean squared difference between predicted probability and actual outcome over a set of individual observations; lower scores indicate better performance. Significance was determined using the Amisano-Giacomini test.
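The modeling pipeline described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' R/Azure implementation: the dataset, hyperparameters, and class imbalance are placeholders, and scikit-learn stands in for the original tooling.

```python
# Hypothetical sketch: ensemble of L1-regularized logistic regression and a
# random forest, with predictions averaged and scored by Brier score.
# Synthetic data stands in for ACSNSQIP PUF preoperative characteristics.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Rare positive class mimics a low mortality rate (~1%)
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.99], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# L1 (lasso) penalty performs automatic variable selection during fitting
lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lr.fit(X_train, y_train)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

# Ensemble: simple average of the two probabilistic predictions
p_lr = lr.predict_proba(X_test)[:, 1]
p_rf = rf.predict_proba(X_test)[:, 1]
p_ens = (p_lr + p_rf) / 2

# Brier score: mean squared difference between prediction and outcome
for name, p in [("logistic", p_lr), ("forest", p_rf), ("ensemble", p_ens)]:
    print(f"{name}: {brier_score_loss(y_test, p):.6f}")
```

Because squared error is convex in the predicted probability, the averaged forecast's Brier score can never exceed the average of the two individual models' Brier scores, which is one motivation for the ensemble.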

Results: Models were trained on 1,847,818 records, and predictions were tested on 615,939 records. The regularized logistic regression selected 93 predictors, including 11 new variables engineered from ACSNSQIP PUF data. The random forest utilized 86 selected predictors, including 17 newly engineered variables. Predictions of the ensemble outperformed ACSNSQIP predictions as measured by Brier score, where lower is better (0.009854 vs. 0.009904, p=0.0065), representing a mean improvement of 0.5% over ACSNSQIP predictions. A graphical representation of model calibration is presented (Figure).
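The 0.5% figure is the relative reduction in Brier score and can be checked directly from the two reported values:

```python
# Relative improvement of the ensemble over the ACSNSQIP model,
# using the Brier scores reported in the results (lower is better).
brier_ensemble = 0.009854
brier_acsnsqip = 0.009904

improvement = (brier_acsnsqip - brier_ensemble) / brier_acsnsqip
print(f"{improvement:.1%}")  # → 0.5%
```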

Conclusion: Advances in machine learning and cloud computing have enabled rapid, robust, and affordable predictive analysis. Utilizing these capabilities, we generated a predictive model that outperformed ACSNSQIP predictions despite lacking the institution-specific data the ACSNSQIP uses in its model, working from otherwise identical data. Because outcomes vary by institution, we expect further improvement from including institution-specific data in our model.