A. W. Trickey1, Q. Ding1, A. Sox-Harris1,2; 1Stanford University, Stanford-Surgery Policy Improvement Research And Education (S-SPIRE) Center, Department Of Surgery, Palo Alto, CA, USA; 2VA Palo Alto Healthcare Systems, Center For Innovation To Implementation, Palo Alto, CA, USA
Introduction: Surgical outcome prediction models could be useful for many aspects of surgical care, including informed consent, shared decision making, preoperative patient optimization, and risk-adjusted quality measures. A set of parsimonious universal surgical morbidity and mortality prediction models was recently developed using the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP). These Surgical Risk Preoperative Assessment System (SURPAS) models demonstrated excellent overall and specialty-specific performance; however, it is unknown how they perform for narrower subsets of procedures. We aimed to internally validate the SURPAS models’ performance for elective total joint replacement (TJR) cases in the 2012 ACS-NSQIP.
Methods: The 2012 ACS-NSQIP Participant Use Data File (PUF) was queried for patients who underwent elective total hip or total knee arthroplasty. Mirroring the methods originally used to develop and validate the SURPAS models, outcomes included 30-day postoperative mortality and overall morbidity, and occurrence of one or more of 18 postoperative complications. Complications were further analyzed in 6 clusters: pulmonary, infectious, cardiac/transfusion, renal, venous, and neurological. We calculated each patient’s predicted probability of postoperative mortality, overall morbidity, and each complication cluster by applying the coefficients from the published risk prediction models to that patient’s risk factors. Discrimination, each model’s ability to distinguish patients who experienced the outcome (mortality, morbidity, or a complication cluster) from those who did not, was assessed by the C-index. Calibration, the agreement between predicted and observed probabilities, was assessed by Hosmer-Lemeshow calibration decile plots and the associated ten-group chi-square values.
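The three computations described in the Methods can be sketched as follows. This is a minimal illustration in plain Python, not the authors' analysis code: the function names, the dictionary layout for coefficients and patient factors, and the assumption that the published models are ordinary logistic regressions are all illustrative, and no actual SURPAS coefficients are reproduced here.

```python
import math


def predicted_probability(intercept, coefs, patient):
    """Predicted risk from a logistic model (assumed form).

    coefs and patient are hypothetical dicts keyed by risk-factor name.
    """
    linear = intercept + sum(coefs[name] * patient[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))


def c_index(probs, outcomes):
    """Share of (event, non-event) pairs in which the event case has the
    higher predicted risk; tied predictions count as half."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


def hosmer_lemeshow_chi2(probs, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square over risk-ordered groups of near-equal size.

    Each group contributes (observed - expected)^2 / (n * p_bar * (1 - p_bar)).
    """
    ranked = sorted(zip(probs, outcomes))
    n = len(ranked)
    chi2 = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:
            chi2 += (observed - expected) ** 2 / variance
    return chi2
```

A C-index of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The Hosmer-Lemeshow statistic is compared against a chi-square distribution to obtain a p-value; the pairwise loop in `c_index` is quadratic in sample size, so a rank-based formula would be preferable for a dataset as large as a full NSQIP file.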
Results: Overall 30-day postoperative mortality for the TJR procedures was 0.14%, substantially lower than the 1.4% mortality rate in the original development dataset. The calculated TJR model C-indexes ranged from 0.56 (95% CI: 0.53-0.59) for venous thromboembolism to 0.82 (95% CI: 0.76-0.88) for mortality (Figure). All TJR procedure C-index estimates and confidence intervals were lower than those reported in the original development study. Calibration decile plots for all models revealed substantial differences between observed and expected event rates, and Hosmer-Lemeshow tests were highly significant (p<0.001), indicating poor calibration.
Conclusion: The results suggest that the universal SURPAS surgical risk models have poor accuracy for TJR procedures. Given the substantial variation in patient populations and outcomes across numerous surgical procedures, universal perioperative risk calculators may not produce accurate and reliable results for specific procedures.