K.V. Lauer1, D. Jawara1, M. Venkatesh1, L.N. Stalter1, B.M. Hanlon1,2, M.M. Churpek3, C. Gehl4, A.N. Kothari5, L.M. Funk1,6
1 University of Wisconsin, Department of Surgery, Madison, WI, USA
2 University of Wisconsin, Department of Biostatistics and Medical Informatics, Madison, WI, USA
3 University of Wisconsin, Department of Medicine, Madison, WI, USA
4 Medical College of Wisconsin, Milwaukee, WI, USA
5 Medical College of Wisconsin, Department of Surgery, Milwaukee, WI, USA
6 William S. Middleton Memorial VA, Department of Surgery, Madison, WI, USA
Introduction: Obesity, defined as a body mass index (BMI) of ≥30 kg/m², is a major public health concern in the United States, where more than 40% of adults meet BMI criteria for obesity. Behavioral treatment, medications, and bariatric surgery are effective for patients who use them, but these evidence-based treatments have not reversed the rise in obesity prevalence. Preventative approaches are essential to curbing the obesity epidemic, yet they are limited by an inability to accurately predict which individuals are at highest risk of weight gain. Our study objective was to develop accurate machine learning weight gain prediction models using the All of Us dataset. We hypothesized that machine learning algorithms, including elastic net logistic regression (EN) and XGBoost, would predict weight gain more accurately when patient behavioral survey data were included.
Methods: Our study utilized the racially representative NIH-funded All of Us dataset. Adults aged 18 to 70 years with weight measurements two years apart between 2008 and 2022 were selected. Exclusion criteria included a history of cancer, bariatric surgery, or pregnancy during the study interval. Predictors used in the models included demographics, vital signs, laboratory results, comorbidities, and survey data (Alcohol Use Disorders Identification Test [AUDIT-C] and Patient-Reported Outcomes Measurement Information System [PROMIS] physical and mental health scores). The primary outcome was ≥10% total body weight (TBW) gain at two years. EN and XGBoost machine learning models were developed with and without survey data. The data were split into a training sample (60%) and a testing sample (40%), and parameters were tuned using 10-fold cross-validation. Model performance was compared using the area under the receiver operating characteristic curve (AUC).
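The outcome definition, 60/40 split, and 10-fold cross-validation described in the Methods can be sketched in a few lines of standard-library Python. This is a minimal illustration only, not the study's code: the function names (`gained_10_percent`, `train_test_split_indices`, `k_fold_indices`), the random seed, and the use of simple unstratified shuffling are all assumptions, and model fitting itself is left out.

```python
import random


def gained_10_percent(baseline_kg, followup_kg, threshold=0.10):
    """Hypothetical helper: True if weight rose by >= threshold of baseline."""
    return (followup_kg - baseline_kg) / baseline_kg >= threshold


def train_test_split_indices(n, train_fraction=0.60, seed=0):
    """Shuffle row indices and cut them into training and testing samples.

    Assumption: a plain random split; the abstract does not say whether
    the split was stratified by outcome.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


def k_fold_indices(train_idx, k=10):
    """Yield (fit, held_out) index lists for k-fold cross-validation.

    Each training row lands in exactly one held-out fold; parameters
    would be tuned by averaging held-out performance across the folds.
    """
    folds = [train_idx[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, folds[i]


if __name__ == "__main__":
    train, test = train_test_split_indices(1000)
    print(len(train), len(test))
    for fit, held_out in k_fold_indices(train):
        print(len(fit), len(held_out))
```

Only the training indices are passed to `k_fold_indices`, so the testing sample stays untouched until the final AUC comparison.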
Results: Our cohort consisted of 34,715 patients (mean [SD] age 50.9 [13.4] years; 45.7% White; 55.3% female). Over a two-year span, 10.4% of the cohort gained ≥10% TBW. AUCs for the EN and XGBoost models without survey data were 0.663 [95% confidence interval 0.648-0.677] and 0.716 [0.702-0.729], respectively. Incorporation of survey data did not improve performance, with AUCs of 0.667 [0.653-0.682] and 0.715 [0.702-0.729], respectively.
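For readers less familiar with the metric, an AUC is the probability that a randomly chosen patient who gained ≥10% TBW receives a higher predicted risk than a randomly chosen patient who did not. The sketch below computes it directly from that definition and attaches a percentile bootstrap interval. The abstract does not state how its confidence intervals were obtained, so the bootstrap is an assumption here, as are the function names `auc` and `bootstrap_ci` and the toy data.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. This pairwise form is quadratic in sample
    size; it is meant for illustration, not for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; AUC undefined, draw again
        stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy data, not study data.
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(auc(y, p))
    print(bootstrap_ci(y, p, n_boot=200))
```

When two models are scored on the same testing sample, as here, heavily overlapping intervals are consistent with the conclusion that adding survey data did not change discrimination.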
Conclusion: Incorporation of AUDIT-C and PROMIS physical and mental health scores did not improve the performance of EN and XGBoost machine learning weight gain prediction models. The addition of other All of Us variables, including genomic data, may be informative in future studies.