97.15 De novo generation of patient education materials using ChatGPT: Is the technology there yet?

I. E. Ellison1, W. M. Oslock1, M. T. Thirumalai1, M. Rubyan2, B. A. Jones1, R. Hollis1, D. I. Chu1  1University Of Alabama at Birmingham, Division Of Gastrointestinal Surgery, Department Of Surgery, Birmingham, Alabama, USA 2University Of Michigan, School Of Public Health, Ann Arbor, MI, USA

Introduction: Low health literacy contributes to surgical disparities. When patients struggle to understand educational materials, the result can be unintentional non-adherence and poor outcomes. Previous studies show that better patient education can improve outcomes and reduce disparities, though such efforts can be labor intensive. ChatGPT may serve as an accessible method to generate educational resources that meet the National Institutes of Health recommended sixth-grade reading level. We therefore aimed to generate de novo education materials using ChatGPT and compare their readability with that of existing materials.

Methods: Existing colorectal surgery educational materials (N=52) in three categories (preoperative, postoperative, and ostomy related) were gathered from a large academic institution. After several iterations, the following prompt was used to generate de novo materials from ChatGPT version 4.0 for each topic: “Please give me patient education information about [keyword] that is health literate and at a sixth grade reading level.” Education materials on the same topic were compared using three readability scores. The Flesch-Kincaid Reading Ease (FKRE) and Flesch-Kincaid Grade Level (FKGL) scores are based on the number of words per sentence and syllables per word, while the Simplified Measure of Gobbledygook (SMOG) score uses the number of words with three or more syllables. Unpaired t-tests were used to compare mean scores.
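For readers unfamiliar with these metrics, the three scores can be computed directly from a text's sentence, word, and syllable counts using the standard published formulas. The sketch below is illustrative only (it is not the software used in this study) and relies on a naive vowel-group heuristic for syllable counting; production readability tools use more sophisticated syllable estimation.

```python
import math
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: runs of vowels, minus a silent trailing 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    """Compute FKRE, FKGL, and SMOG from the standard formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    wps = len(words) / len(sentences)            # words per sentence
    spw = sum(syllables) / len(words)            # syllables per word
    poly = sum(1 for s in syllables if s >= 3)   # polysyllabic words
    return {
        # Higher FKRE = easier to read (90-100 ~ 5th grade, 0-30 ~ college)
        "FKRE": 206.835 - 1.015 * wps - 84.6 * spw,
        # FKGL and SMOG estimate a U.S. school grade level directly
        "FKGL": 0.39 * wps + 11.8 * spw - 15.59,
        "SMOG": 1.0430 * math.sqrt(poly * 30 / len(sentences)) + 3.1291,
    }
```

Note the opposite polarities: a higher FKRE indicates easier text, whereas higher FKGL and SMOG values indicate a higher (harder) grade level, which is why the results below report baseline materials as "better" with higher FKRE but lower FKGL/SMOG.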

Results: On Flesch-Kincaid readability scoring, few baseline education materials (1 preoperative, 2 ostomy related) and no ChatGPT de novo materials met grade-level readability recommendations (Figure 1). ChatGPT-generated preoperative materials had a median FKRE score of 39.7 (college level), FKGL score of 11.4 (high school), and SMOG score of 12.7 (7th grade); FKRE scores were significantly better for baseline education materials (median 55.4, p=0.0004). Postoperative materials by ChatGPT had a median FKRE score of 44.6 (college level), FKGL score of 9.7 (high school), and SMOG score of 11.3 (6th grade), with baseline materials scoring significantly better on all three measures (63.7, 7.4, and 10, respectively; p<0.002). Ostomy-specific materials by ChatGPT had a median FKRE score of 43.9 (college level), FKGL score of 9.9 (high school), and SMOG score of 11.3 (6th grade), with baseline materials again scoring significantly better on all three measures (61.9, 7.6, and 10.3, respectively; p-values<0.006).

Conclusion: Education materials generated de novo with ChatGPT were similar to or worse than existing materials in terms of readability. Further investigation is needed into whether and how AI tools such as ChatGPT can be used to improve the quality of educational resources without exacerbating existing inequities.