S.C. Hodges1, W.M. Oslock1, A.A. Harsono1, B. Brock1, L. Wood1, R.H. Hollis1, A. Abbas1, G. Hernandez-Marquez1, M. Rubyan2, D.I. Chu1 1University of Alabama at Birmingham, Department of Surgery, Birmingham, AL, USA; 2University of Michigan, School of Public Health, Ann Arbor, MI, USA
Introduction: Artificial intelligence (AI) and large language models (LLMs) are increasingly used in clinical care. Opportunities exist to use AI/LLMs to improve patient education materials, but supporting data are lacking. We therefore aimed to compare the understandability and actionability of LLM-generated patient education materials with those of existing materials.
Methods: Surgical patient education materials (N=52) from three categories (preoperative, postoperative, and ostomy-related care) at a large academic institution were compared with materials generated by a freely available LLM (Gemini). Gemini was queried with an optimized, metric-based prompt: “Please give me patient education information about [topic], risks, expectations, and preparation that is health literate and at a sixth-grade reading level using short sentences and words with <3 syllables.” Three independent reviewers rated each material using the U.S. Department of Health and Human Services Patient Education Materials Assessment Tool (PEMAT). The primary outcomes were PEMAT understandability and actionability scores. Inter-rater reliability was assessed with the intraclass correlation coefficient (ICC), and scores for existing versus Gemini-generated materials were compared using paired t-tests.
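For illustration only, the following is a minimal sketch of how the reliability and comparison analyses described above could be computed; the input file, column names, and choice of libraries (pingouin for the ICC, scipy for the paired t-test) are assumptions made for this sketch and do not come from the authors' methods.

```python
# Minimal illustrative sketch of the analysis described above (not the authors' code).
# Assumes PEMAT scores sit in a long-format CSV with columns:
#   topic, source ("existing" or "gemini"), rater, score (percent).
import pandas as pd
import pingouin as pg        # intraclass correlation coefficient
from scipy import stats      # paired t-test

scores = pd.read_csv("pemat_scores.csv")    # hypothetical input file

# Inter-rater reliability: ICC across the three raters, treating each
# rated document (topic x source combination) as a target.
scores["document"] = scores["topic"] + "_" + scores["source"]
icc = pg.intraclass_corr(data=scores, targets="document",
                         raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])

# Paired comparison: average the three raters per document, then pair
# the existing and Gemini-generated versions of the same topic.
means = (scores.groupby(["topic", "source"])["score"]
               .mean().unstack("source"))
t_stat, p_value = stats.ttest_rel(means["existing"], means["gemini"])
print(f"paired t = {t_stat:.2f}, p = {p_value:.3f}")
```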
Results: Reliability among the three raters was excellent, with an ICC of 0.98 (95% CI 0.97-0.98). Overall, existing patient education materials scored higher for understandability (87.8%) than Gemini-generated materials (82.3%, p<0.01). Items commonly missed by Gemini-generated materials included providing a summary, using visual cues to highlight key points, and using visual aids to make content easier to understand. Conversely, Gemini-generated materials surpassed existing materials in establishing purpose, using everyday language, defining medical terms, breaking information into sections, using informative headers, and presenting information in a logical sequence. Existing materials also scored higher for actionability (86.6%) than Gemini-generated materials (54.5%, p<0.01); Gemini-generated materials failed to break actions into explicit steps, provide tangible tools to support action, or use visual aids to guide action (Figure 1).
Conclusion: Patient education materials generated by an LLM were less understandable and less actionable than existing materials. It may be too early to adopt LLM-synthesized education materials in clinical practice, and future research is needed to better optimize LLMs for this purpose.