S. Bheemireddy1, S.E. Leslie2, M.G. Higgins3, J.A. Durden3, M. Adams4, S. Greenseid5, L. McLemore6, G. Li9, R. Miles7, N. Taft8, S. Tevis3

1 Albany Medical College, Albany, NY, USA
2 University of Colorado Denver, Adult and Child Center for Outcomes Research and Delivery Science (ACCORDS), Aurora, CO, USA
3 University of Colorado Denver, Department of Surgery, Aurora, CO, USA
4 University of Houston, Department of Psychological, Health, and Learning Sciences, Houston, TX, USA
5 University of Miami, Department of Surgical Oncology, Miami, FL, USA
6 University of Colorado Denver, Department of Pathology, Aurora, CO, USA
7 Denver Health Medical Center, Department of Radiology, Aurora, CO, USA
8 Denver Health Medical Center, Department of General Surgery, Aurora, CO, USA
9 University of Colorado Denver, Department of Radiology, Aurora, CO, USA
Introduction:
The 21st Century Cures Act allows patients to view their diagnostic reports before discussing them with healthcare professionals (HPs). Breast pathology reports, traditionally written for HPs, exceed the recommended 6th-grade reading level and fall below the recommended ease-of-reading score (80-100) for patient-facing materials. Patients reviewing their pathology reports without guidance may misinterpret the contents. Using artificial intelligence (AI) to simplify breast pathology reports may therefore provide a tool to improve health literacy and patient comprehension of these reports. To our knowledge, no studies have investigated the use of AI to simplify the language of breast pathology reports. This study aims to assess the readability of reports modified by ChatGPT compared to the original reports and to determine which prompt produces the most readable report.
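For reference, both metrics used in this study are standard functions of average sentence length and syllables per word: FRES = 206.835 - 1.015(W/S) - 84.6(Y/W) and FKRL = 0.39(W/S) + 11.8(Y/W) - 15.59, where W, S, and Y are the total counts of words, sentences, and syllables. Lower FKRL and higher FRES indicate easier text.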
Methods:
We compared the readability scores of 10 original deidentified patient breast pathology reports to versions simplified by ChatGPT-4.0. ChatGPT was asked to simplify each report using 3 different prompts [Table 1]. The original and ChatGPT-simplified reports were assessed for readability using the Flesch-Kincaid Reading Level (FKRL) and for ease of reading using the Flesch Reading Ease Score (FRES). An ANOVA determined whether the outputs of the three prompts differed significantly, followed by pairwise paired t-tests to identify the best-performing prompt (lowest FKRL and highest FRES). A final paired t-test compared the reports from the best-performing prompt to the original reports to assess whether that prompt improved both scores.
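As a minimal sketch of this pipeline, assuming the openai and textstat Python packages: the prompt string, model name, and file name below are illustrative placeholders, not the study's exact configuration (the actual prompts appear in Table 1).

    # Sketch: simplify one report with ChatGPT, then score it.
    # Prompt text, model name, and file name are placeholders.
    from openai import OpenAI
    import textstat

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def simplify(report_text: str, prompt: str) -> str:
        """Return the model's simplified version of one pathology report."""
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": f"{prompt}\n\n{report_text}"}],
        )
        return response.choices[0].message.content

    def readability(text: str) -> tuple[float, float]:
        """Return (FKRL, FRES) for one report."""
        return textstat.flesch_kincaid_grade(text), textstat.flesch_reading_ease(text)

    original = open("report_01.txt").read()  # hypothetical file name
    simplified = simplify(original, "Simplify this report to a 6th-grade reading level:")
    print(readability(original), readability(simplified))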
Results:
Compared to prompts 1 and 3, the reports generated by prompt 2 demonstrated a statistically significant reduction in FKRL (prompt 2 vs prompt 1, P < 0.001; prompt 2 vs prompt 3, P < 0.001) and increase in FRES (prompt 2 vs prompt 1, P < 0.001; prompt 2 vs prompt 3, P < 0.001). Compared to the original reports, prompt 2 improved both the readability score (P < 0.001) and the ease-of-reading score (P < 0.001). Scores for each report are listed in Table 1.
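The comparisons above follow directly from the tests named in the Methods; a minimal sketch with scipy, using randomly generated placeholder scores in place of the study's data:

    # Sketch of the significance tests; the score arrays are random
    # placeholders, NOT the study's data (see Table 1 for real values).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    fkrl_p1, fkrl_p2, fkrl_p3 = (rng.normal(m, 1.0, 10) for m in (11, 7, 11))
    fkrl_orig = rng.normal(14, 1.0, 10)  # placeholder original-report scores

    # One-way ANOVA across the three prompts' outputs.
    print(stats.f_oneway(fkrl_p1, fkrl_p2, fkrl_p3))

    # Pairwise paired t-tests: prompt 2 vs prompts 1 and 3, then vs original.
    print(stats.ttest_rel(fkrl_p2, fkrl_p1))
    print(stats.ttest_rel(fkrl_p2, fkrl_p3))
    print(stats.ttest_rel(fkrl_p2, fkrl_orig))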
Conclusion:
Specifying a desired grade level when prompting ChatGPT to simplify breast pathology reports improves readability and ease of reading compared to more generic prompts. Reports simplified with this prompt also scored significantly better than the original text, suggesting the approach may provide patients with more readable pathology reports. Although prompt 2 increased the FRES more than the other prompts, the mean FRES remained below the target score of 80. Further work is necessary to assess the accuracy of these ChatGPT-simplified reports and to develop prompts that raise the FRES to the level recommended for patient-facing material.