A.Z. Fazilat¹, C. Brenac¹, D. Kawamoto-Duran¹, C.E. Berry¹, P. Sunwoo¹, K. Huang¹, D.C. Wan¹
¹Stanford University, Plastic and Reconstructive Surgery/Surgery, Stanford University School of Medicine, Palo Alto, CA, USA
Introduction: As artificial intelligence (AI) and intelligent chatbots become more integrated into healthcare, it is crucial to assess the quality of the images and information they generate, particularly since patients frequently consult these resources. This study evaluates the quality of hand surgery images generated by Gemini and of French-language patient-facing information provided by ChatGPT, comparing both against reference materials, as reviewed by surgeons and non-medical individuals.
Methods: Five hand surgeons and twenty-six non-medical individuals compared medical images from Gemini and patient-facing information from ChatGPT against those provided by SOFCPRE/SFCM for carpal tunnel syndrome, Dupuytren's disease, and synovial cyst. Surgeons assessed medical images for comprehensiveness, relevance, accuracy, and overall preference, while non-medical individuals only chose their preferred source. Surgeons also evaluated patient-facing information for comprehensiveness, clarity, accuracy, and overall preference, whereas non-medical individuals rated clarity and selected their preferred source. Standard readability tests were applied, and Likert-scale results were analyzed statistically using paired t-tests.
Results: Hand surgeons rated SOFCPRE/SFCM images significantly higher in comprehensiveness (p<0.0001) and relevance (p<0.001) (Figure 1A), and rated SOFCPRE/SFCM patient-facing information higher in comprehensiveness (p<0.0001) and clarity (p<0.05) (Figure 1B). Interestingly, non-medical individuals scored the clarity of ChatGPT patient-facing information higher than that of SOFCPRE/SFCM (p<0.0001) (Figure 1B). More inaccuracies were identified in the Gemini and ChatGPT materials than in the SOFCPRE/SFCM references. Hand surgeons preferred Gemini images and ChatGPT information 6.67% and 36.37% of the time, respectively, while non-medical individuals preferred them 34.62% and 68.06% of the time, respectively. Both sources produced information exceeding recommended readability levels for patient comprehension.
Conclusion: Hand surgeons preferred the SOFCPRE/SFCM materials for their superior comprehensiveness, relevance, clarity, and accuracy. In contrast, non-medical individuals preferred AI-generated content, particularly the patient-facing information from ChatGPT. This divergence highlights the need to refine AI-generated medical content so that it meets the rigorous standards of healthcare professionals while remaining accessible and engaging for patients. Future efforts should focus on improving the comprehensiveness, relevance, clarity, accuracy, and readability of AI-generated materials to better align with patient needs.