T. Araji¹, A. Brooks¹; ¹Hospital of the University of Pennsylvania, Department of Surgery, Philadelphia, PA, USA
Introduction: Despite its recent prominence, the role of the artificial intelligence (AI) chatbot ChatGPT has not yet been explored in medical student education in surgery. The aim of this study was to assess how ChatGPT compares to Google search in supporting the medical education of third-year medical students during their surgery clerkships.
Methods: We conducted a crossover study in which medical students on their surgery clerkship completed two standardized assessments on general surgery topics before and after using either Google search or ChatGPT. The group that used Google search for the first assessment used ChatGPT for the second, and vice versa. Participants then completed a post-assessment survey about their experiences with Google versus ChatGPT, the feasibility of ChatGPT, and their willingness to use it for surgery-related topics. Descriptive statistics and paired t-tests were used to compare outcomes between the two groups. All analyses were conducted using GraphPad Prism 9, with statistical significance defined at the α=0.05 level.
Results: Nineteen third-year medical students participated in our study. Baseline (pre-intervention) performance on both quizzes did not differ between the Google search and ChatGPT groups (p=0.728). Students performed better overall post-intervention, and the improvement in test scores was statistically significant in both the Google group (p<0.001) and the ChatGPT group (p=0.01). The mean percent increase in test scores from pre- to post-intervention was higher in the Google group (11% versus 10% in the ChatGPT group), but this difference was not statistically significant (p=0.87). Similarly, there was no statistically significant difference in post-intervention scores on the two assessments between the groups (p=0.508). Post-assessment surveys revealed that all students (100%) had heard of ChatGPT before, and 47% had previously used it for various purposes. On a scale of 1 to 10, with 1 being the lowest and 10 the highest, the feasibility of ChatGPT and its usefulness in finding answers were rated 8.4 and 6.6 on average, respectively. When asked to rate the likelihood of using ChatGPT in their surgery rotation, responses fell into the ranges 1–3 (“unlikely”; 47%), 4–6 (“intermediate”; 26%), and 7–10 (“likely”; 26%).
Conclusion: ChatGPT was as effective as Google search in helping students learn about surgery-related topics and improve their performance on assessments. Despite its accessibility and feasibility, many students remain reluctant to use ChatGPT for learning during their surgery clerkship. If AI is to become a significant learning tool, its best use cases and the reasons for students’ hesitation need to be explored.