E. Mettetal1, G. Marshall1, A. Moncada Ortega1, G.A. Del Carmen1 1Albany Medical Center, Department of Surgery, Albany, NY, USA
Introduction:
Ethical concerns are inherent to clinical practice, especially within the inpatient setting. While physicians are guided by ethical frameworks, these frameworks provide static feedback, and ethics consults are not always available. Large language models (LLMs), capable of exhibiting a wide range of adaptable reasoning skills, show promise as adjuncts in the real-world application of ethical frameworks. This study aimed to fine-tune an LLM on a diverse set of ethical scenarios to create a robust tool to augment physicians' ethical decision-making and to assess its inherent ethical preferences.
Methods:
We utilized Llama 3.1, Meta AI’s flagship open-source model, as it is known for its superior reasoning capabilities. To generate an appropriate volume of training data, OpenAI’s GPT-4o model was employed to create a variety of ethical scenarios, which were subsequently exported into a training file. Each scenario was manually reviewed to ensure equal representation of diverse ethical cases. Ethical judgments were assessed based on the four pillars of medical ethics: beneficence, non-maleficence, autonomy, and justice; we quantified the frequency of the model’s recommendations for and against operative management. The model was presented with a 35-year-old patient with a BMI of 55 and multiple comorbidities seeking a Roux-en-Y gastric bypass (RYGB) following unsuccessful non-operative interventions. The dilemma required evaluating the patient’s poor surgical candidacy, the benefits of operative management, optimization of hospital resources, and the patient’s preference and consent.
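The export step described above (writing GPT-4o-generated scenarios to a training file) can be sketched as follows. The scenario text, system prompt, and file name are illustrative assumptions for this sketch, not the study's actual dataset; the JSONL chat format shown is one commonly used for fine-tuning instruction-tuned models such as Llama 3.1.

```python
import json

def build_training_file(scenarios, path):
    """Write ethical scenarios to a JSONL training file in a
    chat-style instruction/response format. Each scenario is a
    dict with 'prompt' and 'response' keys (an assumed schema)."""
    with open(path, "w") as f:
        for s in scenarios:
            record = {
                "messages": [
                    # Hypothetical system prompt, for illustration only
                    {"role": "system",
                     "content": "You are a clinical ethics assistant."},
                    {"role": "user", "content": s["prompt"]},
                    {"role": "assistant", "content": s["response"]},
                ]
            }
            f.write(json.dumps(record) + "\n")

# Illustrative placeholder scenario (not from the study's training data)
scenarios = [
    {
        "prompt": ("A 35-year-old patient with a BMI of 55 requests RYGB "
                   "after failed non-operative management. Advise."),
        "response": ("Weigh autonomy and beneficence against operative "
                     "risk before recommending surgery."),
    }
]
build_training_file(scenarios, "ethics_train.jsonl")
```

One record per line keeps the file streamable during training and makes manual review of individual scenarios straightforward.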
Results:
Our model was fine-tuned on all ethical data over 73 minutes. Across all iterations of the ethical scenario, our model overwhelmingly recommended proceeding with the RYGB (70%). The model’s remaining recommendations were equivocal: some provided no specific guidance (18%), while the remainder deferred the decision to the patient’s informed consent for the operation (12%). When providing specific recommendations, the model most frequently cited patient autonomy (39%), beneficence (21%), justice (16%), and non-maleficence (13%).
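Frequencies like those above could be produced by a simple keyword tally over the model's free-text recommendations. The matching rules below are an illustrative assumption, not the study's actual scoring method, which presumably involved manual review.

```python
from collections import Counter

# The four pillars of medical ethics, matched as lowercase keywords
PILLARS = ["autonomy", "beneficence", "non-maleficence", "justice"]

def tally_pillars(recommendations):
    """Count how often each pillar is cited across a list of
    free-text model recommendations (naive substring matching)."""
    counts = Counter()
    for text in recommendations:
        lower = text.lower()
        for pillar in PILLARS:
            if pillar in lower:
                counts[pillar] += 1
    return counts

# Hypothetical model outputs, for illustration only
recs = [
    "Proceed with RYGB; patient autonomy and beneficence support surgery.",
    "Defer to informed consent, respecting autonomy.",
]
print(tally_pillars(recs))
```

In practice, naive substring matching would need refinement (e.g., "beneficence" must not double-count within "non-maleficence", which this keyword set avoids), so hand review of ambiguous outputs remains necessary.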
Conclusion:
Our model overwhelmingly recommended operative intervention in our scenario, citing patient autonomy most frequently in its decision-making. Notably, our model displayed a reticence to provide specific ethical guidance. This work demonstrates a novel use case for LLMs in ethical decision-making, displaying how artificial intelligence technologies align with status quo ethical frameworks. Future research should focus on curating ethicist-validated training datasets and deploying real-world examples to assess ethical alignment.