A. Alipour1, O. Kwon1,2, N. Le1, R. Geoghegan1, E. Aguayo1,2, A. Tillou1,3, J. Wu1,3, P. Benharash1,3 1Center for Advanced Surgical & Interventional Technology (CASIT), David Geffen School Of Medicine, University Of California At Los Angeles, Los Angeles, CA, USA 2Department Of Surgery, Los Angeles County Harbor-UCLA Medical Center, Torrance, CA, USA 3Department Of Surgery, David Geffen School Of Medicine, University Of California At Los Angeles, Los Angeles, CA, USA
Introduction:
Accurate and objective assessment of operative skill is essential for improving training paradigms, patient safety, and surgical quality. Recent advances in machine learning (ML) have enabled automated skill assessment, particularly in minimally invasive and robotic operations; however, the application of ML to open surgery remains nascent. In the present study, we aimed to bridge this gap by developing a novel ML model to evaluate technical performance from surgical video recordings.
Methods:
A publicly available dataset comprising 314 videos of subjects performing open surgical suturing was used for this study. Each video is approximately 5 minutes long, recorded at 30 frames per second. The dataset includes a global rating score (GRS) for each video, categorizing subjects into three classes: novice (n=119), intermediate (n=79), and proficient (n=116). A hybrid Convolutional Neural Network and Long Short-Term Memory (LSTM) network was employed to train the video classifier (Figure 1A). ResNet50, an image classification model, served as the spatial feature extractor performing non-linear transformations. The model was first fine-tuned on a subset of frames from the dataset. Subsequently, LSTM networks captured long-term temporal dependencies between the resulting feature maps (frames), selectively retaining significant and discarding insignificant changes across the frame sequences that capture the subject’s movements. Model performance was assessed with the class-wise F1 score, the harmonic mean of precision and recall, which provides a balanced measure of classification performance across classes.
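The abstract does not specify implementation details; the following is a minimal sketch of such a hybrid CNN-LSTM video classifier, written here in PyTorch under the assumption of an ImageNet-pretrained ResNet50 backbone, a single-layer LSTM, and three output classes. All layer sizes, names, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

class CNNLSTMSkillClassifier(nn.Module):
    """Hybrid CNN-LSTM video classifier (illustrative sketch only).

    A ResNet50 backbone extracts one spatial feature vector per frame;
    an LSTM models temporal dependencies across the frame sequence;
    a linear head maps the final hidden state to skill classes.
    Hidden size and layer counts are assumptions, not published values.
    """

    def __init__(self, num_classes: int = 3, hidden_size: int = 256):
        super().__init__()
        backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
        # Drop the classification head; keep the 2048-dim pooled features.
        self.feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
        self.lstm = nn.LSTM(input_size=2048, hidden_size=hidden_size,
                            num_layers=1, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        feats = self.feature_extractor(frames.view(b * t, c, h, w))  # (b*t, 2048, 1, 1)
        feats = feats.view(b, t, -1)                                 # (b, t, 2048)
        _, (h_n, _) = self.lstm(feats)                                # h_n: (1, b, hidden)
        return self.classifier(h_n[-1])                               # (b, num_classes)


# Example: classify a batch of 2 clips, each 16 sampled frames of 224x224 RGB.
model = CNNLSTMSkillClassifier()
logits = model(torch.randn(2, 16, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 3])
```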
Results:
The architecture of our model is illustrated in Figure 1A. Across the entire dataset, the model achieved an average F1 score of 80.1% in determining each subject's performance level, outperforming previous models (Figure 1B). The model classified performance with 90.1% accuracy for the novice group, 65.7% for the intermediate group, and 86.3% for the proficient group. Although accuracy was lower for the intermediate cohort than for the other skill levels, the model outperformed other models in this group by at least 10% (Figure 1B). The model classified each video into the appropriate skill level in an estimated 10.2±0.4 seconds.
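For reference, per-class and average F1 scores of the kind reported above are typically computed as in the short sketch below, shown here with scikit-learn; the label encoding and predictions are placeholders for illustration, not the study's data.

```python
from sklearn.metrics import f1_score

# Placeholder labels for illustration only; 0 = novice, 1 = intermediate, 2 = proficient.
y_true = [0, 0, 1, 1, 2, 2, 2, 0, 1, 2]
y_pred = [0, 0, 1, 2, 2, 2, 2, 0, 0, 2]

per_class_f1 = f1_score(y_true, y_pred, average=None)   # one F1 score per skill class
macro_f1 = f1_score(y_true, y_pred, average="macro")    # unweighted mean across classes

print(per_class_f1, macro_f1)
```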
Conclusions:
Our ML model provides a robust framework for skill assessment in open surgery. The application of machine learning in clinical practice should be considered for evaluating surgeons’ technical skills and for improving training and outcomes.