49.05 Reliability of a Rubric to Rate the Quality of Pre-Operative Goal Clarification Notes

C.E. Kern1, M.P. Cribbin1, K.M. Piazza4, P.G. Whiteside4, L.R. Pelcher4, A.D. Peeples4, O.K. Goodman4, C.F. Pascal4, D.E. Hall1,2,4,5,6  1University of Pittsburgh, Department of General Surgery, Pittsburgh, PA, USA; 2VA Pittsburgh Healthcare System, Department of General Surgery, Pittsburgh, PA, USA; 4Department of Veterans Affairs, Center for Health Equity Research and Promotion (CHERP), Philadelphia, PA, USA; 5University of Pittsburgh Medical Center, Wolff Center, Pittsburgh, PA, USA; 6VA Pittsburgh Healthcare System, Geriatric Research Education and Clinical Center, Pittsburgh, PA, USA

Introduction: Prior work has demonstrated that a rating rubric effectively identifies quality differences across 4 key domains [(1) prognosis, (2) treatment options, (3) goals and values, and (4) justification of a preferred treatment option] in preoperative goal clarification notes. This investigation aims to establish the interrater reliability (IRR) of the rubric before applying it in an audit-and-feedback quality improvement process.

Methods: Using a sample of goal clarification notes generated in a quality improvement program spanning multiple surgical specialties across 5 Veterans Affairs Medical Centers, 4 raters applied the rubric iteratively to small development samples of 7 notes. Each of the 4 domains was rated on a 3-level ordinal scale (i.e., high, low, or missing). The overall quality of the note was rated as poor, intermediate, or high. The team of 4 raters met iteratively to discuss disagreements, reach consensus, and adapt the rubric for greater rating reliability. IRR was calculated for the 4 raters with percent agreement. Once a threshold of 70% agreement across the 4 raters was achieved, 2 raters each rated a validation sample of 44 notes, meeting intermittently to reach consensus on disagreements and optimize consistency in application. IRR for the 2 raters was calculated with Cohen’s kappa, with a goal of “substantial” reliability (i.e., kappa 0.61-0.80).
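To make the two reliability metrics concrete, the minimal sketch below (Python, illustrative only) computes pairwise percent agreement for 4 raters and unweighted Cohen’s kappa for 2 raters on the 3-level domain scale. The toy ratings, the pairwise definition of agreement, and the helper names are assumptions for illustration, not the study’s data or code.

```python
from collections import Counter
from itertools import combinations

# Three-level ordinal scale used for each rubric domain.
LEVELS = ["missing", "low", "high"]

def percent_agreement(ratings_by_rater):
    """Mean pairwise percent agreement across raters.

    ratings_by_rater: one equal-length list of ratings per rater.
    (Pairwise agreement is assumed here; the abstract does not state
    whether agreement was pairwise or required all 4 raters to match.)
    """
    n_notes = len(ratings_by_rater[0])
    pair_scores = [
        sum(x == y for x, y in zip(a, b)) / n_notes
        for a, b in combinations(ratings_by_rater, 2)
    ]
    return 100 * sum(pair_scores) / len(pair_scores)

def cohens_kappa(r1, r2):
    """Unweighted Cohen's kappa for two raters on the same notes."""
    n = len(r1)
    p_o = sum(x == y for x, y in zip(r1, r2)) / n            # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in LEVELS)     # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Toy ratings for one domain over 10 notes (illustrative only).
r1 = ["high", "low", "high", "missing", "low", "high", "high", "low", "missing", "high"]
r2 = ["high", "low", "low", "missing", "low", "high", "high", "high", "missing", "high"]
r3 = ["high", "low", "high", "missing", "low", "low", "high", "low", "missing", "high"]
r4 = ["high", "missing", "high", "missing", "low", "high", "high", "low", "low", "high"]

print(f"4-rater agreement = {percent_agreement([r1, r2, r3, r4]):.1f}%")
print(f"kappa (2 raters) = {cohens_kappa(r1, r2):.2f}")  # ~0.68, 'substantial' by Landis-Koch
```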

Results: The rubric was applied to a total of 58 notes (33% general surgery, 26% urology, 11% vascular, 11% orthopedic, 11% other [ENT, transplant, cardiac, thoracic]). Rubric development required 2 cycles of iterative coding (14 notes total) to achieve 76.8% agreement across the 4 raters. Subsequently, 44 notes were rated by the 2 raters, who met twice to adjudicate differences and establish a final consensus rating. Kappa values between raters for each domain are displayed in Table 1. The pooled kappa across all domains was 0.78, representing the upper range of substantial agreement. Across individual domains, kappa ranged from 0.63 to 0.81 and agreement ranged from 81.4% to 90.5%, with disagreement most pronounced for prognosis, goals and values, and treatment options.
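For reference, one common way to pool kappa across domains is to average observed and chance agreement before forming kappa. The brief sketch below assumes that pooling approach (the abstract does not state the method actually used) and uses illustrative per-domain values rather than the study’s Table 1 data.

```python
def pooled_kappa(domains):
    """Pooled kappa across rubric domains.

    Assumes the pooling approach of averaging observed (p_o) and chance
    (p_e) agreement across domains before forming kappa.
    domains: list of (p_o, p_e) pairs, one per domain.
    """
    mean_po = sum(po for po, _ in domains) / len(domains)
    mean_pe = sum(pe for _, pe in domains) / len(domains)
    return (mean_po - mean_pe) / (1 - mean_pe)

# Illustrative per-domain (p_o, p_e) values, not the study's data.
print(round(pooled_kappa([(0.90, 0.55), (0.85, 0.40), (0.88, 0.50), (0.82, 0.45)]), 2))
```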

Conclusion: Iterative development of the rubric and training in its application led to substantial IRR, such that trained raters could independently apply the rating to a larger sample of notes to provide audit and feedback in a quality improvement process. The differences in disagreement between domains suggest opportunities for further rubric development and rater training.