Interobserver agreement for detecting Hill-Sachs lesions on magnetic resonance imaging
Abstract
Background
Our aim was to determine the interobserver reliability of surgeons detecting Hill-Sachs lesions on magnetic resonance imaging (MRI), the certainty of their judgement, and the effects of surgeon characteristics on agreement.
Methods
MRIs of 29 patients with Hill-Sachs lesions or other lesions of similar appearance were presented to 20 surgeons without any patient characteristics. The surgeons answered questions on the presence of Hill-Sachs lesions and the certainty of their diagnosis. Interobserver agreement was assessed using Fleiss' kappa (κ) and the percentage of agreement. Agreement between surgeons was compared using a technique similar to the pairwise t-test for means, based on a large-sample linear approximation of Fleiss' kappa, with Bonferroni correction.
Results
The agreement between surgeons in detecting Hill-Sachs lesions on MRI was fair (69% agreement; κ, 0.304; p<0.001). In 84% of the cases, surgeons were certain or highly certain about the presence of a Hill-Sachs lesion.
Conclusions
Although surgeons reported high levels of certainty for their ability to detect Hill-Sachs lesions, there was only a fair amount of agreement between surgeons in detecting Hill-Sachs lesions on MRI. This indicates that clear criteria for defining Hill-Sachs lesions are lacking, which hampers accurate diagnosis and can compromise treatment.
INTRODUCTION
During anterior shoulder dislocation, the head of the humerus can be pressed against the antero-inferior part of the glenoid rim and cause an impression fracture of the posterior superior lateral humeral head, known as a Hill-Sachs lesion [1]. The incidence of these Hill-Sachs lesions is reported to be between 40% and 90% for patients with anterior instability and could be as high as 100% for patients with recurrent dislocation [2]. Furthermore, humeral bone loss associated with a Hill-Sachs lesion can increase the risk of recurrent dislocation depending on the size and location of the lesion [1]. Treatment algorithms, such as the instability severity index score and glenoid track instability management score, have been developed to assess whether instability could be treated with a soft-tissue procedure or a bony procedure [3]. In these treatment algorithms, a more aggressive approach is recommended based on the presence of factors associated with a higher rate of recurrent instability, and a Hill-Sachs lesion is one of these factors. Since the presence of a Hill-Sachs lesion informs the choice of treatment, healthcare providers must agree on whether one is present.
A Hill-Sachs lesion can be detected on radiographic imaging, but computed tomography (CT) and magnetic resonance imaging (MRI) are more sensitive [4,5]. Traditionally, CT scans were obtained to assess humeral and glenoid bone loss. In contrast to CT, MRI does not expose patients to radiation, and assessment of the soft tissue can be more accurate [6]. Therefore, MRI is the imaging modality preferred by orthopedic shoulder surgeons [7]. Saqib et al. [8] recently reported high sensitivity and specificity of magnetic resonance arthrography reviewed by experienced radiologists in detecting Hill-Sachs lesions compared to arthroscopy by a single surgeon. Although the accuracy of MRI in detecting Hill-Sachs lesions is documented (Table 1) [8-17], insight into its reliability is limited.
This gap in the literature is critical, as discordant diagnoses by healthcare professionals can have detrimental impacts on patient care and recovery. If reliability is low, healthcare providers do not agree on the presence of Hill-Sachs lesions, meaning that patients with (and without) Hill-Sachs lesions can be diagnosed and treated differently depending on the surgeon. Additionally, the reported incidence of Hill-Sachs lesions in the literature varies, largely due to differences in clinical judgement. We are specifically interested in the radiological judgement of treating surgeons rather than that of expert radiologists, because surgeons always assess MRIs before discussing treatment options with the patient.
Halma et al. [18] reported fair interobserver agreement among surgeons and radiologists assessing Hill-Sachs lesions, but only 3 of the 50 MRIs in their study included a Hill-Sachs lesion. Therefore, concrete conclusions on the reliability of detecting Hill-Sachs lesions could not be drawn. Beason et al. [19] evaluated interobserver agreement for detecting Hill-Sachs lesions among shoulder/sports medicine fellowship-trained orthopedic surgeons based only on coronal and axial T2-weighted MRI series. However, the surgeons' level of expertise was not taken into account, and the overall agreement was fair. van Grinsven et al. [20] assessed the agreement between radiologists and orthopedic surgeons for instability-related shoulder lesions on MRI, but the study did not report the number of Hill-Sachs lesions in the population. Furthermore, they reported agreement for all instability-related shoulder lesions together, without specifying the agreement for Hill-Sachs lesions.
This is the fourth study on this important topic, and we aimed to provide further insight into the role of MRI as a diagnostic instrument used by surgeons. Specifically, we aimed to determine: (1) the interobserver reliability of surgeons detecting Hill-Sachs lesions on MRI, (2) the certainty of surgeons regarding their judgement, and (3) the effects of surgeon characteristics on agreement. To achieve this, a large group of surgeons with varying levels of expertise assessed multiple MRIs with and without Hill-Sachs lesions, with no additional patient characteristics for context. We hypothesized that agreement would be fair, that certainty would be high, and that agreement would increase with level of expertise.
METHODS
This study has been approved by the Institutional Review Board of the OLVG Hospital (IRB No. WO 16.052).
Patients
Our hospital database was screened for available shoulder MRIs of patients with shoulder instability based on diagnosis codes. The medical records of these patients were manually screened by two researchers (HA and AS) for MRIs with Hill-Sachs lesions (n=19) or other defects with a similar appearance (n=10). These other defects were visible at the typical location for a Hill-Sachs lesion but were not a Hill-Sachs lesion as reported by the musculoskeletal radiologist. Such lesions included bone cysts, cartilage erosion, small grooves, or the bare area of the humeral head [21]. The majority of MRIs were performed without intra-articular contrast, and the Hill-Sachs lesions varied in size (Fig. 1). Proton density turbo spin echo MRIs were performed with a Siemens Magnetom Aera device (Siemens Healthineers, Erlangen, Germany). All MRIs were performed with the same MRI device and using the same protocol, positioning, and slice thickness.
Methods and Assessment
The MRI results were uploaded to a secure online survey platform (http://www.shoulderelbowcenter.com/) offering additional tools to perform measurements including lengths, angles, multiplanar reconstruction, and areas of surfaces. Experienced orthopedic surgeons with a specialization in shoulder pathology were invited to assess the MRIs and answer two questions based on the images: whether there was a Hill-Sachs lesion (yes/no) and how certain they were about the presence of a Hill-Sachs lesion (absolutely certain/certain/some doubts/very uncertain). General information about the assessing surgeons included the geographical location of their practice, years of clinical experience, scope of clinical interest, and whether they were involved in resident or fellowship training.
We did not provide any patient characteristics to isolate and assess the role of the MRI, which is just one of the available diagnostic tools. Because age, sex, and history of recurrent instability can predispose patients toward a Hill-Sachs or other diagnosis in regular clinical practice, not providing this information allowed assessment of the research question based purely on MRI.
Statistical Analysis
Sample size was based on expert opinion, the numbers of MRIs and respondents in previous studies [19,20,22], and feasibility in terms of the time needed to complete the survey for the set of MRIs. All analyses were performed with Stata ver. 14 (StataCorp., College Station, TX, USA). Fleiss' kappas were compared using the Stata package kappaetc [23].
The interobserver variability was determined using Fleiss' kappa, a statistical measure for assessing agreement among a fixed number of more than two observers. The kappa (κ) value is interpreted as poor (<0), slight (0.01–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), or almost perfect (0.81–1.00) agreement. The overall kappa values were calculated for each MRI and indicated the extent to which surgeons agreed on the presence or absence of a Hill-Sachs lesion. All surgeon characteristics were presented as absolute numbers and percentages, and surgeons were grouped according to these characteristics. A technique similar to the classical pairwise t-test for means, based on a large-sample linear approximation of Fleiss' kappa, was used to test differences in interobserver agreement [24]. For clarity, we also present the percentage of (observed) agreement, calculated as the average agreement between all possible pairs of raters [23]. Statistical significance was set at p<0.05; when comparing three groups, we applied the Bonferroni correction. For each MRI, overall certainty was calculated by dividing the number of responses given as absolutely certain, certain, some doubts, or very uncertain by the total number of surgeons.
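To make the agreement statistics concrete, the following is a minimal sketch (our illustration, not the authors' Stata/kappaetc code) of how Fleiss' kappa and the mean observed percentage of agreement can be computed from per-case rating counts; the example counts are hypothetical.

```python
# Fleiss' kappa for N cases rated by a fixed number of raters into
# two categories (Hill-Sachs lesion: yes/no). Illustrative sketch only.

def fleiss_kappa(counts):
    """counts: one row per case, e.g. [n_yes, n_no].
    Every row must sum to the same number of raters."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    total_ratings = n_cases * n_raters

    # p_j: overall proportion of ratings falling in each category
    p = [sum(row[j] for row in counts) / total_ratings
         for j in range(n_categories)]

    # P_i: observed agreement on case i (proportion of agreeing rater pairs)
    P = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
         for row in counts]

    p_obs = sum(P) / n_cases          # mean observed (percentage) agreement
    p_exp = sum(pj * pj for pj in p)  # chance-expected agreement
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return kappa, p_obs

# Hypothetical data: each row is [n_yes, n_no] out of 20 surgeons per MRI
example = [[20, 0], [17, 3], [12, 8], [3, 17], [0, 20]]
kappa, agreement = fleiss_kappa(example)
print(f"kappa={kappa:.3f}, percent agreement={agreement:.0%}")
```

Note that the percentage of agreement reported alongside kappa in this study corresponds to `p_obs`, the average agreement over all rater pairs, which is why it can look high even when kappa, which corrects for chance agreement, is only fair.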
RESULTS
Surgeon Characteristics
We invited 106 surgeons in total, and 20 surgeons completed the survey (19%). The majority was employed in Europe and specialized in shoulder and elbow surgery. Among the three surgeons with another specialty, two specialized in orthopedic traumatology.
Interobserver Agreement for Presence of Hill-Sachs Lesions
The observer answers are summarized in Table 2; there were only two cases with complete agreement between all surgeons. For eight of the 29 MRIs (28%), the responses were almost evenly split: 40%–60% of the surgeons identified a Hill-Sachs lesion, while the remainder did not. Together, all answers resulted in fair overall interobserver agreement for the presence of a Hill-Sachs lesion (69% agreement; κ=0.304; p<0.001).
Certainty
Responses for evaluating the presence of a Hill-Sachs lesion indicated that 32% of the answers were absolutely certain, 52% were certain, 16% had some doubts, and 0% were very uncertain.
Effect of Characteristics on Interobserver Variability
Surgeons with 11–20 years of experience had better agreement than surgeons with 6–10 years of experience (90% agreement, κ=0.703 vs. 66% agreement, κ=0.235; p=0.005). Agreement among surgeons with 0–5 years of experience did not differ significantly from that among surgeons with 6–10 years (71% agreement, κ=0.363 vs. 66% agreement, κ=0.235; p=0.046) or 11–20 years (71% agreement, κ=0.363 vs. 90% agreement, κ=0.703; p=0.05) after Bonferroni correction. Country of specialty, shoulder and elbow specialty, and involvement in resident or fellowship training did not affect the level of agreement within subgroups of surgeons, as detailed in Table 3.
DISCUSSION
This study showed fair interobserver reliability to detect Hill-Sachs lesions on MRI, indicating that MRI alone should be interpreted with caution in clinical decision making. Although the surgeons were mostly (84%) certain or very certain regarding their decision about the presence of a Hill-Sachs lesion, the degree of agreement between surgeons in detecting a Hill-Sachs lesion on MRI was only fair. In this sample of 20 surgeons, agreement was not affected consistently by surgeon’s country of specialty, years of experience, specialty, or fellowship training.
The fair agreement for the presence of Hill-Sachs lesions could be attributed to differences in interpretation of the transition zone between cartilage and bone. Lack of cartilage can have the same appearance as an impression fracture and could be mistaken for a Hill-Sachs lesion, or vice versa. Moreover, the articular surface of the humeral head is smallest in the superior-posterior segment, which is the typical location of a Hill-Sachs lesion [25]. The anatomical humeral groove could also be mistaken for a Hill-Sachs lesion [26]. Furthermore, detecting a Hill-Sachs lesion is difficult even on arthroscopic video, although arthroscopy is the gold standard: Sasyniuk et al. [27] reported that only 35% of surgeons assessing videotapes of arthroscopic procedures agreed on the presence of a Hill-Sachs lesion. Additionally, a previous study showed fair agreement between radiologists and fair to poor agreement between radiologists and an orthopedic surgeon in detecting Hill-Sachs lesions [18]. However, that study included only two radiologists and one orthopedic surgeon.
The fact that the two surgeons with 11–20 years of experience had better agreement when assessing the presence of a Hill-Sachs lesion supports the value of subspecialties. Our results show slightly higher agreement among surgeons with 0–5 years of experience than among those with 6–10 years, but both agreements were fair, with a difference of only 5%, which limits the clinical relevance of this finding. The combination of fair agreement with a high level of confidence about the presence of a Hill-Sachs lesion indicates that surgeons cannot rely on their personal sense of certainty for these diagnostic and treatment decisions.
We included a representative mix of MRIs consisting of smaller and larger Hill-Sachs lesions as well as lesions of similar appearance to simulate the clinical setting. Adding cases of lesions that resemble a Hill-Sachs lesion likely limits agreement between surgeons, but we deemed their inclusion important for adequately assessing agreement, as these cases realistically simulate the clinical population. Agreement on individual MRIs varied from poor to good, but the overall agreement was fair. We think that the overall agreement best represents the clinical setting, which does not consist only of cases in which lesions are easily distinguished from one another.
There are some limitations to interpreting the results of this study. First, the response rate was only 19%, which limits generalizability to all surgeons. Second, we did not confirm the Hill-Sachs lesions by arthroscopy. However, the accuracy of MRI and its correlation with arthroscopic findings have been documented in previous studies [8,28]. Additionally, only 35% of surgeons agreed on the presence of a Hill-Sachs lesion when assessing videotapes of arthroscopic procedures [27]. More importantly, MRI typically guides the decision for conservative or operative treatment; therefore, it is important to reliably assess Hill-Sachs lesions on MRI prior to arthroscopic or other surgery. Given the lack of a true gold standard, we did not intend to standardize or confirm the presence or absence of the lesions, but instead to provide evidence of a substantial lack of consensus, which needs to be addressed.
Another limitation is that we considered surgeons' years of experience rather than the volume of shoulder and elbow procedures they had performed. Years of experience might be biased because young, subspecialized shoulder surgeons may perform many more shoulder procedures than older surgeons with a wider scope of interest. Finally, some of the MRIs were performed with intravascular contrast. To our knowledge, there is no known difference in assessing Hill-Sachs lesions between MRIs with and without contrast.
A strength of this study is that a widely used interobserver agreement method (kappa) was used to assess the degree of consensus between surgeons regarding the presence and treatment of Hill-Sachs lesions, augmented with the percentage of agreement, which is easier to interpret. Moreover, we assessed consensus based on MRIs, which are most commonly used to detect pathology causing glenohumeral instability [7]. In addition, we deliberately withheld patient characteristics from the reviewers to isolate the role of MRI in detecting a Hill-Sachs lesion without confounding factors. Our finding of limited agreement supports the need for international criteria and guidelines for diagnosing Hill-Sachs lesions.
Future research could address the observed disagreements by evaluating and defining criteria for surgeons to use when diagnosing Hill-Sachs lesions. These criteria could then be considered for inclusion in guideline development. Furthermore, an important emerging topic is identifying the most reliable measurement of glenoid and humeral bone loss [29,30]. Finally, the interobserver agreement of surgeons or radiologists could be measured for other imaging techniques, such as CT scans. Although surgeons are highly confident in their ability to detect Hill-Sachs lesions, in the absence of patient characteristics there is only fair agreement between surgeons for detecting Hill-Sachs lesions on MRI.
Acknowledgements
Shoulder and Elbow Center (collaborators): Gregory R. Waryasz; Matthijs R. Krijnen; Pierre Mansat; Sven A.F. Tulner; Christian M. Fortanier; Carola F. van Eck; Ruud P. van Hove; Christiaan J.A. van Bergen; John N. Trantalis; Paul Hoogervorst; Tjarco D.W. Alta; Guus J.M. Janus; Alexander van Tongel; Diederik J.W. Meijer; Ronald N. Wessel; Mark Schnetzke; John Cheung; Derek F.P. van Deurzen.
Notes
Financial support
None.
Conflict of interest
None.