Diagnostic accuracy of clinical tests to rule out elbow fracture: a systematic review
Abstract
Elbow trauma is relatively common in clinical practice. However, there is a lack of evidence regarding the most accurate tests for screening these potentially serious conditions and excluding elbow fractures. The purpose of this investigation was to analyze the literature concerning the diagnostic accuracy of clinical tests for the detection or exclusion of suspected elbow fractures. A systematic review was performed using the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines. Literature databases including PubMed, Cumulative Index to Nursing and Allied Health Literature, Diagnostic Test Accuracy, Cochrane Library, the Web of Science, and ScienceDirect were searched for diagnostic accuracy studies of subjects with suspected traumatic elbow fracture investigating clinical tests compared to imaging reference tests. The risk of bias in each study was assessed independently by two reviewers using the Quality Assessment of Diagnostic Accuracy Studies 2 checklist. Twelve studies (4,485 patients) were included. Three different types of index tests were extracted. In adults, these tests were very sensitive, with values up to 98.6% (95% confidence interval [CI], 95.0%–99.8%). The specificity was highly variable, ranging from 24.0% (95% CI, 19.0%–30.0%) to 69.4% (95% CI, 57.3%–79.5%). The applicability of these tests was very high, while, overall, the studies showed a medium risk of bias. The elbow full range of motion test, the elbow extension test, and the elbow extension and point tenderness test appear useful, when negative, for excluding fracture in the majority of cases. The specificity of all tests, however, does not allow us to draw useful conclusions because of the great variability of the results obtained.
Level of evidence
IV.
INTRODUCTION
Elbow fractures account for 7% of all fractures [1]. Extra-articular fractures of the elbow are typical of childhood (60% of cases occur in children) [2,3], while articular fractures are more frequent in those >50 years of age and have a low incidence (0.09% of total fractures). Cases peak between the ages of 12 and 19 years, usually in boys, and in those aged ≥80 years, characteristically in women. In young adults, the fractures are typically caused by high-energy injuries, such as motor vehicle collisions, falls from a height, sports, industrial accidents, and firearms. In contrast, >60% of distal humeral fractures in the elderly result from low-energy injuries, such as a fall from a standing height [4]. In general, for all joints, the most common signs of fracture are hemarthrosis, swelling, and loss of mobility. To the best of our knowledge, there have been no studies evaluating the diagnostic accuracy of X-ray in the identification of elbow fractures, although this type of instrumental investigation is used, in 95% of cases, as the method of choice for diagnosis. In situations of clinical doubt, or when the X-ray images are not clear, a diagnosis is made with computed tomography or magnetic resonance imaging [5]. In recent years, studies have begun to consider the diagnostic accuracy of ultrasonography compared to other instrumental investigations [6], reporting sensitivity and specificity values of >90% in children following elbow trauma. However, many studies have shown a low rate of positive radiographic findings when assessing suspected extremity fractures: only 50% of patients with upper-extremity injuries [7] and 15% of patients with ankle injuries [8-10] had documented fractures on X-ray. This has led to the development and validation of clinical decision rules to safely reduce radiographic imaging for suspected lower-extremity fractures [11,12].
Clinicians in the emergency department usually follow these rules to decide whether to refer the patient for an in-depth instrumental diagnostic examination. To date, however, there are still few clinical trials or clusters designed to “rule out” fracture in patients with suspected elbow fractures. In the last few years, several studies [13,14] have investigated the use of elbow physical tests that could help clinicians make a correct decision in cases of suspected bone fracture without resorting to additional instrumental diagnostic investigations. To be clinically useful, a diagnostic test must be valid, reliable, safe, and simple. However, to our knowledge, no review has yet examined all the available literature on the diagnostic accuracy of clinical testing for detecting elbow fractures. The only study with a similar scope was the review by Joshi et al. [15], in which the authors sought to provide a global overview of the different ways of diagnosing upper-extremity fractures, including the use of ultrasound. Therefore, the purpose of the present review was specifically to analyze the literature on the diagnostic accuracy of these recently proposed clinical tests for the detection or exclusion of suspected elbow fractures.
METHODS
This systematic review was conducted and reported according to the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) statement guidelines. The 27-item PRISMA diagnostic test accuracy checklist provides specific guidance for reporting systematic reviews and renders the results from systematic reviews of diagnostic test accuracy studies more useful [16]. Institutional review board approval and informed consent from patients were not required for this systematic review.
Registration
The study was registered in April 2020 (CRD42020176511) with the International Prospective Register of Systematic Reviews, an international database of prospectively registered systematic reviews in health and social care.
Eligibility Criteria
During planning of the review protocol, the inclusion and exclusion criteria for study selection were defined a priori. Two reviewers (GDM and PC) applied all criteria independently to the full text of articles that passed the initial screening phase. When a disagreement arose, a third author (GB) was consulted to discuss and resolve the conflict. To be eligible, studies had to satisfy the following inclusion criteria: (1) enrolled patients presented with a suspected elbow fracture, defined as disruption of the bone tissue of the proximal epiphysis of the radius and/or proximal epiphysis of the ulna and/or distal epiphysis of the humerus following a trauma; (2) the study investigated diagnostic accuracy, without any limitation relating to the language of the report or the publication date; (3) the results of ≥1 clinical tests were compared with an acceptable reference standard (X-ray, computed tomography, magnetic resonance imaging); and (4) the study had to report measures of diagnostic accuracy (sensitivity, specificity, positive likelihood ratio, negative likelihood ratio) or to allow their calculation. Articles were excluded if (1) the index test was performed with complex, specific, and technological equipment not easily applicable in daily clinical practice (e.g., dynamometers, electrogoniometers); (2) the index test was performed on a cadaver, under anesthesia, or during or after surgery; or (3) the study included patients who had severe polytrauma and/or temporary loss of consciousness that did not allow investigators to execute the index test.
Information Sources
A literature search was conducted for diagnostic accuracy studies in PubMed, Cumulative Index to Nursing and Allied Health Literature (CINAHL), Scopus, Diagnostic Test Accuracy (DiTA), ScienceDirect, the Web of Science, and the Cochrane Library databases by two reviewers acting separately. The search strategy (Supplementary Material 1) was shared by all reviewers. We also searched Open SIGLE, Google, and Google Scholar to find grey literature [17]. In the first step, duplicate articles were excluded using Zotero software (Corporation for Digital Scholarship, Vienna, VA, USA) [18]; then, the articles consistent with the review question were screened independently by two authors (GDM and PC), who read titles, abstracts, and (if necessary) full-text versions. In the second step, two reviewers (GDM and PC) independently selected studies based on agreed criteria. When a disagreement arose, a third author (GB) was consulted to discuss and resolve the conflict.
Risk of Bias
All studies included in the review were evaluated and scored independently by two reviewers (GDM and PC) with the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool [19] (Supplementary Material 2). A third author (GB) intervened in the case of any disagreement regarding the score assigned to the individual items. The QUADAS-2 tool is designed to assess the quality of primary diagnostic accuracy studies and consists of four key domains covering patient selection, index test, reference standard, and flow and timing. Each domain is assessed in terms of the risk of bias, and the first three domains are also assessed in terms of concerns about applicability. Signaling questions flag aspects of study design related to the potential for bias and help reviewers judge the risk of bias. The QUADAS-2 tool can be tailored to each review by adding or omitting signaling questions. For each item, the risk of bias could be classified as “low,” “high,” or “unclear.” If a study is judged to have a “high” or “unclear” risk of bias in ≥1 domains, then it may be judged “at risk of bias” or as having “concerns regarding applicability.”
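The overall judgment rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the reviewers' actual scoring procedure: the function name, the dictionary layout, and the "low risk of bias" label for the all-low case are assumptions made for the example.

```python
# Minimal sketch of the QUADAS-2 overall risk-of-bias rule described in the text.
# The function name, data layout, and the "low risk of bias" label are
# illustrative assumptions, not part of the review's materials.

DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")
ALLOWED_RATINGS = {"low", "high", "unclear"}


def overall_risk_of_bias(ratings: dict) -> str:
    """Return the overall judgment for one study from its four domain ratings."""
    for domain in DOMAINS:
        if domain not in ratings:
            raise ValueError(f"missing rating for domain: {domain}")
        if ratings[domain] not in ALLOWED_RATINGS:
            raise ValueError(f"unknown rating {ratings[domain]!r} for {domain}")
    # A single "high" or "unclear" domain is enough to flag the study.
    if any(ratings[d] in ("high", "unclear") for d in DOMAINS):
        return "at risk of bias"
    return "low risk of bias"


# Hypothetical study: three domains rated low, one rated unclear.
example = {
    "patient selection": "unclear",
    "index test": "low",
    "reference standard": "low",
    "flow and timing": "low",
}
print(overall_risk_of_bias(example))
```

The same rule would apply to applicability concerns, restricted to the first three domains.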
Data Extraction
Two reviewers (GDM and PC) independently extracted information and data regarding all included studies, including first author, publication year, study type, study population, and setting; index test and reference test; diagnostic criteria; prevalence; and the number of true positives, false positives, false negatives, and true negatives for recalculation or calculation, when not provided, of sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio (+LR), and negative likelihood ratio (−LR). Extracted data were processed by two reviewers (GDM and PC) using Microsoft Excel (Microsoft Corp., Redmond, WA, USA).
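For readers unfamiliar with how these measures follow from the extracted counts, the sketch below derives them from a 2×2 table using the standard definitions. The class and function names and the example counts are hypothetical; they do not come from any included study, and the code assumes every row and column total is nonzero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-tabulating an index test against the reference standard."""
    tp: int  # index test positive, fracture on imaging
    fp: int  # index test positive, no fracture on imaging
    fn: int  # index test negative, fracture on imaging
    tn: int  # index test negative, no fracture on imaging


def accuracy_measures(t: TwoByTwo) -> dict:
    """Compute standard diagnostic accuracy measures (assumes nonzero margins)."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        # Likelihood ratios are undefined when their denominator is zero.
        "lr_pos": sensitivity / (1 - specificity) if specificity < 1 else None,
        "lr_neg": (1 - sensitivity) / specificity if specificity > 0 else None,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
measures = accuracy_measures(TwoByTwo(tp=95, fp=60, fn=5, tn=40))
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, a highly sensitive but poorly specific test yields a small −LR and a +LR close to 1, which is the pattern discussed later in this review.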
RESULTS
Searches of biomedical databases from September 2020 to May 2021 yielded 1,506 articles (Supplementary Material 3), including 899 studies in Medline, 62 in CINAHL, 46 in Scopus, 11 in DiTA, 311 in the Web of Science, 55 in the Cochrane Library, and 122 in ScienceDirect, in addition to an American degree thesis obtained through a search of the grey literature. After removing duplicates, 901 articles and the thesis remained to be reviewed independently by the two authors. After titles and abstracts were read, we excluded 888 articles and the thesis because they did not meet the previously defined inclusion criteria. Thirteen articles remained after this step, and their full texts were read. At this stage, only one article [20] was considered irrelevant because the index test consisted of a clinical test and an X-ray and therefore required complex instrumentation. In the end, 12 diagnostic accuracy studies (Fig. 1) deemed suitable for the purpose of this systematic review were evaluated using the QUADAS-2 scale.
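The selection flow reported above can be checked with simple arithmetic. The short script below is only a consistency check on the counts stated in this paragraph; the variable names are illustrative, and the number of duplicates removed is derived here rather than reported in the text.

```python
# Consistency check of the study selection counts reported in the text.
# Variable names are illustrative; the duplicate count is derived, not reported.

records_per_database = {
    "Medline": 899,
    "CINAHL": 62,
    "Scopus": 46,
    "DiTA": 11,
    "Web of Science": 311,
    "Cochrane Library": 55,
    "ScienceDirect": 122,
}

database_records = sum(records_per_database.values())
articles_after_deduplication = 901
excluded_on_title_abstract = 888
excluded_on_full_text = 1

duplicates_removed = database_records - articles_after_deduplication
full_texts_read = articles_after_deduplication - excluded_on_title_abstract
studies_included = full_texts_read - excluded_on_full_text

print("records identified in databases:", database_records)
print("duplicates removed (derived):", duplicates_removed)
print("full texts read:", full_texts_read)
print("studies included:", studies_included)
```

The grey-literature thesis is tracked separately in the text and was excluded at the title and abstract stage, so it does not affect these totals.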
Article and Clinical Characteristics
Five studies [13,14,21-23] (1,050 total patients) compared the elbow’s range of motion (ROM) as an index test versus X-ray as a reference standard, four studies [24-27] (654 patients with fracture, 2,024 total) compared the ROM in elbow extension as an index test to X-ray as a reference standard, and three studies [28-30] (1,411 total patients) compared clinical clusters that included the evaluation of ROM in elbow extension and tenderness points to X-ray as a reference standard. In four studies [14,23,24,27], the values of +LR and −LR were also present, while in eight studies [14,21-25,27,29], positive predictive value (PPV) and negative predictive value (NPV) were reported; in other cases, the data were calculated by the researchers (GDM and PC) using specific online calculators and added to the summary table of diagnostic accuracy values. Two studies [22,29] investigated this condition in children, and 10 studies investigated the same in children and in adults [13,14,21,24-26,28,30]. The total number of fractures was not calculated in relation to the global sample divided by subgroups of different index tests. In cases where the +LR and −LR were indirectly calculated from the data present in the studies, it was not possible to associate them with the corresponding confidence intervals. No articles were found that investigated index tests that included resistive muscle tests or specific special tests. The main features of the studies are collected in Tables 1 and 2. In all studies, the tests were performed by doctors or nurses. Most of the included studies did not provide training in how to perform the clinical tests, considering them easy to perform. Full results of the included studies are available in Supplementary Material 4. As can be seen from Table 3, all studies are characterized by a low risk of bias in the domains related to applicability; the study with the lowest risk of bias appears to be that by Amiri et al. [21].
DISCUSSION
Diagnostic Accuracy of the Studies
Tables 2 and 3 summarize the main data concerning the diagnostic accuracy of the selected studies. A quick analysis of these data shows that, overall, the studies have high sensitivity values that, in several cases, were close to 100%, making these tests useful to exclude possible elbow fractures. In contrast, the specificity was much lower, with a very wide range of variability (24%–69.4%).
Comparing our results to those obtained by Joshi et al. [15], we found that the mobility tests obtained very similar values in both reviews, although our review included more studies than that of Joshi et al. [15] and the two reviews had different objectives. In particular, in the previous review, the average sensitivity was 0.84 (0.82–0.87) and the specificity was 0.57 (0.54–0.59) for the elbow extension test.
Overall, three main types of index tests could be identified. The first is the elbow extension test, the second evaluates the ROM of the elbow in different directions, and the third is a cluster formed by an extension test and a palpation test. For each test, the reference standard used was X-ray.
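Why a highly sensitive test is useful for ruling out fracture, even when its specificity is low, can be illustrated with the negative likelihood ratio and Bayes' rule in odds form. The sketch below uses hypothetical values for sensitivity, specificity, and pre-test probability; none of them is taken from an included study, and the function names are illustrative.

```python
def negative_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """-LR = (1 - sensitivity) / specificity; requires specificity > 0."""
    if specificity <= 0:
        raise ValueError("specificity must be greater than 0")
    return (1 - sensitivity) / specificity


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical inputs: a very sensitive, poorly specific test and an assumed
# 30% prevalence of fracture among patients presenting with elbow trauma.
lr_neg = negative_likelihood_ratio(sensitivity=0.97, specificity=0.40)
probability_after_negative_test = post_test_probability(0.30, lr_neg)
print(f"-LR: {lr_neg:.3f}")
print(f"probability of fracture after a negative test: {probability_after_negative_test:.3f}")
```

Under these assumed inputs, a negative test lowers the probability of fracture from 30% to roughly 3%, whereas the low specificity means a positive test changes the probability only modestly.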
Elbow Extension Test
Four studies [24-27] analyzed the diagnostic accuracy of the elbow extension test. Overall, sensitivity was >90%, with a maximum value of 97.3% (84.6%–99.9%) in Docherty et al.’s study [25]. In contrast, in these four studies [24-27], specificity was much lower (48.5%–69.4%), as was the +LR, thus reducing the test’s ability to correctly classify subjects with a positive test.
Elbow ROM Test
Among the five studies [13,14,21-23] included in this subgroup, very high sensitivity values emerged, reaching 100% in the studies by Darracq et al. [13] and Vinson et al. [14]. Conversely, specificity reached high values only in the studies of Amiri et al. [21] and Darracq et al. [13] (88%–97%). Lennon et al.’s study [23] reported diagnostic accuracy values that strongly disagree with those of the other studies, showing a high specificity and low sensitivity of the elbow ROM test. This is, however, due to the outcome being studied: that study targeted the detection of the absence of fracture, whereas all the others targeted the presence of fracture. Taking this into account, its results are in line with those of the other studies.
Cluster of Elbow ROM and Tenderness Test
In the three studies [28-30] that combined a mobility test with palpation of specific bony points of the elbow, the sensitivity was very high (>97% in all cases). In contrast, the specificity was very low, never exceeding 24%.
Adults and Children
In five studies [23-25,28,30], diagnostic accuracy values were calculated specifically in adults (aged >16 years). In all studies, the tests showed a high sensitivity, reaching a maximum value of 98.6%, while the specificity varied widely (24%–69.4%). In contrast, seven studies [2,23,24,27-30] analyzed tests in a specific population of subjects under the age of 14–16 years. In these studies, the test with the best sensitivity was the elbow extension test combined with palpation of five specific points, with values of 97.1%–100% [29]. The specificity was also highly variable in this case, ranging from 14% for the elbow extension and palpation cluster [29] to 64% for the elbow extension test [22].
Risk of Bias
The methodological quality of the studies analyzed was very good in relation to the applicability of the tests, while a risk of bias was identified in almost all studies, relating in particular to the selection of the patient sample [13,14,22,26] or to the timing of administration of the index test [24,28,29]. This may have partially influenced the obtained results. A similar argument, with similar results, can be made for children, but the sample is not large enough. All included studies were identified as cross-sectional, although three studies [24,28,29] conducted telephone follow-up at 7–10 days in patients with a negative index test and recalled these patients for X-ray only in the case of persistent symptoms. This means that these studies cannot be defined as true cross-sectional analyses, and it would be appropriate to consider only the results based on the first reference standard applied. However, this was not done because we decided to analyze all the data from the 12 included studies in full so that a homogeneous analysis could be performed. The external applicability of these tests in an outpatient setting is very high: as mentioned above, these tests are very simple to perform and easily reproducible, although they were performed in an emergency setting in the included studies.
Implications for Clinical Practice and Future Research
It was not possible to carry out a meta-analysis of the results obtained in this study because the included studies had different and non-homogeneous characteristics, so only a descriptive analysis was performed. These tests seem to be useful to exclude the presence of fractures when they are negative, given the low number of false negatives found. As for specificity, discordant values were obtained for all three types of proposed index tests, but a positive evaluation of the complete ROM could raise suspicion of a fracture following a significant traumatic event. Among all movements of the elbow, extension yields the most reliable diagnostic accuracy values for assessing possible elbow fractures, compared with flexion and pronation-supination and also with tenderness points.
Study Limitations
The limitations of this review are the relatively small number of studies and their heterogeneity: as in many reviews, the studies differ in the type of sample, the way the index tests were administered, and the comparison with the respective reference tests. Furthermore, the lack of some data in the individual studies precluded definitive conclusions from all 12 selected studies.
In the analysis and interpretation of our results, we must also consider that radiographs are not a perfect reference standard. X-ray is not 100% sensitive or specific for elbow fractures. In a minority of cases, though still a significant number, occult fractures are detected only by computed tomography or magnetic resonance imaging. In some of these studies, an absence of fracture on X-ray may not correspond to the actual clinical condition. This is an important limitation of all the studies analyzed, as the search for a gold standard for elbow fractures has not yet been sufficiently investigated.
Finally, an important aspect to consider is the age used to define the pediatric subgroup. It was not uniform across the included studies but varied within a certain range. The comparison between the data obtained in these studies may therefore be less accurate.
CONCLUSIONS
Considering the results of the studies with the lowest risk of bias, the elbow mobility tests appear to be useful, when negative, to rule out an elbow fracture. The specificity of all the index tests proposed so far does not allow us to draw useful conclusions. Further studies are needed to investigate the diagnostic accuracy of these clinical tests in greater depth and to confirm the results of this review.
Notes
Author contributions
Conceptualization: GB, PC, PP. Data curation: GB, GDM. Investigation: GDM. Methodology: GB, PC, PP. Supervision: PP. Writing – original draft: GB, GDM. Writing – review & editing: PC.
Conflict of interest
None.
Funding
None.
Data availability
None.
Acknowledgments
None.
Supplementary materials
Supplementary materials can be found via https://doi.org/10.5397/cise.2022.00948.