Introduction
Outcome measures are reliable instruments that measure progress from the patient's perspective, and they are common in the field of orthopedic surgery [1-4]. The Disability of the Arm, Shoulder and Hand (DASH), the American Shoulder and Elbow Surgeons Standard Shoulder Assessment score (ASES), and the Constant score are widely used in shoulder orthopedics [5].
Since the highest possible score may not be attainable even by patients with asymptomatic shoulders, comparing a patient's outcome measure score to the highest possible score is not the most reliable evaluation method [1]. Furthermore, little research has investigated how scores are affected by perception of overall health: scores are typically compared to normative data, and patient factors such as hand dominance, sex, and perception of overall health are not accounted for [6].
This study was undertaken to explore the relationship between commonly used shoulder outcome measures (DASH, ASES, and Constant score) and overall perception of health, as measured by the SF-36 outcome measure, by examining the best scores obtainable on these tests in an asymptomatic population. The SF-36 outcome measure consists of eight domains: physical functioning, role-physical, bodily pain, general health, vitality, social functioning, role-emotional, and mental health. We hypothesized that the highest possible scores for all three shoulder measures are not attainable by an asymptomatic population, and that perception of overall health will affect the scores of shoulder outcome measures. If our hypotheses are correct, they would support the view that, when interpreting the DASH, ASES, and Constant score, a health care professional needs to evaluate the specific raw scores in relation to the patient's self-perceived overall health.
Methods
This study was approved by the United States Sports Academy Institutional Review Board. The study period was from January 2, 2017 to February 17, 2017. In total, 580 people from the University of Minnesota Morris (n=520) and Heartland Orthopedic Specialists (n=60) were invited to voluntarily participate in this investigation. The two venues gave the researchers access to an array of individuals aged 18 years and above, representing both sexes. Similar to the research by Clarke et al. [1] and Sallay and Reed [7], volunteers who had asymptomatic shoulders and reported never having experienced shoulder pain or injury, and never having undergone shoulder surgery or imaging or had any shoulder pathology (bilaterally) within their lifetime, were included in this study. Subjects were at least 18 years of age and were asked to voluntarily complete the survey.
Instrumentation
The online tool Qualtrics (Qualtrics Labs, Provo, UT, USA) was used to administer the survey, which comprised demographic questions (sex, age, and whether the individual had any previous shoulder injury, imaging, or surgical procedures performed on either shoulder), the SF-36 health survey, and three outcome measures pertaining to the shoulder, viz., the DASH, ASES, and Constant score [8-14]. We added the scores of the eight SF-36 domains and averaged them to obtain a score out of 100 possible points for the SF-36 measure. The volunteers completed the shoulder outcome measures once for each shoulder. R.E.H. performed the objective Constant score assessment [15].
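The averaging of the eight SF-36 domains described above can be illustrated with a minimal Python sketch. The function name, the domain keys, and the example values are hypothetical and chosen only for illustration; the sketch assumes each domain has already been scored on a 0 to 100 scale.

```python
from statistics import mean

# The eight SF-36 domains named in the text (keys are illustrative).
SF36_DOMAINS = (
    "physical_functioning", "role_physical", "bodily_pain", "general_health",
    "vitality", "social_functioning", "role_emotional", "mental_health",
)

def overall_sf36(domain_scores):
    """Average the eight domain scores (each 0-100) into one 0-100 score."""
    missing = [d for d in SF36_DOMAINS if d not in domain_scores]
    if missing:
        raise ValueError(f"missing SF-36 domains: {missing}")
    values = [float(domain_scores[d]) for d in SF36_DOMAINS]
    if any(v < 0 or v > 100 for v in values):
        raise ValueError("each domain score must lie between 0 and 100")
    return mean(values)

# Hypothetical participant, not data from the study.
example_scores = {
    "physical_functioning": 90, "role_physical": 100, "bodily_pain": 80,
    "general_health": 70, "vitality": 60, "social_functioning": 100,
    "role_emotional": 100, "mental_health": 80,
}
print(overall_sf36(example_scores))  # 85.0
```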
All outcome measures for this investigation were subjective, except for the Constant score, which required an objective examination. The objective assessment consisted of evaluation of the arm position, strength of abduction using a spring balance, range of motion in forward flexion and lateral elevation using a goniometer, external rotation, and internal rotation. Similar to the exclusions outlined by Yian et al. [15], individuals were excluded if, during the objective assessment of the Constant score, they were unable to attain 90 degrees of active forward flexion, 90 degrees of active abduction (lateral elevation), or 20 degrees of active external rotation, or if their abduction strength was less than 1.5 kg. Mechanical fixed-spring balances are widely used for measuring strength when administering the Constant score [14,16]. A fixed-spring balance made by Rapala (Minnetonka, MN, USA) was used in this investigation. This digital scale is equipped to handle up to 22.68 kg of tension. The researchers verified the accuracy of the scale by weighing predetermined 2.27, 4.54, and 11.34 kg weights.
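The exclusion thresholds stated above can be written as a simple screening check. The following Python sketch is illustrative only: the function and parameter names are hypothetical, and it assumes each threshold is a strict minimum (a participant exactly at 90 degrees, 20 degrees, or 1.5 kg is retained).

```python
def is_excluded(forward_flexion_deg, abduction_deg,
                external_rotation_deg, abduction_strength_kg):
    """Return True if any objective Constant-score criterion is not met.

    Thresholds are those stated in the text: 90 degrees of active forward
    flexion, 90 degrees of active abduction, 20 degrees of active external
    rotation, and 1.5 kg of abduction strength.
    """
    return (
        forward_flexion_deg < 90
        or abduction_deg < 90
        or external_rotation_deg < 20
        or abduction_strength_kg < 1.5
    )

# Hypothetical measurements, not data from the study.
print(is_excluded(170, 165, 60, 5.0))  # False: all criteria met
print(is_excluded(170, 165, 60, 1.2))  # True: strength below 1.5 kg
```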
Since response rates of internet-based surveys are known to increase with follow-up memos, volunteers from each venue received an introductory email followed by two additional follow-up emails [17]. All three emails explained the study and contained a link to access the online survey. Those who chose to be a part of the investigation completed the survey and subsequently contacted the researcher (via email, phone, or text) to schedule a time for assessment of their strength and range of motion, as per the Constant score protocol.
Statistical Analysis
According to Green [18], the power of the test of a multiple correlation is approximately 0.80 or greater if N≥50+8m (where m is the number of variables); therefore, we needed to survey at least 106 (N≥50+8(7)) volunteers from the two venues combined. The significance level was set at 0.05, meaning that we accepted a 5% probability of declaring an effect significant when no true effect exists. The program R (Vienna, Austria) was used for the analysis. E.S., a biostatistician, analyzed the data.
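The sample-size rule of thumb quoted above, N≥50+8m, reduces to a one-line calculation. The sketch below is a hypothetical helper (the function name is an assumption) that reproduces the arithmetic for m=7.

```python
def green_minimum_n(num_variables):
    """Minimum sample size under the rule of thumb N >= 50 + 8m."""
    if num_variables < 1:
        raise ValueError("at least one variable is required")
    return 50 + 8 * num_variables

print(green_minimum_n(7))  # 106
```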
We obtained a 19% response rate for survey appeals. A total of 111 volunteers (72 female, 39 male), ranging in age from 20 to 69 years, completed both the DASH and ASES measures. Of the 111 volunteers, 92 (56 female, 36 male) completed the questions and objective assessment for the Constant score. This group also ranged in age from 20 to 69 years. All volunteers who completed the surveys were included in the final analysis.
The statistical analysis was accomplished in two stages. In the first stage, the relationships between the three outcome measures for the left and right shoulders and the average overall SF-36 score, controlling for sex and hand dominance, were analyzed. The average overall SF-36 score was determined by finding the average of the eight domains to get a score out of 100. Correlation analysis and linear models were applied for this purpose.
In the second stage, we looked individually at the eight domains of the SF-36 measure in detail. The linear models included the eight health domains, sex, and hand dominance as the variables explaining the outcome measures.
The Mann–Whitney test was applied to evaluate the differences in each of the outcome variables by sex, hand dominance, and age. Since the ASES group for the right shoulder where the left hand was dominant scored 100% for that measure, two separate statistical procedures were used. First, the 95% non-parametric confidence interval (CI) was found by using the Wilcoxon rank procedure. Second, the chi-squared test of independence was used by creating the categories “did not achieve most possible points” and “achieved most possible points” for both the dominant hand and the non-dominant hand groups.
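As an illustration of the second procedure, a chi-squared test of independence on a 2×2 table (hand group by whether the most possible points were achieved) can be sketched in Python using only the standard library. The counts below are hypothetical and are not the study data. The sketch also assumes no continuity correction is applied, which may differ from the settings used in the original analysis.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table.

    Returns (statistic, p_value) with 1 degree of freedom and no
    continuity correction (an assumption of this sketch).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one count")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: rows = dominant, non-dominant hand group;
# columns = achieved most possible points, did not achieve them.
counts = [[10, 40], [20, 30]]
stat, p = chi_squared_2x2(counts)
print(round(stat, 4), round(p, 4))
```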
Results
With the exception of the right shoulder for the ASES measure, the highest possible score was not attained for the three shoulder outcome measures (Table 1). For the DASH score, “0” is considered the best score, whereas “100” is considered the best score for both the ASES and Constant score. When analyzing hand dominance for each of the three shoulder measures, the only statistically significant difference was noted for the ASES measure, where the mean score of the right shoulder was higher for left-hand dominant participants (mean=100.00 vs. 95.02, 95% CI=93.17–96.87, p<0.001). Age had no significant effect on any of the shoulder scores (DASH p=0.82, ASES p=0.91, Constant score p=0.64), nor was there a statistical difference between the sexes (p=0.5846) for the age variable. The most frequent age group was 35 to 39 years for both sexes.
Evaluating the perception of overall health, the mean score of the SF-36 health survey was 80.79 (standard deviation=11.71, range=54.19). Based on this measure, perception of overall health did not differ between males and females (p=0.19). There was a significant negative relationship between the SF-36 score and the DASH score of the right shoulder (Pearson correlation=-0.23, p=0.01) as well as the left shoulder (Pearson correlation=-0.22, p=0.02). As the SF-36 score rose (indicating better health), the DASH scores decreased, suggesting better shoulder function. For the ASES measure, there was no significant correlation with the SF-36 scores for either the right (Pearson correlation=-0.03, p=0.978) or the left shoulder (Pearson correlation=0.005, p=0.9551). Likewise, for the Constant score, no significant correlation was observed with SF-36 scores for either the right (Pearson correlation=-0.05, p=0.623) or the left shoulder (Pearson correlation=0.036, p=0.7263).
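For readers less familiar with the Pearson correlation reported above, the coefficient can be computed from paired scores as in the following Python sketch. The function name and the four paired values are hypothetical and serve only to show the direction of the relationship (higher SF-36, lower DASH); they are not the study data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / (spread_x * spread_y)

# Hypothetical paired scores: DASH falls as SF-36 rises.
sf36 = [60, 70, 80, 90]
dash = [12, 9, 5, 2]
print(pearson_r(sf36, dash))  # negative, since DASH decreases with SF-36
```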
Prediction models were built to predict scores of the three outcome measures. The heat maps presented in Fig. 1-5 were constructed using Pearson correlations for the three outcome measures and SF-36, controlling for sex and hand dominance. For all participants, the three outcome measures correlate weakly with each other (Fig. 1). The only relationship between the outcome measures and SF-36 appears for the DASH scores. The correlation structure changed when we analyzed the heat maps for females (Fig. 2) and males (Fig. 3). DASH scores are negatively correlated with the SF-36 scores for males, but do not correlate for females. Similarly, DASH scores and the SF-36 have stronger negative correlations for right-hand dominant participants (Fig. 4) than for left-hand dominant participants (Fig. 5).
The fitted linear models used each of the three outcome measures as the response variable, with SF-36, sex, and hand dominance as the explanatory variables. The models also included an interaction term between sex and hand dominance.
The Disability of the Arm, Shoulder and Hand
The predicted value of the DASH score for right-hand dominant females is 10.669 for the right shoulder and 14.635 for the left shoulder when the SF-36 score is zero, which serves as the baseline. For every one-point increase in SF-36, the DASH score decreases by 0.129 points for the right shoulder and 0.173 points for the left shoulder, when sex and hand dominance are held constant. Considering sex, the scores of males are predicted to be 7.933 points higher than the scores of females for the right shoulder, and 11.287 points higher for the left shoulder, keeping SF-36 and hand dominance constant. Considering hand dominance, the scores of left-handed individuals are predicted to be 0.857 points higher for the right shoulder and 1.439 points higher for the left shoulder than the scores of right-hand dominant individuals, keeping SF-36 and sex constant (Table 2).
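To make the use of these coefficients concrete, the right-shoulder DASH model can be sketched as a small Python function. The intercept, SF-36 slope, sex effect, and hand-dominance effect are the values quoted in the text; the function name and the `interaction` parameter are hypothetical. The sex-by-hand-dominance interaction coefficient should be taken from Table 2 and is left at zero here, so the example is restricted to right-hand dominant participants, for whom the interaction term does not apply.

```python
# Right-shoulder DASH coefficients as quoted in the text.
INTERCEPT = 10.669      # right-hand dominant female, SF-36 = 0
SLOPE_SF36 = -0.129     # change in DASH per one-point SF-36 increase
EFFECT_MALE = 7.933     # male vs. female
EFFECT_LEFT = 0.857     # left- vs. right-hand dominant

def predict_dash_right(sf36, male, left_handed, interaction=0.0):
    """Predicted right-shoulder DASH score from the linear model.

    `interaction` is the sex-by-hand-dominance coefficient; it only
    contributes for left-handed males and is an input of this sketch.
    """
    score = INTERCEPT + SLOPE_SF36 * sf36
    if male:
        score += EFFECT_MALE
    if left_handed:
        score += EFFECT_LEFT
    if male and left_handed:
        score += interaction
    return score

# Right-hand dominant participants with a hypothetical SF-36 score of 80.
print(round(predict_dash_right(80, male=False, left_handed=False), 3))  # 0.349
print(round(predict_dash_right(80, male=True, left_handed=False), 3))   # 8.282
```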
A significant interaction is observed between sex and hand dominance for both shoulders. Considering the DASH score of the right shoulder, scores obtained from females are lower than those from males, with no major difference observed between scores by hand dominance. Males had higher scores than females and, when taking hand dominance into account, the difference between scores was greater (Fig. 6). A similar pattern was found for the left shoulder (Fig. 7). Comparing right- and left-hand dominant individuals, the variation in the DASH score is significantly higher for males than for females. For the right shoulder, it is predicted that the scores of left-handed males will increase by 15.852 points, as compared to the baseline. Likewise, for the left shoulder, it is predicted that the scores of left-handed males will increase by 23.505 points compared to the baseline (Table 2).
As presented in Table 3, general health is the only significant domain that affects the DASH score of the right shoulder when all eight domains of SF-36 are considered. Keeping all other variables fixed in the model, the DASH score is predicted to decrease by 0.108 points as the general health domain increases by one point (p=0.00184). In addition, the role limitation due to emotional health domain trends toward significance, with a p-value of 0.064. For females, the p-value (0.0559) is lower than for males (0.16344). For the left shoulder, general health and role limitation due to emotional health are observed to affect scores. Keeping all other variables fixed in the model, the score of the left shoulder is predicted to decrease by 0.144 points as the general health domain increases by one point (p=0.003) and by 0.059 points as the role limitation due to emotional health domain increases by one point (p=0.023).
American Shoulder and Elbow Surgeons Standard Shoulder Assessment Score
With an SF-36 score of zero, the predicted baseline ASES value for right-hand dominant females is 95.736 for the right shoulder and 96.311 for the left shoulder. Accounting for sex, the scores of males are predicted to be 8.068 points higher than the scores of females for the right shoulder, with SF-36 and hand dominance fixed. When considering hand dominance and keeping SF-36 and sex fixed, the scores of left-handed individuals are predicted to be 8.191 points higher for the right shoulder (Table 2).
Furthermore, there is a trend toward a significant interaction between sex and hand dominance for the right shoulder. It is predicted that scores of left-handed males will decrease by 8.538 points, as compared to the baseline. As presented in Fig. 8, no difference is observed between the scores of the sexes among left-hand dominant individuals; however, for right-hand dominant individuals, the male scores are better than the scores obtained for females. For the left shoulder, right-hand dominant females have higher scores than their male counterparts; however, the scores are observed to decrease for left-hand dominant females (Fig. 9).
Considering all eight domains of SF-36, none are significantly different for the right shoulder. However, the mental health domain trends toward significance with a p-value of 0.069 for the left shoulder; the p-value for females (0.105) is lower than the p-value for males (0.383) (Table 3).
Constant Score
With an SF-36 score of zero, the predicted baseline Constant score for right-hand dominant females is 90.67 for the right shoulder and 87.34 for the left shoulder. Accounting for sex, the scores of males are predicted to be 4.63 points lower than the scores of females for the right shoulder, keeping SF-36 and hand dominance fixed (Table 2). As noted in Fig. 10, females have lower scores when comparing both left and right hand dominance. However, for the left shoulder, right-hand dominant females have better scores than males of the same hand dominance, but worse scores when comparing left hand dominance between the sexes (Fig. 11).
As presented in Table 3, taking the eight domains of SF-36 into consideration, the mental health domain trends toward significance (p=0.072) for the right shoulder. The p-value for females (0.126) is lower than that for males (0.650). For the left shoulder, physical functioning is the only significant domain that affects the left Constant score (p<0.001). Keeping all other variables fixed in the model, the left Constant score is predicted to rise by 13.256 points as the physical functioning domain increases by one point (p=0.00184).
Discussion
In the data gathered from asymptomatic volunteers with no history of shoulder complaints, the highest possible scores were not attained for the DASH, ASES, or Constant score. Neither hand dominance (with the exception of the ASES where, for the right shoulder, the mean score was higher for the left-hand dominant side) nor shoulder side affects the mean scores. The perception of overall health affects the scores of the DASH outcome measure. For both shoulders, an increasing SF-36 score (indicating better health) tends to normalize the shoulder scores. Furthermore, our prediction models suggest that perception of overall health affects DASH scores, sex affects the scores of all three outcome measures, and hand dominance affects the scores of both the DASH and ASES measures.
The average score for all three shoulder measures in our investigation is lower than the highest score possible, which is similar to the findings of other researchers [1,7,15]. Clarke et al. [1] determined baseline scores for the DASH measure by administering surveys to 192 people with asymptomatic shoulders. For the DASH measure, the sample score was 1.8% lower than the highest score possible. Clarke et al. [1] also determined baseline scores for the ASES measure, and reported their sample score to be 1.1% lower than the highest score possible. Likewise, Sallay and Reed [7] determined the ASES baseline score to be lower than the highest score possible by 7.8% after they analyzed a modified version of the questionnaire that was administered to 343 patients being treated for non-shoulder related issues at an outpatient orthopedic center. For the Constant score, Katolik et al. [19] reported that baseline scores range between 88 and 95 depending on the age group; men scored higher than women, and age-related differences were also found. Yian et al. [15] analyzed two separate groups (2,900 clinic patients and 115 healthy volunteers) and reported the mean Constant scores to be 92 and 87, respectively. These studies only evaluated the dominant hand, and most of them only assessed one or two measures. From our investigation and those of Clarke et al. [1], Sallay and Reed [7], Yian et al. [15], and Katolik et al. [19], we confirmed that even in asymptomatic patients, it is unusual to attain the optimum or highest possible scores for the DASH, ASES, or Constant outcome measures.
Constant [20] reported mean scores for the left and right shoulders, and for dominant and non-dominant shoulders, and found that the mean scores did not differ significantly. Likewise, we compared the mean scores of the right and left shoulders for all three measures. Similar to Constant [20], no statistically significant differences were observed. Constant [20] recommended that comparison with the opposite side should be avoided, as several patients in a given population normally have a problem with the contralateral shoulder. During our literature review, we were unable to locate a similar investigation that compared mean scores of the opposite side with hand dominance, for either the DASH or ASES measures.
For the ASES measure, a statistically significant difference was found for the right shoulder, where mean scores were higher for the left-hand dominant side; however, patients may be unable to detect any difference. The minimal clinically important difference (MCID), the smallest difference in score that patients perceive as important, was developed to determine whether statistically significant data are also clinically significant [21]. It is reported that a change of 6.4 points for ASES is the MCID detectable by patients [22]. Since the difference attained was 5.0 points, which is below the MCID, it was not clinically significant. In the clinical setting, hand dominance can be ignored when interpreting the results of shoulder outcome measures.
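The comparison against the MCID amounts to checking whether the absolute difference between two mean scores reaches the reported threshold. A minimal Python sketch follows, using the ASES means and the 6.4-point MCID quoted in the text; the function name is hypothetical, and treating a difference exactly equal to the MCID as meaningful is an assumption of the sketch.

```python
ASES_MCID = 6.4  # MCID for the ASES as reported in the text

def is_clinically_meaningful(score_a, score_b, mcid=ASES_MCID):
    """True if the absolute score difference reaches the MCID."""
    return abs(score_a - score_b) >= mcid

# Right-shoulder ASES means from the text: 100.00 vs. 95.02.
difference = 100.00 - 95.02
print(round(difference, 2))                     # 4.98
print(is_clinically_meaningful(100.00, 95.02))  # False
```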
It is generally expected that patients will not achieve the highest possible score on outcome measures following shoulder intervention. Hence, it may be more important to compare postintervention outcome measures to investigations such as ours, or to those reported by others with no intervention, such as Clarke et al. [1], Sallay and Reed [7], and Yian et al. [15]. Achieving the highest score possible may not be a reasonable standard for patients and providers.
The DASH measure score is affected by the patient's perception of overall health. As perception of overall health improves, the mean score of this measure decreases, indicating better shoulder function. In their study of chronic pain, depression, and quality of life, Elliott et al. [23] reported that all chronic pain patients have very low SF-36 scores, and concluded that the SF-36 was able to detect major depression; they demonstrated a dose-effect relationship between depression type (severity) and health-related quality of life in chronic pain patients. Likewise, Bergman et al. [24] reported that changes in SF-36 scores coincide with improvement or deterioration of the pain status, when they evaluated musculoskeletal pain in the general population. Utilizing the SF-36 outcome measure, Dawson et al. [25] examined the rates of hip and knee pain in the elderly, and compared combinations of symptoms with overall health status. They reported that SF-36 scores worsened as the number of symptomatic hip and knee joints increased. We were unable to find any investigation that accounts for perceptions of overall health as measured by the SF-36, for any of the three shoulder measures; therefore, we believe that our investigation is unique, and contributes new information to the existing literature.
Recognizing the importance of individualized healthcare, we developed prediction models for all three shoulder measures that allow healthcare providers to estimate shoulder outcome scores, accounting for the dominant hand, sex, and perception of overall health of the patient. When considering sex, our model shows that males are predicted to have worse DASH and Constant scores than females, but better ASES scores. Hand dominance is also a factor in the outcome measure scores. It is important to realize that there is no single straightforward approach to interpreting the scores of outcome measures. Our findings demonstrate the importance of gathering additional demographic information when interpreting scores. This additional information will provide the healthcare professional with a more accurate picture of the progress or the health status of their patients.
Regardless, we recommend that, when utilizing shoulder-specific outcome measures, a general health measure be used in conjunction with the shoulder measures. Every patient is different, and knowing these factors may assist the healthcare provider in offering the patient a treatment plan that would give them the best opportunity for success. As with the SF-36 comparisons, our literature search did not find any prediction models for these outcome measures. Therefore, the current study sets the groundwork for developing methods to predict patient scores on shoulder outcome measures in an effort to provide individualized healthcare.
Understanding and interpreting the scores of outcome measures is essential for proper reporting to the patient, and for gauging their health status. For the DASH measure, with increasing SF-36 measure indicating better health, we observed improved shoulder scores. This suggests that outcome measures may be impacted by the patients’ health and their perception of health. We also found that demographic information influences outcome measure scores. Therefore, we suggest that healthcare professionals need to consider the demographic information and health status of the patients, when interpreting the scores of shoulder outcome measures. This requires administering a measure of overall health with joint specific outcome measures. By doing this, health care professionals will gain a clearer picture of the overall health and a more meaningful understanding of postsurgical outcome measure scores of the patients.
The strength of this investigation is that it considers health status by administering the SF-36 health survey. However, there are limitations to this study. We used one observer for the objective portion of the Constant score; Yian et al. [15] compared resident physician observers to a single experienced observer and found no difference between the two groups, which supports this choice. Other researchers in the area of normative values of outcome measures have used volunteers without shoulder pain, injury, previous surgery, or pathology (bilateral). Therefore, we compared our mean scores to comparable investigations [1,7,15,19,20]. Another limitation is a possible sex bias, since the numbers of male and female subjects were not equal within our sample. As the majority were female, the values obtained may favor that sex.