Validation of the Danish version of the musculoskeletal tumour society score questionnaire

2019-02-20

World Journal of Orthopedics, 2019, Issue 1

Casper Kloster Pingel Saebye, Johnny Keller, Thomas Baad-Hansen

Abstract

BACKGROUND The Musculoskeletal Tumour Society Score (MSTS) is a well-known questionnaire for measuring functional outcome in patients with neoplasms in the extremities. Standardized guidelines for cross-cultural translation and validation ensure the equivalence of content between the original and translated versions. Translation and validation make it possible to compare different sarcoma populations on an international level. This study is based on the hypothesis that the Danish MSTS questionnaire is a valid tool for measuring the end result after surgery for neoplasms in the extremities.

AIM To validate the Danish versions of the upper and lower extremity MSTS.

METHODS The translation of the MSTS was conducted in accordance with international guidelines. Patients who had undergone surgery for sarcomas and aggressive benign tumors were invited to participate in the study. The psychometric properties of the Danish version of the MSTS were tested in terms of validity and reliability and for the risk of floor or ceiling effects. Spearman’s rank correlation coefficient was used to test validity by comparison with the Toronto Extremity Salvage Score (TESS). The intraclass correlation coefficient (ICC) was used to evaluate inter-rater reliability. Cronbach’s alpha was used to test internal consistency. Spearman’s rank correlation coefficient was used to compare the MSTS lower extremity version with the objective test, Timed Up and Go (TUG).

RESULTS In the inter-rater reliability test, the upper extremity version demonstrated an ICC of 0.95 and the lower extremity version an ICC of 0.88. Both MSTS versions showed a ceiling effect. Validity, measured by Spearman’s rank correlation coefficient between the MSTS and the TESS, was 0.80 (P < 0.01) and 0.83 (P < 0.01) for the upper and lower extremity versions, respectively. A Spearman’s rank correlation coefficient of -0.26 (P < 0.01) was found between the TUG and the lower extremity version of the MSTS questionnaire. A Spearman’s rank correlation coefficient of -0.38 (P < 0.01) was found between the TUG and the lower extremity version of the TESS questionnaire.

CONCLUSION The Danish versions of the MSTS questionnaire were found to have good reliability and validity; however, a substantial ceiling effect was identified.

Key words: Sarcoma; Patient outcome assessment; Clinical oncology; Surgical oncology; Patient satisfaction

INTRODUCTION

The Musculoskeletal Tumour Society Score (MSTS) questionnaire was developed in 1985 and revised in 1993 as a physician-completed questionnaire to measure functional outcome in patients with neoplasms[1]. The MSTS has been widely used in sarcoma research[2-6]. However, the English version of the MSTS has never been properly validated[1]. The lower extremity version of the MSTS has been translated into Brazilian Portuguese, Chinese and Japanese, and validated in those languages[7-9]. To our knowledge, the MSTS has never been properly translated and validated for Danish-speaking patients.

Guillemin et al[10] were among the first to construct a standardized guideline for cross-cultural translation. Cross-cultural translation is intended to ensure the equivalence of content between the original and translated versions. Equivalence is achieved by ensuring not only correct linguistic translation but also cultural adaptation. Others have since created recommendations on how to assess the psychometric properties of such an instrument[11,12].

The Timed Up and Go (TUG) test was introduced in 1991 as a measure of the dynamic balance and basic mobility skills needed for daily living[13]. It has been shown to have good validity and reliability in lower extremity patients who have undergone unilateral amputation[13]. The TUG has not been properly validated for use in sarcoma patients who have undergone limb-sparing surgery, although Marchese et al[14] have validated the TUG as part of a larger functional outcome assessment method in sarcoma patients.

Only a few studies have investigated the correlation between objective measurements and questionnaires such as the MSTS[14,15]. Among these, Marchese et al[14] found a fair to moderate correlation between the TUG and the MSTS and TESS questionnaires.

In order to compare Danish sarcoma patients’ functional outcomes internationally, the aims of this study were: (1) to validate the psychometric properties of the Danish translation of the lower and upper extremity versions of the MSTS questionnaire; and (2) to investigate the correlation between functional outcomes as measured by questionnaires, such as the MSTS and the Toronto Extremity Salvage Score (TESS), and the objective measurement, TUG, in patients with lower extremity tumors.

MATERIALS AND METHODS

Study design

The translation of the MSTS questionnaire into Danish was conducted at the Department of Orthopedic Surgery at Aarhus University Hospital between May and August 2015. The validation of the Danish translation was carried out among patients who had undergone surgery for sarcoma or aggressive benign tumors and who attended the outpatient clinic at Aarhus University Hospital (Aarhus, Denmark) between August 2015 and June 2016. The study was reported to and approved by the Danish Data Protection Agency (file No. 1-16-02-650-15). Informed consent was obtained from all patients participating in this study. The study was preapproved in accordance with the national ethical guidelines and in accordance with the Helsinki Declaration. The translation method used was based on published international guidelines for the process of cross-cultural translation of an instrument[10,16]. The cross-cultural translation and validation consist of several stages.

Translation

Stage I: Forward-translation: Two independent translators translated the upper and lower extremity version of the MSTS questionnaire (including the instructions to the user) from the original English version. The two independent translators were fluent in English and Danish but had Danish as their native language (both held diplomas in English and one was also a linguist). The two translators had different backgrounds in order to achieve the best possible translation. The first translator was a physician with clinical experience and was therefore considered an “informed” translator. The second translator had no clinical experience or relation to health care and was therefore considered a “naive” translator.

Stage II: Synthesis of a combined translation: The two translations were compared, and any discrepancies were discussed and resolved by the two forward-translators. A combined translation was finally made from the original English version and the two independent translations.

Stage III: Backward-translation: Two new independent translators conducted a backward-translation based upon the combined translation. They were blinded to the initial two forward-translations and the original English version. Both backward-translators were fluent in English (both held diplomas in English) and had Danish as their native language. The first backward-translator was a highly experienced researcher in health care, but had no prior knowledge of the concepts of the MSTS questionnaire. The second translator had no prior clinical experience or relation to health care.

Stage IV: Committee: The authors of this paper reviewed all the translated versions and components of the questionnaire, and discrepancies were discussed until consensus was reached on the final wording and formatting of the Danish version of the MSTS questionnaire.

The validation process

The validation used a cross-sectional design in which physicians completed the MSTS questionnaire and patients completed the TESS. Patients who had undergone surgery in the lower extremities also completed the TUG test.

Study population

All patients aged 18 or above who had undergone surgical treatment for sarcomas or aggressive benign tumors in the extremities were consecutively asked to participate in the study when attending the outpatient clinic. All patients were required to read and speak Danish to be able to participate. Patients were excluded if they had competing diseases affecting their physical function. A total of 240 out of 267 patients were included in the study.

Measurements

The MSTS is based on factors related to the patient as a whole and on those specific to the upper or lower extremity. It consists of six items, of which the first three are identical in both MSTS versions: pain, daily function and emotional acceptance. The upper extremity version also encompasses items measuring hand positioning, dexterity and lifting ability. The lower extremity version instead consists of items measuring the use of aids, walking ability and gait[1]. Each of the aforementioned items is assigned a value of 0 to 5 points, and the final score is calculated as a percentage of the maximum obtainable score. The original English version of the MSTS was never tested for psychometric properties. However, the lower extremity version has been translated into Brazilian Portuguese, Chinese and Japanese, and validated in those languages[7-9].
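The scoring rule described above can be illustrated with a minimal Python sketch. This is not the authors' implementation (the study data were analyzed in Stata); the function name `msts_percentage` and the example item values are hypothetical and serve only to show how six items scored 0 to 5 translate into a percentage of the 30 obtainable points.

```python
def msts_percentage(item_scores):
    """Return the MSTS score as a percentage of the maximum obtainable score.

    item_scores: six numbers, each between 0 and 5 (one per MSTS item).
    """
    if len(item_scores) != 6:
        raise ValueError("the MSTS has six items")
    if any(not 0 <= s <= 5 for s in item_scores):
        raise ValueError("each item is scored from 0 to 5")
    max_score = 6 * 5  # six items, at most 5 points each
    return 100.0 * sum(item_scores) / max_score


# Hypothetical lower extremity patient, items in the order: pain, daily
# function, emotional acceptance, use of aids, walking ability, gait.
example = [5, 4, 4, 5, 4, 4]
print(round(msts_percentage(example), 1))  # 26 of 30 points -> 86.7
```

A patient scoring 5 on every item reaches 100%, which is the value counted when the ceiling effect is assessed.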

The TESS assesses functional outcome in musculoskeletal tumor patients aged 12-85 years[17]. The TESS consists of upper and lower extremity versions which have 29 items and 30 items, respectively. The final TESS score ranges between 0 and 100. The TUG test measures the time needed to stand up from a chair, walk 3 meters, turn around, walk back and sit back down in the chair[13]. A stopwatch was used to document the time used.

Analysis of the data

Data were analyzed in Stata, version 12.1. Descriptive statistics were used for the patients’ clinical demographics. All variables were examined to ascertain data distribution.

The psychometric properties were evaluated by assessing different domains, such as reliability and validity, as well as the estimate for possible floor and ceiling effects. Furthermore, the patients were stratified into groups according to tumor types.

The domain of reliability can be further divided into subdomains: internal consistency, reliability and measurement error. Internal consistency was measured by Cronbach’s α. A Cronbach’s α between 0.70 and 0.95 was considered good[18]. The inter-rater test was conducted by having two different physicians independently complete the MSTS questionnaire in the outpatient clinic. Inter-rater reliability was measured by the intraclass correlation coefficient (ICC). The measurement error of the MSTS questionnaire was assessed by Bland-Altman plots in the inter-rater test[19].
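As a rough illustration of two of these statistics, the Python sketch below computes Cronbach’s α from per-patient item scores and the Bland-Altman bias with 95% limits of agreement (bias ± 1.96 standard deviations of the paired differences) for two raters. It is a sketch under stated assumptions, not the authors' Stata analysis: the function names and the tiny data sets are hypothetical, sample variances are used throughout, and the ICC is left out because the text does not state which ICC model was applied.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-patient item-score lists."""
    k = len(rows[0])  # number of items
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def bland_altman_limits(rater_a, rater_b):
    """Return (bias, lower limit, upper limit) for paired rater scores."""
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical item scores (0-5) for four patients on three items.
items = [[5, 4, 5], [3, 3, 4], [2, 1, 2], [4, 4, 5]]
print(cronbach_alpha(items))

# Hypothetical MSTS percentages given by two raters to the same patients.
rater_1 = [90, 80, 73, 100, 87]
rater_2 = [87, 83, 70, 100, 93]
print(bland_altman_limits(rater_1, rater_2))
```

A small bias with wide limits of agreement, the pattern this study reports for Figures 1 and 2, means the raters agree on average while individual scores can still differ considerably.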

In this study, the construct validity was assessed by comparing the MSTS score with the TESS score[18]. This was evaluated by either Pearson’s r or Spearman’s rank correlation coefficient, depending on the distribution of data. Floor and ceiling effects were considered present if > 15% of the patients received the lowest or highest possible score, respectively[18].
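The 15% criterion for floor and ceiling effects can be sketched as follows. The function name, the default bounds of 0 and 100 (for scores expressed as percentages) and the example data are assumptions made for illustration only.

```python
def floor_ceiling(scores, lowest=0, highest=100, threshold=0.15):
    """Share of patients at the lowest/highest score and whether it exceeds the threshold."""
    n = len(scores)
    floor_share = sum(s == lowest for s in scores) / n
    ceiling_share = sum(s == highest for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical group of 45 patients in which 4 reach the maximum score
# (about 8.9%), which stays below the 15% threshold.
scores = [100] * 4 + [80] * 41
print(floor_ceiling(scores))
```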

The correlation between functional outcome measured by questionnaires (MSTS and TESS) and an objective measurement (TUG) was found by calculating correlation coefficients between MSTS and TUG as well as TESS and TUG by using either Pearson’s r or Spearman’s rank correlation coefficient, depending on the distribution of data.
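For readers unfamiliar with Spearman’s rank correlation coefficient, the sketch below shows one common way to compute it: each variable is converted to ranks (tied values share their average rank), and Pearson’s r is then calculated on the ranks. This is an illustrative pure-Python version with hypothetical function names and data; it does not compute P values and is not the Stata routine used in the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's r for two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical TUG times (seconds) and MSTS percentages for five patients.
tug = [5.4, 6.4, 8.0, 12.1, 7.0]
msts = [97, 93, 73, 60, 87]
print(spearman_rho(tug, msts))
```

A negative coefficient is the expected direction here, since a longer TUG time reflects poorer mobility while a higher questionnaire score reflects better function.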

RESULTS

Translation

The results showed an overall high consistency between the two forward-translations and the two back-translations when compared with the original English version. Only the item concerning emotional acceptance was found to be slightly different, owing to differences in how Danish people express their feelings about their health and treatment. The original English version uses words such as “enthusiastic” and “like it” to describe feelings about the surgical treatment; a Danish patient might find this wording culturally strange for describing feelings about a cancer diagnosis and its treatment. The Danish version takes this into account, and the emotional acceptance item was therefore still found appropriate.

Validation

The demographic data concerning the patients are listed in Table 1. The participation rate was 89.9%. Table 2 describes the reasons for the 27 patients not participating in the study. The median MSTS scores for the upper and lower extremity versions were 93 (interquartile range (IQR): 80-100) and 87 (IQR: 73-97), respectively. The median TESS scores for the upper and lower extremity versions were 98 (IQR: 83-100) and 93 (IQR: 81-98), respectively. The median TUG time (in seconds) was 6.4 (IQR: 5.4-8.0).

The test for internal consistency resulted in a Cronbach’s α of 0.85 for the upper extremity version and 0.79 for the lower extremity version. The inter-rater reliability, measured by the ICC, was 0.95 (95%CI: 0.92-0.97) for the upper extremity version and 0.88 (95%CI: 0.84-0.91) for the lower extremity version. Figures 1 and 2 present the limits of agreement in Bland-Altman plots for the upper extremity version and the lower extremity version, respectively.

The analysis for construct validity found a Spearman’s rank correlation coefficient of 0.80 (P < 0.01) and 0.83 (P < 0.01) for the upper and lower extremity versions, respectively, between the MSTS and the TESS.

Table 3 presents the floor and ceiling effects found in the MSTS questionnaire. A Spearman’s rank correlation coefficient of -0.26 (P < 0.01) was found between the TUG and the lower extremity version of the MSTS questionnaire. Figure 3 illustrates the correlation between the TUG and the MSTS. A Spearman’s rank correlation coefficient of -0.38 (P < 0.01) was found between the TUG and the lower extremity version of the TESS questionnaire. Figure 4 illustrates the correlation between the TUG and the TESS.

Table 1 Baseline characteristics of the patients

DISCUSSION

This Danish translation of the MSTS questionnaire was found to have good internal consistency, reliability and construct validity. However, the MSTS does have limitations, as shown by the identification of a ceiling effect and possible measurement error between raters. In addition, poor correlations were found between the MSTS/TESS and the TUG. When using an existing measurement, it is important that it has undergone a proper cross-cultural translation in order to ensure that it measures the same concept as the original measurement[10,16]. We used well-known standardized guidelines to translate the MSTS questionnaire into Danish and validate it[10,16,18]. The lower extremity version of the MSTS questionnaire has also been translated into Brazilian Portuguese, Chinese and Japanese and validated according to these guidelines[7-9]. To our knowledge, this is the first time the upper extremity version of the MSTS questionnaire has been translated into another language following a standardized guideline. For the Japanese upper extremity version of the MSTS questionnaire, Wada et al[20] tested the construct validity, but it was not mentioned whether that version had undergone systematic cross-cultural translation. Lee et al[21] also reported the validity and reliability of both Korean versions of the MSTS questionnaire, but likewise without reporting whether the translation into Korean had been done according to standardized guidelines.

The original validation of the English version of the MSTS unfortunately did not report a Cronbach’s α[1]. The good internal consistency found in this study is, however, comparable to that found by Rebolledo et al[7], Xu et al[9] and Iwata et al[8]. The inter-rater reliability also showed excellent results for both versions, in accordance with those found by Rebolledo et al[7] and Xu et al[9]. The original validation of the MSTS questionnaire also reported good inter-observer reliability, although no correlation coefficient was reported[1]. Figures 1 and 2 show low mean bias on all plots; however, the limits of agreement are wide, which indicates a possibly high measurement error. No previous studies have tested the measurement error in the MSTS[1,7,8,20,21]. The test for measurement error is an important part of the validation process, since only a change in the MSTS score larger than the measurement error can be considered a possible ‘real’ change in the functional outcome[22].

The construct validity of the MSTS was determined to be good in this study by comparing the MSTS with the TESS. This is in line with the results of Rebolledo et al[7], Xu et al[9] and Iwata et al[8]. Wada et al[20] found a good correlation between the upper extremity version of the MSTS and the Disabilities of the Arm, Shoulder and Hand questionnaire.

Table 2 Reasons for exclusion from the study

A general ceiling effect was found in both the upper and lower extremity versions of the MSTS. However, when stratified, there was no ceiling effect in patients with lower extremity bone sarcoma (8.9%, n = 45) or aggressive benign tumors (11.1%, n = 18) (Table 3). These results are similar to those reported by Rebolledo et al[7] (7.4%, n = 67) for the lower extremity version of the MSTS, while the substantial ceiling effect in the pooled data is consistent with the results of Iwata et al[8] (23%, n = 100) and Wada et al[20]. A study by Tanaka et al[15] with the aim of predicting knee extension strength and post-operative function also showed a noticeable ceiling effect in the MSTS questionnaire (22.2%, n = 18). These results question the role of the MSTS in evaluating function in all musculoskeletal tumor patients, as a ceiling effect makes it difficult to distinguish between patients with superior function. Against this backdrop, it is important to consider the future role of the MSTS. One possibility is to develop the questionnaire further to make it more appropriate for measuring physical function; another is to abandon the instrument entirely and develop a new and more precise one.

The current mainstay treatment of musculoskeletal tumors in the extremities directly affects the musculoskeletal system, which accentuates the importance of an instrument that measures functional outcome precisely. We found a poor correlation between the TUG and the lower extremity versions of the MSTS and the TESS. Marchese et al[14,23,24] also found a generally poor correlation between the TUG and the MSTS/TESS in three studies, while Tanaka et al[15] found a moderate to good correlation between the MSTS/TESS and the extension strength of the knee. This highlights the importance of choosing the correct instrument for measuring the desired concept of function. The purpose of the TUG is to measure the balance and mobility skills needed for daily living[13], while the purpose of the TESS is to measure the patients’ perception of function[17]. In this way, two different subconstructs of the concept of function are measured. Although both can be important, their relevance may differ depending on the hypothesis being explored.

A great strength of this study is the number of participating patients. This study included 78 patients with upper extremity tumors and 162 patients with lower extremity tumors. Previous guidelines concerning the validation of instruments have set a minimum of 100 patients as an excellent sample size, while 50 to 99 patients constitute a good sample size[25].

The main limitation of this study is the possibility of selection bias, as only the patients attending the outpatient clinic were asked to participate in the study. Patients with progressive disease and patients who were not satisfied with their treatment were less likely to attend the outpatient clinic.

In conclusion, the Danish versions of the upper and lower extremity MSTS questionnaires were found to have good reliability and validity. The Danish versions are comparable to the other translated MSTS questionnaires. It is, however, of concern that a ceiling effect was found in both versions. When using the MSTS questionnaire, it is important to take into account which concept of function is intended to be measured.

Table 3 Floor and ceiling effects of the musculoskeletal tumour society score questionnaire

Figure 1 A Bland-Altman plot for the upper extremity version between raters.

Figure 2 A Bland-Altman plot for the lower extremity version between raters.

Figure 3 Correlation between musculoskeletal tumour society score and Timed Up and Go.

Figure 4 Correlation between Toronto Extremity Salvage Score and Timed Up and Go.

ARTICLE HIGHLIGHTS

Research background

The Musculoskeletal Tumour Society Score (MSTS) questionnaire is a physician/patient-completed questionnaire designed to assess functional outcome for patients with sarcomas in the extremities. The MSTS questionnaire was originally developed in English. Over the past decades there has been increased focus on whether questionnaires measure what they are intended to measure. This includes whether they still do so after being translated from one language to another.

Research motivation

To ensure that the Danish version of the MSTS questionnaire measures the same aspects of functional outcome in sarcoma patients as the English version, it is important to validate the measurement properties of the Danish version of the MSTS questionnaire and compare it to other language versions of the questionnaire. Furthermore, cultural differences need to be considered during the translation process, as this is a part of ensuring the original measurement properties. This rigorous process provides the possibility to compare results from national studies with other international studies.

Research objectives

The objectives of this study were: (1) to validate the Danish version of the MSTS questionnaire; and (2) to investigate the correlation between functional outcomes as measured by questionnaires, such as the MSTS, and the objective measurement, Timed Up and Go (TUG).

Research methods

The translation of the MSTS was conducted in accordance with international guidelines. Patients aged 18 or above who had undergone surgery for sarcomas and aggressive benign tumors were consecutively invited to participate in the study. The psychometric properties of the Danish version of the MSTS were tested in terms of validity and reliability and for the risk of floor or ceiling effects. Spearman’s rank correlation coefficient was used to compare the MSTS lower extremity version with the objective test, TUG.

Research results

The upper extremity version of the MSTS questionnaire demonstrated an excellent intra- and inter-rater reliability. The lower extremity version of the MSTS questionnaire showed an excellent intra- and inter-rater reliability. A ceiling effect, however, was found in both versions. Both versions of the MSTS questionnaire were shown to have good validity. The MSTS questionnaire showed a possible presence of measurement error. A poor correlation was found between the objective measurement, TUG, and the functional outcome measured by questionnaires.

Research conclusions

The Danish version of the MSTS questionnaire was found to have good reliability and validity; however, a substantial ceiling effect as well as the possibility of measurement error were identified. The Danish version of the MSTS questionnaire can be used to measure functional outcome in sarcoma patients and to compare these results with other international studies.

Research perspectives

The measurement errors and ceiling effects are concerns that should not be overlooked. Further investigation of these issues and of the measurement properties of the MSTS questionnaires, such as their ability to detect clinically significant changes in functional outcome, is highly recommended.
