In cross-national comparisons based on questionnaires, accurate translations are necessary to obtain valid results. Differential item functioning (DIF) analysis can be used to test whether translations of items in multi-item scales are equivalent to the original. In data from 10,815 respondents representing 10 European languages we tested for DIF in the nine translations of the EORTC QLQ-C30 emotional function scale when compared to the original English version. We tested for DIF using two different methods in parallel, a contingency table method and logistic regression. The DIF results obtained with the two methods were similar. We found indications of DIF in seven of the nine translations. At least two of the DIF findings seem to reflect linguistic problems in the translation. 'Imperfect' translations can affect conclusions drawn from cross-national comparisons. Given that translations can never be identical to the original we discuss how findings of DIF can be interpreted and discuss the difference between linguistic DIF and DIF caused by confounding, cross-cultural differences, or DIF in other items in the scale. We conclude that testing for DIF is a useful way to validate questionnaire translations.
- differential item functioning
- multi-item scales
- RESPONSE THEORY