In clinical laboratory safety data, multivariate outlier detection methods may identify a patient whose laboratory measurements do not follow the pattern of relationships shown by the majority of patients, even though none of that patient's measurements appears outlying when considered one at a time. Missing data are often handled by imputing a single value as an estimate of each missing value, after which the completed data set is analysed using traditional methods. A disadvantage of single imputation is that it underestimates variability, which in turn distorts the power of hypothesis tests. Multiple imputation methods attempt to overcome this problem, and this paper describes a study that applies multivariate outlier detection methods to multiply imputed clinical laboratory safety data sets. Three different proportions of missing data are generated in laboratory data sets of dimension 4, 7, 12 and 30, and eight multiple imputation methods are compared. Two outlier detection techniques, the Mahalanobis distance and generalized principal component analysis, are applied to the multiply imputed data sets, and their performance is discussed. Measures are introduced for assessing the accuracy of the results obtained from the incomplete data, with the appropriate measure depending on the method of analysis used.
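The idea of applying the Mahalanobis distance to multiply imputed data can be illustrated with a minimal sketch, assuming NumPy is available. The imputation step here is a simple stochastic regression scheme (regress the incomplete variable on the others and add random residual noise), chosen only for illustration; it is not one of the eight methods compared in the paper. Averaging squared distances across imputations and the fixed cutoff (roughly the 97.5% point of a chi-square distribution with 4 degrees of freedom) are likewise assumptions, and all function names are hypothetical.

```python
import numpy as np


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of every row of X from the column means,
    using the classical (non-robust) sample covariance matrix."""
    centred = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Row-wise quadratic form d_i^T S^{-1} d_i
    return np.einsum("ij,jk,ik->i", centred, inv_cov, centred)


def stochastic_regression_impute(X, col, rng):
    """Hypothetical imputation: fill NaNs in column `col` from a regression on
    the other columns (fitted on complete cases) plus a random normal residual.
    Assumes only column `col` contains missing values."""
    X = X.copy()
    missing = np.isnan(X[:, col])
    others = np.delete(X, col, axis=1)
    design = np.column_stack([np.ones(len(X)), others])
    beta, *_ = np.linalg.lstsq(design[~missing], X[~missing, col], rcond=None)
    resid = X[~missing, col] - design[~missing] @ beta
    sigma = resid.std(ddof=design.shape[1])
    X[missing, col] = design[missing] @ beta + rng.normal(0.0, sigma, missing.sum())
    return X


def pooled_outliers(X, col, m=5, cutoff=11.14, seed=0):
    """Impute m times, compute D^2 on each completed data set, average across
    imputations and flag rows above `cutoff` (assumed chi-square(4) 97.5% point;
    it must be changed for other dimensions)."""
    rng = np.random.default_rng(seed)
    d2 = np.array([mahalanobis_sq(stochastic_regression_impute(X, col, rng))
                   for _ in range(m)])
    mean_d2 = d2.mean(axis=0)
    return mean_d2, np.flatnonzero(mean_d2 > cutoff)


# Simulated 4-dimensional data with strongly correlated variables
rng = np.random.default_rng(1)
cov = np.full((4, 4), 0.9) + 0.1 * np.eye(4)
X = rng.multivariate_normal(np.zeros(4), cov, size=200)
# Each value is only 2 SD from the mean, but the pattern breaks the correlations
X[0] = [2.0, -2.0, 2.0, -2.0]
# Make 10% of column 3 missing (row 0 is left complete)
X[rng.choice(np.arange(1, 200), 20, replace=False), 3] = np.nan

mean_d2, flagged = pooled_outliers(X, col=3)
print(flagged)  # row 0 is expected to be among the flagged rows
```

Row 0 would not stand out in any univariate check, yet its averaged squared distance is far above the cutoff because it contradicts the positive correlations among the variables, which is the kind of patient the multivariate methods are meant to detect.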
| Field | Value |
|---|---|
| Number of pages | 17 |
| Journal | Statistics in Medicine |
| Early online date | 15 Jul 1999 |
| Publication status | Published - 30 Jul 1999 |