Multivariate outlier detection applied to multiply imputed laboratory data

Kay I. Penny*, Ian T. Jolliffe

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

14 Citations (Scopus)

Abstract

In clinical laboratory safety data, multivariate outlier detection methods may highlight a patient whose laboratory measurements do not follow the same pattern of relationships as the majority of patients, although their individual measurements are not found to be outlying when considered one at a time. Missing data problems are often dealt with by imputing a single value as an estimate of the missing value. The completed data set may then be analysed using traditional methods. A disadvantage of using single imputation is the underestimation of variability, with a corresponding distortion of power in hypothesis testing. Multiple imputation methods attempt to overcome this problem, and in this paper a study is described which considers the application of multivariate outlier detection methods to multiply imputed clinical laboratory safety data sets. Three different proportions of missing data are generated in laboratory data sets of dimensions 4, 7, 12 and 30, and a comparison of eight multiple imputation methods is carried out. Two outlier detection techniques, Mahalanobis distance and generalized principal component analysis, are applied to the multiply imputed data sets, and their performances are discussed. Measures are introduced for assessing the accuracy of the missing data results, depending on which method of analysis is used.
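As a rough illustration of the first point in the abstract, the sketch below builds a small synthetic data set in which one observation looks unremarkable on each variable separately but breaks the correlation pattern of the rest, and then scores it with the squared Mahalanobis distance. Everything in it is an assumption made for illustration: the simulated data, the helper name `mahalanobis_sq`, and the chi-square cutoff (a common convention) are not taken from the paper, and the sketch does not cover the imputation step or the generalized principal component analysis.

```python
import numpy as np

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row from the sample mean.

    Hypothetical helper: uses the classical sample mean and covariance,
    which may differ from the estimators used in the paper.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Solve cov * s = centered^T instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, centered.T).T
    return np.einsum("ij,ij->i", centered, solved)

# Synthetic data (assumption): two strongly correlated measurements.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + 0.1 * rng.normal(size=200)

# One added observation that is moderate on each variable but has the
# opposite sign pattern to the bulk of the data.
X = np.vstack([np.column_stack([x, y]), [1.5, -1.5]])

# Univariate view: absolute z-scores per variable.
z = np.abs((X - X.mean(axis=0)) / X.std(axis=0, ddof=1))

# Multivariate view: squared Mahalanobis distances.
d2 = mahalanobis_sq(X)

# Assumed cutoff: approximate 0.975 quantile of chi-square with 2 df.
cutoff = 7.378

print("max |z| of added point:", z[-1].max())
print("squared distance of added point:", d2[-1])
print("flagged as multivariate outlier:", bool(d2[-1] > cutoff))
```

The added observation stays well inside a conventional univariate screen such as |z| < 3, while its squared distance is far above the cutoff, because the distance accounts for the correlation between the two variables.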

Original language: English
Pages (from-to): 1879-1895
Number of pages: 17
Journal: Statistics in Medicine
Volume: 18
Issue number: 14
Early online date: 15 Jul 1999
Publication status: Published - 30 Jul 1999
