Regression methods for high dimensional multicollinear data

L S Aucott, P H Garthwaite, J Currall

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

To compare their performance on high dimensional data, several regression methods are applied to data sets in which the number of exploratory variables greatly exceeds the sample sizes. The methods are stepwise regression, principal components regression, two forms of latent root regression, partial least squares, and a new method developed here. The data are four sample sets for which near infrared reflectance spectra have been determined and the regression methods use the spectra to estimate the concentration of various chemical constituents, the latter having been determined by standard chemical analysis. Thirty-two regression equations are estimated using each method and their performances are evaluated using validation data sets. Although it is the most widely used, stepwise regression was decidedly poorer than the other methods considered. Differences between the latter were small with partial least squares performing slightly better than other methods under all criteria examined, albeit not by a statistically significant amount.

Original language: English
Pages (from-to): 1021-1037
Number of pages: 17
Journal: Communications in Statistics - Simulation and Computation
Volume: 29
Publication status: Published - 2000

Keywords

  • biased regression
  • data reduction
  • high-dimensional data
  • latent root regression
  • near infrared spectra
  • partial least squares
  • principal components regression
  • stepwise regression
  • PARTIAL LEAST-SQUARES

Cite this

Regression methods for high dimensional multicollinear data. / Aucott, L S; Garthwaite, P H; Currall, J.

In: Communications in Statistics - Simulation and Computation, Vol. 29, 2000, p. 1021-1037.


@article{60c8f92c3e0a43a3b915a50206a4cbd6,
title = "Regression methods for high dimensional multicollinear data",
abstract = "To compare their performance on high dimensional data, several regression methods are applied to data sets in which the number of exploratory variables greatly exceeds the sample sizes. The methods are stepwise regression, principal components regression, two forms of latent root regression, partial least squares, and a new method developed here. The data are four sample sets for which near infrared reflectance spectra have been determined and the regression methods use the spectra to estimate the concentration of various chemical constituents, the latter having been determined by standard chemical analysis. Thirty-two regression equations are estimated using each method and their performances are evaluated using validation data sets. Although it is the most widely used, stepwise regression was decidedly poorer than the other methods considered. Differences between the latter were small with partial least squares performing slightly better than other methods under all criteria examined, albeit not by a statistically significant amount.",
keywords = "biased regression, data reduction, high-dimensional data, latent root regression, near infrared spectra, partial least squares, principal components regression, stepwise regression, PARTIAL LEAST-SQUARES",
author = "Aucott, {L S} and Garthwaite, {P H} and Currall, {J}",
year = "2000",
language = "English",
volume = "29",
pages = "1021--1037",
journal = "Communications in Statistics - Simulation and Computation",
issn = "0361-0918",
publisher = "Taylor and Francis Ltd.",

}

TY - JOUR

T1 - Regression methods for high dimensional multicollinear data

AU - Aucott, L S

AU - Garthwaite, P H

AU - Currall, J

PY - 2000

Y1 - 2000

N2 - To compare their performance on high dimensional data, several regression methods are applied to data sets in which the number of exploratory variables greatly exceeds the sample sizes. The methods are stepwise regression, principal components regression, two forms of latent root regression, partial least squares, and a new method developed here. The data are four sample sets for which near infrared reflectance spectra have been determined and the regression methods use the spectra to estimate the concentration of various chemical constituents, the latter having been determined by standard chemical analysis. Thirty-two regression equations are estimated using each method and their performances are evaluated using validation data sets. Although it is the most widely used, stepwise regression was decidedly poorer than the other methods considered. Differences between the latter were small with partial least squares performing slightly better than other methods under all criteria examined, albeit not by a statistically significant amount.

KW - biased regression

KW - data reduction

KW - high-dimensional data

KW - latent root regression

KW - near infrared spectra

KW - partial least squares

KW - principal components regression

KW - stepwise regression

KW - PARTIAL LEAST-SQUARES

M3 - Article

VL - 29

SP - 1021

EP - 1037

JO - Communications in Statistics - Simulation and Computation

JF - Communications in Statistics - Simulation and Computation

SN - 0361-0918

ER -