Exploration of haplotype research consortium imputation for genome-wide association studies in 20,032 Generation Scotland participants

Reka Nagy, Thibaud S Boutin, Jonathan Marten, Jennifer E Huffman, Shona M Kerr, Archie Campbell, Louise Evenden, Jude Gibson, Carmen Amador, David M Howard, Pau Navarro, Andrew Morris, Ian J Deary, Lynne J Hocking, Sandosh Padmanabhan, Blair H Smith, Peter Joshi, James F Wilson, Nicholas D Hastie, Alan F Wright, Andrew M McIntosh, David J Porteous, Chris S Haley, Veronique Vitart, Caroline Hayward

Research output: Contribution to journal › Article

Abstract

BACKGROUND: The Generation Scotland: Scottish Family Health Study (GS:SFHS) is a family-based population cohort with DNA, biological samples, socio-demographic, psychological and clinical data from approximately 24,000 adult volunteers across Scotland. Although data collection was cross-sectional, GS:SFHS became a prospective cohort due to the ability to link to routine Electronic Health Record (EHR) data. Over 20,000 participants were selected for genotyping using a large genome-wide array.

METHODS: GS:SFHS was analysed using genome-wide association studies (GWAS) to test the effects of a large spectrum of variants, imputed using the Haplotype Research Consortium (HRC) dataset, on medically relevant traits measured directly or obtained from EHRs. The HRC dataset is the largest available haplotype reference panel for imputation of variants in populations of European ancestry and allows investigation of variants with low minor allele frequencies within the entire GS:SFHS genotyped cohort.

RESULTS: Genome-wide associations were run on 20,032 individuals using both genotyped and HRC imputed data. We present results for a range of well-studied quantitative traits obtained from clinic visits and for serum urate measures obtained from data linkage to EHRs collected by the Scottish National Health Service. Results replicated known associations and additionally revealed novel findings, mainly with rare variants, validating the use of the HRC imputation panel. For example, we identified two new associations with fasting glucose at variants near Y_RNA and WDR4 and four new associations with heart rate at SNPs within CSMD1 and ASPH, upstream of HTR1F and between PROKR2 and GPCPD1. All were driven by rare variants (minor allele frequencies in the range of 0.08-1%). Proof of principle for use of EHRs was verification of the highly significant association of urate levels with the well-established urate transporter SLC2A9.

CONCLUSIONS: GS:SFHS provides genetic data on over 20,000 participants alongside a range of phenotypes as well as linkage to National Health Service laboratory and clinical records. We have shown that the combination of deeper genotype imputation and extended phenotype availability makes GS:SFHS an attractive resource to carry out association studies to gain insight into the genetic architecture of complex traits.

Original language: English
Article number: 23
Pages (from-to): 1-14
Number of pages: 14
Journal: Genome Medicine
Volume: 9
DOIs: https://doi.org/10.1186/s13073-017-0414-4
Publication status: Published - 7 Mar 2017

Fingerprint

Genome-Wide Association Study
Scotland
Family Health
Haplotypes
Research
National Health Programs
Uric Acid
Gene Frequency
Genome
Phenotype
Aptitude
Electronic Health Records
Information Storage and Retrieval
Ambulatory Care
Population
Single Nucleotide Polymorphism
Volunteers
Fasting
Cohort Studies
Heart Rate

Keywords

  • Genome-wide association studies (GWAS)
  • Electronic health records
  • Imputation
  • Quantitative trait
  • Genetics
  • Urate
  • Heart rate
  • Glucose
  • Haplotype Research Consortium (HRC)

Cite this

Exploration of haplotype research consortium imputation for genome-wide association studies in 20,032 Generation Scotland participants. / Nagy, Reka; Boutin, Thibaud S; Marten, Jonathan; Huffman, Jennifer E; Kerr, Shona M; Campbell, Archie; Evenden, Louise; Gibson, Jude; Amador, Carmen; Howard, David M; Navarro, Pau; Morris, Andrew; Deary, Ian J; Hocking, Lynne J; Padmanabhan, Sandosh; Smith, Blair H; Joshi, Peter; Wilson, James F; Hastie, Nicholas D; Wright, Alan F; McIntosh, Andrew M; Porteous, David J; Haley, Chris S; Vitart, Veronique; Hayward, Caroline (Corresponding Author).

In: Genome Medicine, Vol. 9, 23, 07.03.2017, p. 1-14.

Nagy, R, Boutin, TS, Marten, J, Huffman, JE, Kerr, SM, Campbell, A, Evenden, L, Gibson, J, Amador, C, Howard, DM, Navarro, P, Morris, A, Deary, IJ, Hocking, LJ, Padmanabhan, S, Smith, BH, Joshi, P, Wilson, JF, Hastie, ND, Wright, AF, McIntosh, AM, Porteous, DJ, Haley, CS, Vitart, V & Hayward, C 2017, 'Exploration of haplotype research consortium imputation for genome-wide association studies in 20,032 Generation Scotland participants', Genome Medicine, vol. 9, 23, pp. 1-14. https://doi.org/10.1186/s13073-017-0414-4
@article{336abc4dd43a430cb0c91be9dd4da80f,
title = "Exploration of haplotype research consortium imputation for genome-wide association studies in 20,032 Generation Scotland participants",
abstract = "BACKGROUND: The Generation Scotland: Scottish Family Health Study (GS:SFHS) is a family-based population cohort with DNA, biological samples, socio-demographic, psychological and clinical data from approximately 24,000 adult volunteers across Scotland. Although data collection was cross-sectional, GS:SFHS became a prospective cohort due to the ability to link to routine Electronic Health Record (EHR) data. Over 20,000 participants were selected for genotyping using a large genome-wide array. METHODS: GS:SFHS was analysed using genome-wide association studies (GWAS) to test the effects of a large spectrum of variants, imputed using the Haplotype Research Consortium (HRC) dataset, on medically relevant traits measured directly or obtained from EHRs. The HRC dataset is the largest available haplotype reference panel for imputation of variants in populations of European ancestry and allows investigation of variants with low minor allele frequencies within the entire GS:SFHS genotyped cohort. RESULTS: Genome-wide associations were run on 20,032 individuals using both genotyped and HRC imputed data. We present results for a range of well-studied quantitative traits obtained from clinic visits and for serum urate measures obtained from data linkage to EHRs collected by the Scottish National Health Service. Results replicated known associations and additionally revealed novel findings, mainly with rare variants, validating the use of the HRC imputation panel. For example, we identified two new associations with fasting glucose at variants near Y_RNA and WDR4 and four new associations with heart rate at SNPs within CSMD1 and ASPH, upstream of HTR1F and between PROKR2 and GPCPD1. All were driven by rare variants (minor allele frequencies in the range of 0.08-1{\%}). Proof of principle for use of EHRs was verification of the highly significant association of urate levels with the well-established urate transporter SLC2A9. CONCLUSIONS: GS:SFHS provides genetic data on over 20,000 participants alongside a range of phenotypes as well as linkage to National Health Service laboratory and clinical records. We have shown that the combination of deeper genotype imputation and extended phenotype availability makes GS:SFHS an attractive resource to carry out association studies to gain insight into the genetic architecture of complex traits.",
keywords = "Genome-wide association studies (GWAS), Electronic health records, Imputation, Quantitative trait, Genetics, Urate, Heart rate, Glucose, Haplotype Research Consortium (HRC)",
author = "Reka Nagy and Boutin, {Thibaud S} and Jonathan Marten and Huffman, {Jennifer E} and Kerr, {Shona M} and Archie Campbell and Louise Evenden and Jude Gibson and Carmen Amador and Howard, {David M} and Pau Navarro and Andrew Morris and Deary, {Ian J} and Hocking, {Lynne J} and Sandosh Padmanabhan and Smith, {Blair H} and Peter Joshi and Wilson, {James F} and Hastie, {Nicholas D} and Wright, {Alan F} and McIntosh, {Andrew M} and Porteous, {David J} and Haley, {Chris S} and Veronique Vitart and Caroline Hayward",
note = "Acknowledgements: We are grateful to all the families who took part in the Generation Scotland: Scottish Family Health Study, the general practitioners and Scottish School of Primary Care for their help in recruiting them, and the whole Generation Scotland team, which includes academic researchers, IT staff, laboratory technicians, statisticians and research managers. We thank staff at the University of Dundee Health Informatics Centre for their expert assistance with EHR data linkage. IJD is supported by The University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology, part of the cross council Lifelong Health and Wellbeing Initiative (MR/K026992/1); funding from the BBSRC and MRC is gratefully acknowledged. Data on glycaemic traits have been contributed by MAGIC investigators and have been downloaded from www.magicinvestigators.org. Funding: Genotyping of the GS:SFHS samples was carried out by the Edinburgh Clinical Research Facility, University of Edinburgh and was funded by the Medical Research Council UK and the Wellcome Trust (Wellcome Trust Strategic Award ‘STratifying Resilience and Depression Longitudinally’ (STRADL), Reference 104036/Z/14/Z). GS:SFHS received core support from the Scottish Executive Health Department, Chief Scientist Office, grant number CZD/16/6. The MRC provides core funding to the QTL in Health and Disease research program at the MRC HGU, IGMM, University of Edinburgh. Availability of data and materials: The datasets supporting the conclusions of this article are included within the article (and its Additional files). https://figshare.com/collections/Exploration_of_haplotype_research_consortium_imputation_for_genome-wide_association_studies_in_20_032_Generation_Scotland_participants/3711706",
year = "2017",
month = "3",
day = "7",
doi = "10.1186/s13073-017-0414-4",
language = "English",
volume = "9",
pages = "1--14",
journal = "Genome Medicine",
issn = "1756-994X",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Exploration of haplotype research consortium imputation for genome-wide association studies in 20,032 Generation Scotland participants

AU - Nagy, Reka

AU - Boutin, Thibaud S

AU - Marten, Jonathan

AU - Huffman, Jennifer E

AU - Kerr, Shona M

AU - Campbell, Archie

AU - Evenden, Louise

AU - Gibson, Jude

AU - Amador, Carmen

AU - Howard, David M

AU - Navarro, Pau

AU - Morris, Andrew

AU - Deary, Ian J

AU - Hocking, Lynne J

AU - Padmanabhan, Sandosh

AU - Smith, Blair H

AU - Joshi, Peter

AU - Wilson, James F

AU - Hastie, Nicholas D

AU - Wright, Alan F

AU - McIntosh, Andrew M

AU - Porteous, David J

AU - Haley, Chris S

AU - Vitart, Veronique

AU - Hayward, Caroline

N1 - Acknowledgements: We are grateful to all the families who took part in the Generation Scotland: Scottish Family Health Study, the general practitioners and Scottish School of Primary Care for their help in recruiting them, and the whole Generation Scotland team, which includes academic researchers, IT staff, laboratory technicians, statisticians and research managers. We thank staff at the University of Dundee Health Informatics Centre for their expert assistance with EHR data linkage. IJD is supported by The University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology, part of the cross council Lifelong Health and Wellbeing Initiative (MR/K026992/1); funding from the BBSRC and MRC is gratefully acknowledged. Data on glycaemic traits have been contributed by MAGIC investigators and have been downloaded from www.magicinvestigators.org. Funding: Genotyping of the GS:SFHS samples was carried out by the Edinburgh Clinical Research Facility, University of Edinburgh and was funded by the Medical Research Council UK and the Wellcome Trust (Wellcome Trust Strategic Award ‘STratifying Resilience and Depression Longitudinally’ (STRADL), Reference 104036/Z/14/Z). GS:SFHS received core support from the Scottish Executive Health Department, Chief Scientist Office, grant number CZD/16/6. The MRC provides core funding to the QTL in Health and Disease research program at the MRC HGU, IGMM, University of Edinburgh. Availability of data and materials: The datasets supporting the conclusions of this article are included within the article (and its Additional files). https://figshare.com/collections/Exploration_of_haplotype_research_consortium_imputation_for_genome-wide_association_studies_in_20_032_Generation_Scotland_participants/3711706

PY - 2017/3/7

Y1 - 2017/3/7

AB - BACKGROUND: The Generation Scotland: Scottish Family Health Study (GS:SFHS) is a family-based population cohort with DNA, biological samples, socio-demographic, psychological and clinical data from approximately 24,000 adult volunteers across Scotland. Although data collection was cross-sectional, GS:SFHS became a prospective cohort due to the ability to link to routine Electronic Health Record (EHR) data. Over 20,000 participants were selected for genotyping using a large genome-wide array. METHODS: GS:SFHS was analysed using genome-wide association studies (GWAS) to test the effects of a large spectrum of variants, imputed using the Haplotype Research Consortium (HRC) dataset, on medically relevant traits measured directly or obtained from EHRs. The HRC dataset is the largest available haplotype reference panel for imputation of variants in populations of European ancestry and allows investigation of variants with low minor allele frequencies within the entire GS:SFHS genotyped cohort. RESULTS: Genome-wide associations were run on 20,032 individuals using both genotyped and HRC imputed data. We present results for a range of well-studied quantitative traits obtained from clinic visits and for serum urate measures obtained from data linkage to EHRs collected by the Scottish National Health Service. Results replicated known associations and additionally revealed novel findings, mainly with rare variants, validating the use of the HRC imputation panel. For example, we identified two new associations with fasting glucose at variants near Y_RNA and WDR4 and four new associations with heart rate at SNPs within CSMD1 and ASPH, upstream of HTR1F and between PROKR2 and GPCPD1. All were driven by rare variants (minor allele frequencies in the range of 0.08-1%). Proof of principle for use of EHRs was verification of the highly significant association of urate levels with the well-established urate transporter SLC2A9. CONCLUSIONS: GS:SFHS provides genetic data on over 20,000 participants alongside a range of phenotypes as well as linkage to National Health Service laboratory and clinical records. We have shown that the combination of deeper genotype imputation and extended phenotype availability makes GS:SFHS an attractive resource to carry out association studies to gain insight into the genetic architecture of complex traits.

KW - Genome-wide association studies (GWAS)

KW - Electronic health records

KW - Imputation

KW - Quantitative trait

KW - Genetics

KW - Urate

KW - Heart rate

KW - Glucose

KW - Haplotype Research Consortium (HRC)

U2 - 10.1186/s13073-017-0414-4

DO - 10.1186/s13073-017-0414-4

M3 - Article

C2 - 28270201

VL - 9

SP - 1

EP - 14

JO - Genome Medicine

JF - Genome Medicine

SN - 1756-994X

M1 - 23

ER -