Using false discovery rates to benchmark SNP-callers in next-generation sequencing projects

Rhys A Farrer; Daniel A Henk; Dan MacLean; David J Studholme; Matthew C Fisher

doi:10.1038/srep01512

Corresponding author: Rhys A Farrer

Medical Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the overall accuracies of an input read dataset, alignment and SNP-calling method, provided that an isolate in that dataset has a corresponding or closely related reference sequence available. In addition to this tool for comparing False Discovery Rates (FDR), we include a method for determining homozygous and heterozygous positions from an alignment using binomial probabilities for an expected error rate. We benchmark this method against other SNP callers using our FDR method with three fungal genomes, finding that it was able to achieve a high level of accuracy. These tools are available at http://cfdr.sourceforge.net/.
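The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation or the code distributed at the URL above. The function names, the `error_rate` and `alpha` defaults, and the decision rule in `call_site` are all assumptions chosen for illustration. The first function computes a false discovery rate as FP / (TP + FP) by comparing called SNPs against a known truth set. The remaining functions use a binomial tail probability under an expected per-base error rate to decide whether a position looks like reference, homozygous, or heterozygous.

```python
from math import comb


def false_discovery_rate(called, truth):
    """FDR = FP / (TP + FP) for called SNPs measured against a known truth set."""
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    total = true_pos + false_pos
    return false_pos / total if total else 0.0


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_site(alt_reads, depth, error_rate=0.01, alpha=1e-3):
    """Classify one aligned position from read counts (hypothetical rule)."""
    if depth == 0:
        return "no-call"
    # Non-reference reads are few enough to be explained by error alone.
    if binom_tail(alt_reads, depth, error_rate) >= alpha:
        return "reference"
    # Reference reads are few enough to be explained by error alone.
    ref_reads = depth - alt_reads
    if binom_tail(ref_reads, depth, error_rate) >= alpha:
        return "homozygous"
    # Neither allele can be dismissed as sequencing error.
    return "heterozygous"


# Hypothetical SNPs as (contig, position, base) tuples.
truth = {("chr1", 100, "A"), ("chr1", 200, "T"), ("chr2", 50, "G"), ("chr3", 10, "A")}
called = {("chr1", 100, "A"), ("chr1", 200, "T"), ("chr2", 50, "G"), ("chr2", 75, "C")}

fdr = false_discovery_rate(called, truth)
calls = [call_site(alt, 30) for alt in (0, 1, 15, 29, 30)]
```

Here three of the four called SNPs are in the truth set, so `fdr` is 0.25. The truth-set SNP that was never called does not affect the FDR, because it is a false negative rather than a false positive. Comparing this value across aligners or SNP callers on the same truth set is the benchmarking idea the abstract describes.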

Original language	English
Article number	1512
Number of pages	6
Journal	Scientific Reports
Volume	3
DOIs	https://doi.org/10.1038/srep01512
Publication status	Published - 21 Mar 2013

Bibliographical note

Funding: R.A.F. was funded by the Natural Environment Research Council (NERC). D.A.H. and M.C.F. were supported by the Wellcome Trust. No additional external funding received for this study.

Access to Document

10.1038/srep01512 (Licence: CC BY-NC-ND)

Using False Discovery Rates to Benchmark SNP-callers in next-generation sequencing projects
Final published version, 667 KB (Licence: CC BY-NC-ND)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/

Cite this

@article{f73ed354685d4478a317296bac8830f9,
title = "Using false discovery rates to benchmark SNP-callers in next-generation sequencing projects",
abstract = "Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the overall accuracies of an input read dataset, alignment and SNP-calling method, provided that an isolate in that dataset has a corresponding or closely related reference sequence available. In addition to this tool for comparing False Discovery Rates (FDR), we include a method for determining homozygous and heterozygous positions from an alignment using binomial probabilities for an expected error rate. We benchmark this method against other SNP callers using our FDR method with three fungal genomes, finding that it was able to achieve a high level of accuracy. These tools are available at http://cfdr.sourceforge.net/.",
author = "Farrer, {Rhys A} and Henk, {Daniel A} and MacLean, Dan and Studholme, {David J} and Fisher, {Matthew C}",
note = "Funding: R.A.F. was funded by the Natural Environment Research Council (NERC). D.A.H. and M.C.F. were supported by the Wellcome Trust. No additional external funding received for this study.",
year = "2013",
month = mar,
day = "21",
doi = "10.1038/srep01512",
language = "English",
volume = "3",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
}

TY  - JOUR
T1  - Using false discovery rates to benchmark SNP-callers in next-generation sequencing projects
AU  - Farrer, Rhys A
AU  - Henk, Daniel A
AU  - MacLean, Dan
AU  - Studholme, David J
AU  - Fisher, Matthew C
N1  - Funding: R.A.F. was funded by the Natural Environment Research Council (NERC). D.A.H. and M.C.F. were supported by the Wellcome Trust. No additional external funding received for this study.
PY  - 2013/3/21
Y1  - 2013/3/21
N2  - Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the overall accuracies of an input read dataset, alignment and SNP-calling method, provided that an isolate in that dataset has a corresponding or closely related reference sequence available. In addition to this tool for comparing False Discovery Rates (FDR), we include a method for determining homozygous and heterozygous positions from an alignment using binomial probabilities for an expected error rate. We benchmark this method against other SNP callers using our FDR method with three fungal genomes, finding that it was able to achieve a high level of accuracy. These tools are available at http://cfdr.sourceforge.net/.
AB  - Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the overall accuracies of an input read dataset, alignment and SNP-calling method, provided that an isolate in that dataset has a corresponding or closely related reference sequence available. In addition to this tool for comparing False Discovery Rates (FDR), we include a method for determining homozygous and heterozygous positions from an alignment using binomial probabilities for an expected error rate. We benchmark this method against other SNP callers using our FDR method with three fungal genomes, finding that it was able to achieve a high level of accuracy. These tools are available at http://cfdr.sourceforge.net/.
U2  - 10.1038/srep01512
DO  - 10.1038/srep01512
M3  - Article
C2  - 23518929
SN  - 2045-2322
VL  - 3
JO  - Scientific Reports
JF  - Scientific Reports
M1  - 1512
ER  - 