Investigating the impact of database choice on the accuracy of metagenomic read classification for the rumen microbiome

Rebecca Louise Smith* (Corresponding Author), Laura Glendinning, Alan Walker, Mick Watson

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Microbiome analysis is quickly moving towards high-throughput methods such as metagenomic sequencing. Accurate taxonomic classification of metagenomic data relies on reference sequence databases and their associated taxonomy. However, for understudied environments such as the rumen microbiome, many sequences will be derived from novel or uncultured microbes that are not present in reference databases. As a result, taxonomic classification of metagenomic data from understudied environments may be inaccurate. To assess the accuracy of taxonomic read classification, this study classified metagenomic data that had been simulated from cultured rumen microbial genomes from the Hungate collection. To assess the impact of reference databases on the accuracy of taxonomic classification, the data were classified with Kraken 2 using several reference databases. We found that the choice and composition of the reference database significantly affected taxonomic classification results and accuracy. In particular, NCBI RefSeq proved to be a poor choice of database. Our results indicate that inaccurate read classification is likely to be a significant problem, affecting all studies that use insufficient reference databases. We observed that adding cultured reference genomes from the rumen to the reference database greatly improved classification rate and accuracy. We also demonstrated that metagenome-assembled genomes (MAGs) have the potential to further enhance classification accuracy by representing uncultivated microbes, sequences of which would otherwise be unclassified or incorrectly classified. However, classification accuracy was strongly dependent on the taxonomic labels assigned to these MAGs. We therefore highlight the importance of accurate reference taxonomic information and suggest that, with formal taxonomic lineages, MAGs have the potential to improve classification rate and accuracy, particularly in environments such as the rumen that are understudied or contain many novel genomes.
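
Because the reads are simulated from known genomes, each read has a known true taxon against which a classifier's label can be checked. The following is a minimal sketch of how classification rate and accuracy might be computed from such data. The function name, the input dictionaries, and the exact metric definitions are assumptions made for illustration, not the definitions used in the study; unclassified reads are represented here as `None`.

```python
from typing import Dict, Optional


def classification_metrics(
    truth: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> Dict[str, float]:
    """Summarise read classification against known true taxa.

    truth:     read ID -> true taxon (known because reads are simulated).
    predicted: read ID -> assigned taxon, or None if left unclassified.
    Metric definitions are illustrative assumptions, not the study's own.
    """
    total = len(truth)
    if total == 0:
        raise ValueError("no reads supplied")

    classified = 0
    correct = 0
    for read_id, true_taxon in truth.items():
        label = predicted.get(read_id)  # missing reads count as unclassified
        if label is None:
            continue
        classified += 1
        if label == true_taxon:
            correct += 1

    return {
        # fraction of all reads that received any label
        "classification_rate": classified / total,
        # fraction of labelled reads whose label matches the truth
        "accuracy_of_classified": correct / classified if classified else 0.0,
        # fraction of all reads given a wrong label
        "misclassified_fraction": (classified - correct) / total,
    }


# Hypothetical toy data: four simulated reads, genus-level labels.
truth = {"r1": "Prevotella", "r2": "Prevotella", "r3": "Butyrivibrio", "r4": "Ruminococcus"}
predicted = {"r1": "Prevotella", "r2": "Bacteroides", "r3": None, "r4": "Ruminococcus"}
metrics = classification_metrics(truth, predicted)
```

Running the same comparison once per reference database would give one set of metrics per database, which is the kind of side-by-side comparison the abstract describes.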
Original language: English
Article number: 57
Journal: Animal Microbiome
Volume: 4
Early online date: 18 Nov 2022
Publication status: Published - 18 Nov 2022

Keywords

  • Metagenome-assembled genomes
  • Metagenome
  • Rumen
  • Microbiome
  • Reference databases
  • Read classification
  • Taxonomy
