De novo assembly of the Pseudomonas syringae pv. syringae B728a genome using Illumina/Solexa short sequence reads

Rhys A. Farrer, Eric Kemen, Jonathan D. G. Jones, David J. Studholme (Corresponding Author)

Research output: Contribution to journal (Article)

71 Citations (Scopus)


Illumina's Genome Analyzer generates ultra-short sequence reads, typically 36 nucleotides in length, and is primarily intended for resequencing. We tested the potential of this technology for de novo sequence assembly on the 6 Mbp genome of Pseudomonas syringae pv. syringae B728a with several freely available assembly software packages. Using an unpaired data set, Velvet assembled >96% of the genome into contigs with an N50 length of 8289 nucleotides and an error rate of 0.33%. EDENA generated smaller contigs (N50 of 4192 nucleotides) with comparable error rates. SSAKE and VCAKE yielded shorter contigs with very high error rates. Assembly of paired-end sequence data carrying 400 bp inserts produced longer contigs (N50 up to 15 628 nucleotides), but with increased error rates (0.5%). Contig length and error rate were very sensitive to the choice of parameter values. Noncoding RNA genes were poorly resolved in de novo assemblies, while >90% of the protein-coding genes were assembled with 100% accuracy over their full length. This study demonstrates that, in practice, de novo assembly of 36-nucleotide reads can generate reasonably accurate assemblies from sequence data sets of about 40× depth. These draft assemblies are useful for exploring an organism's proteomic potential at very low cost.
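
The abstract reports contig N50 lengths and a sequencing depth of about 40× without defining either quantity. The following is a minimal sketch of how these two figures are commonly computed; it is not taken from the paper. The function names `n50` and `reads_for_depth` are hypothetical, the contig lengths in the usage example are made-up toy values, and the genome size is the rounded 6 Mbp figure quoted in the abstract.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which, walking through contigs
    from longest to shortest, the running total first reaches at least
    half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def reads_for_depth(genome_size, read_length, depth):
    """Return the number of reads needed for a given average depth.

    Average depth = (number of reads * read length) / genome size,
    so the read count is rounded up to the next whole read.
    """
    bases_needed = genome_size * depth
    return (bases_needed + read_length - 1) // read_length


if __name__ == "__main__":
    # Toy contig lengths (illustrative only, not data from the study).
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print("N50 of toy assembly:", n50(toy_contigs))

    # Assumed inputs: ~6 Mbp genome, 36-nt reads, ~40x depth.
    print("Reads needed:", reads_for_depth(6_000_000, 36, 40))
```

Under these assumptions, 40× depth over a 6 Mbp genome corresponds to roughly 6.7 million 36-nucleotide reads. N50 on its own says nothing about accuracy, which is why the abstract reports error rates alongside it.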

Original language: English
Pages (from-to): 103-111
Number of pages: 9
Journal: FEMS Microbiology Letters
Issue number: 1
Publication status: Published - 1 Feb 2009



  • Chromosome Mapping/methods
  • Computational Biology
  • Genome, Bacterial
  • Pseudomonas syringae/genetics
  • Sequence Analysis, DNA
  • Software
