Clinical Microbiology Reviews
Review

Whole-Genome Sequencing of Bacterial Pathogens: the Future of Nosocomial Outbreak Analysis

Scott Quainoo, Jordy P. M. Coolen, Sacha A. F. T. van Hijum, Martijn A. Huynen, Willem J. G. Melchers, Willem van Schaik, Heiman F. L. Wertheim
Scott Quainoo
Department of Microbiology, Radboud University, Nijmegen, The Netherlands
Jordy P. M. Coolen
Department of Medical Microbiology, Radboud University Medical Centre, Nijmegen, The Netherlands
Sacha A. F. T. van Hijum
Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands; NIZO, Ede, The Netherlands
Martijn A. Huynen
Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
Willem J. G. Melchers
Department of Medical Microbiology, Radboud University Medical Centre, Nijmegen, The Netherlands
Willem van Schaik
Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
Heiman F. L. Wertheim
Department of Medical Microbiology, Radboud University Medical Centre, Nijmegen, The Netherlands
DOI: 10.1128/CMR.00016-17
Figures

  • FIG 1

    Simplified SCS construction.

  • FIG 2

    Simplified k-mer construction during de Bruijn graph assembly.

  • FIG 3

    WGS outbreak analysis tools. Different steps in the analysis of WGS data are shown in orange (assembly, genomic characterization, comparative genomics, and phylogeny). Analysis tools are grouped by the analysis step that they perform and are separated by user interface in shades of blue (complete analysis software suites, Web based, and command line).
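
The captions of FIG 1 and FIG 2 name two assembly ideas: joining reads by their overlaps into a common superstring (SCS is taken here to mean shortest common superstring), and cutting reads into k-mers that become the edges of a de Bruijn graph. The Python sketch below is only an illustration of those two ideas and is not the procedure drawn in the figures. The function names, the toy reads, and the greedy merge strategy (a simple approximation of SCS, not an exact solution) are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import permutations


def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_scs(reads):
    """Greedy approximation of a shortest common superstring.

    Repeatedly merges the pair of reads with the largest overlap.
    Hypothetical helper for illustration; real OLC assemblers are far
    more elaborate (error handling, repeats, reverse complements).
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):
            n = overlap(a, b)
            if n > best_len:
                best_len, best_a, best_b = n, a, b
        if best_len == 0:
            # No overlaps left: concatenate what remains.
            return "".join(reads)
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_len:])
    return reads[0]


def kmers(seq, k):
    """All substrings of length k, in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes it leads to."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    toy_reads = ["ATGGC", "GGCAT", "CATTA"]  # made-up reads
    print(greedy_scs(toy_reads))             # ATGGCATTA
    print(kmers("ATGGC", 3))                 # ['ATG', 'TGG', 'GGC']
    print(de_bruijn_edges(toy_reads, 3))
```

In the de Bruijn representation, a contig corresponds to a walk through the graph, so the assembler never has to compare every read against every other read as the overlap-based function does.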

Tables

  • TABLE 1

    Performance analysis of sequencing platforms [a]

    Platform | Read length (bp) | Output (Gb) | Coverage [b] | Run time (h) | No. of reads | Cost per Gb ($) | Consumables cost ($) | Instrument cost ($) | Error rate | Dimensions (width × depth × ht) (cm) | Source(s) (reference[s])
    Sequencing by synthesis
        Illumina MiniSeq Mid Output | 2 × 150 [c] | 2.1–2.4 [c] | 8.6 | 17 [c] | 14 million–16 million [c] | 2,584–2,953 [d] | 6,201 [c] | 55,411 [c] | 0.1% in >80% of base calls [c] | 45.6 × 48 × 51.8 [c] | Illumina
        Illumina MiniSeq High Output | 1 × 75 [c] | 1.7–1.9 [c] | 6.8 | 7 [c] | 22 million–25 million [c] | 3,264–3,648 [d] | 6,201 [c] | 55,411 [c] | 0.1% in >85% of base calls [c] | 45.6 × 48 × 51.8 [c] | Illumina
        Illumina MiniSeq High Output | 2 × 75 [c] | 3.3–3.8 [c] | 13.6 | 13 [c] | 44 million–50 million [c] | 1,632–1,879 [d] | 6,201 [c] | 55,411 [c] | 0.1% in >85% of base calls [c] | 45.6 × 48 × 51.8 [c] | Illumina
        Illumina MiniSeq High Output | 2 × 150 [c] | 6.6–7.5 [c] | 26.9 | 24 [c] | 44 million–50 million [c] | 827–940 [d] | 6,201 [c] | 55,411 [c] | 0.1% in >80% of base calls [c] | 45.6 × 48 × 51.8 [c] | Illumina
        Illumina MiSeq Reagent kit v2 | 1 × 36 [c] | 0.54–0.61 [c] | 2.2 | 4 [c] | 12 million–15 million [c] | 7,946–8,976 [d] | 4,847 [c] | 108,244 [c] | 0.1% in >90% of base calls [c] | 68.6 × 56.5 × 52.3 [c] | Illumina
        Illumina MiSeq Reagent kit v2 | 2 × 25 [c] | 0.75–0.85 [c] | 3.1 | 5.5 [c] | 24 million–30 million [c] | 5,702–6,463 [d] | 4,847 [c] | 108,244 [c] | 0.1% in >90% of base calls [c] | 68.6 × 56.5 × 52.3 [c] | Illumina
        Illumina MiSeq Reagent kit v2 | 2 × 150 [c] | 4.5–5.1 [c] | 18.3 | 24 [c] | 24 million–30 million [c] | 950–1,077 [d] | 4,847 [c] | 108,244 [c] | 0.1% in >80% of base calls [c] | 68.6 × 56.5 × 52.3 [c] | Illumina
        Illumina MiSeq Reagent kit v2 | 2 × 250 [c] | 7.5–8.5 [c] | 30.5 | 39 [c] | 24 million–30 million [c] | 570–646 [d] | 4,847 [c] | 108,244 [c] | 0.1% in >75% of base calls [c] | 68.6 × 56.5 × 52.3 [c] | Illumina
        Illumina MiSeq Reagent kit v3 | 2 × 75 [c] | 3.3–3.8 [c] | 13.6 | 21 [c] | 44 million–50 million [c] | 1,362–1,568 [d] | 5,174 [c] | 108,244 [c] | 0.1% in >85% of base calls [c] | 68.6 × 56.5 × 52.3 [c] | Illumina
        Illumina MiSeq Reagent kit v3 | 2 × 300 [c] | 13.2–15 [c] | 53.8 | 56 [c] | 44 million–50 million [c] | 345–392 [d] | 5,174 [c] | 108,244 [c] | 0.1% in >70% of base calls [c] | 68.6 × 56.5 × 52.3 [c] | Illumina
        Illumina NextSeq 500 Mid Output | 2 × 75 [c] | 16.3–20 [c] | 71.8 | 15 [c] | <260 million [c] | 318–391 [d] | 6,369 [c] | 266,835 [c] | 0.1% in >75% of base calls [c] | 53.3 × 63.5 × 58.4 [c] | Illumina
        Illumina NextSeq 500 Mid Output | 2 × 150 [c] | 32.5–39 [c] | 140 | 26 [c] | <260 million [c] | 163–196 [d] | 6,369 [c] | 266,835 [c] | 0.1% in >80% of base calls [c] | 53.3 × 63.5 × 58.4 [c] | Illumina
        Illumina NextSeq 500 High Output | 1 × 75 [c] | 25–30 [c] | 107.7 | 11 [c] | <400 million [c] | 312–374 [d] | 9,347 [c] | 266,835 [c] | 0.1% in >80% of base calls [c] | 53.3 × 63.5 × 58.4 [c] | Illumina
        Illumina NextSeq 500 High Output | 2 × 75 [c] | 50–60 [c] | 215.3 | 18 [c] | <800 million [c] | 156–187 [d] | 9,347 [c] | 266,835 [c] | 0.1% in >80% of base calls [c] | 53.3 × 63.5 × 58.4 [c] | Illumina
        Illumina NextSeq 500 High Output | 2 × 150 [c] | 100–120 [c] | 430.6 | 29 [c] | <800 million [c] | 78–93 [d] | 9,347 [c] | 266,835 [c] | 0.1% in >75% of base calls [c] | 53.3 × 63.5 × 58.4 [c] | Illumina
    Single-molecule real-time sequencing
        Pacific Biosciences RS II P6-C4 chemistry | >20,000 [c] | 8–16 [e] | 57.4 | 0.5–4 [c] | 55,000 [c] | 250–500 [d] | 4,000 [e] | 695,000 | 14% errors per base | 203.0 × 90.0 × 160.0 [c] | PacBio, AllSeq [f] (89)
        Pacific Biosciences Sequel system | >20,000 [c] | 80–160 [e] | 574.2 | 0.5–6 [c] | 370,000 [c] | 70–140 [d] | 11,200 [e] | 350,000 | 14% errors per base | 92.7 × 86.4 × 167.6 [c] | PacBio, AllSeq [f] (89)
        Oxford Nanopore MinION Mk1 (1D) | >882,000 | 10–20 [c] | 71.8 | 1.67–72 [c] | 138,000 | 49.95–99.9 [d] | 999 [c] | 1,000 [c] | 12% errors per base | 10.5 × 3.3 × 2.3 [c] | Oxford Nanopore Technologies, Loman Labs [g] (231)
        Oxford Nanopore MinION Mk1 (2D) | >882,000 | 10–20 [c] | 71.8 | 1.67–72 [c] | 138,000 | 49.95–99.9 [d] | 999 [c] | 1,000 [c] | 15% errors per base | 10.5 × 3.3 × 2.3 [c] | Oxford Nanopore Technologies, Loman Labs [g] (231, 232)
        Oxford Nanopore PromethION single flow cell | <300,000 [c] | 233 [c] | 836.2 | 1.67–>72 [c] | 26 million [c] | NA | NA | 135,000 (PEAP) [c] | NA | 44.0 × 24.0 × 40.0 [c] | Oxford Nanopore Technologies
        Oxford Nanopore PromethION 48 flow cells | <300,000 [c] | 11,000 [c] | 39,475.8 | 1.67–>72 [c] | 1.25 billion [c] | NA | NA | 135,000 (PEAP) [c] | NA | 44.0 × 24.0 × 40.0 [c] | Oxford Nanopore Technologies

    [a] All quantitative performance measures were taken from previously reported data, as indicated. Consumables costs were calculated as follows: Illumina costs included PhiX Control kit v3, the Nextera XT DNA sample preparation kit (96 samples)/Nextera DNA library preparation kit (96 samples), and Nextera XT Index kit v2 (96 indexes and 384 samples), the highest-output reagent kit. PEAP, PromethION Early-Access Program; NA, no data available.

    [b] Calculated for 96 samples and the genome size of S. aureus strain MRSA252 (2,902,619 bp).

    [c] Manufacturer's data.

    [d] Estimated calculation for consumables.

    [e] For 16 SMRT cells.

    [f] See http://www.allseq.com/knowledge-bank/sequencing-platforms/pacific-biosciences/.

    [g] See http://lab.loman.net/2017/03/09/ultrareads-for-nanopore/.

  • TABLE 2

    Performance analysis of assembly tools [a]

    Analysis tool (reference[s]) | Concept | Computational requirement | Speed | Assembly quality | Preferred sequencing technology(ies) | Web address(es) | Input format | Output format(s)
    Web based
        Velvet (103, 126) | de Bruijn graph-based assembly that resolves repeat-rich regions; can be used for de novo or reference-guided assembly; requires paired reads with 20- to 25-fold coverage | Mid* | Medium* | Low* | Illumina | https://cge.cbs.dtu.dk/services/Assembler/ | FASTA, FASTQ, SAM, or BAM | AMOS, modified FASTA
        SPAdes/hybridSPAdes (112) | de Bruijn graph-based assembler for de novo assembly of short and long reads | Low** | Low** | Mid*/** | Mixed input (Illumina, Ion Torrent, PacBio CLR, Oxford Nanopore) | https://cge.cbs.dtu.dk/services/SPAdes/ | FASTA, FASTQ, or BAM | FASTA, FASTQ, FASTG
    Command line
        IDBA-UD (108) | de Bruijn graph-based assembly designed for assembly of repeat-rich reads of various sequencing depths | Low* | Medium* | Mid* | Illumina | http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/ | FASTA | FASTA
        RAY (96) | de Bruijn graph-based assembly that uses seeds instead of Eulerian walks; used for de novo assembly; designed for short reads | Low*** | Fast*** | Low*** | Mixed input (454, Illumina, Ion Torrent) | http://denovoassembler.sourceforge.net/ | FASTA, FASTQ, or SFF | FASTA, TXT
        Minimap/miniasm (116) | OLC framework that computes overlaps and performs read trims and unitig construction; can be used for de novo or reference-guided assembly | Low** | High** | High*/** | PacBio, Oxford Nanopore | https://github.com/lh3/minimap, https://github.com/lh3/miniasm | FASTA | GFA, PAF
        Canu (118) | OLC framework that computes overlaps and performs read correction, read trims, and unitig construction; used for de novo assembly | Mid** | Low** | High*/** | PacBio, Oxford Nanopore | https://github.com/marbl/canu | FASTA or FASTQ | FASTA

    [a] All quantitative performance measures were taken from data reported previously, as indicated. CLR, continuous long reads; GFA, graphical fragment assembly; PAF, pairwise mapping format; SFF, standard flowgram format (454 data format); *, E. coli K-12 MG1655 data set (110); **, Enterobacter kobei data set (233); ***, Illumina data from E. coli (SRA accession number SRX000429) (234). Note that for SPAdes, only the nonhybrid tool is accessible as a Web-based tool.

  • TABLE 3

    Overview of genome characterization tools [a]

    Analysis tool (reference[s]) | Concept(s) | Input type(s) | Input format(s) | Output format(s) | Web address
    Identification
        Web based
            KmerFinder (121, 122) | Uses k-mers to identify strain using WGS data | Raw sequences, contigs | FASTQ, FASTA | Tab delimited, online | https://cge.cbs.dtu.dk/services/KmerFinder/
            NCBI BLAST [b] (123) | NCBI Web-based interface for performing BLAST searches; searches hits in the database that match the given sequence | Contigs | FASTA | Online, tab delimited | https://blast.ncbi.nlm.nih.gov/Blast.cgi
            MLST Web server (125) | Web-based database that identifies STs from short sequencing reads or draft genomes | Raw sequences, contigs | FASTQ, FASTA | Online | https://cge.cbs.dtu.dk/services/MLST/
        Command line
            PathoScope 2.0 (127) | Complete framework based on Bayesian missing-data approach, for direct strain identification | Raw sequences | FASTQ, FASTA | Tab delimited | https://sourceforge.net/p/pathoscope/wiki/Home/
    Annotation
        Web based
            RAST (129) | Web-based server for localization and identification of tRNA, rRNA, and coding sequences; includes a browser for screening the output | Contigs | FASTA | GenBank, EMBL, GFF3, GTF, Excel, and tab delimited | http://rast.nmpdr.org/
        Command line
            PROKKA (132) | Rapid annotation tool for localization and identification of rRNA, tRNA, tmRNA, signal peptides, noncoding RNA, and coding sequences | Contigs | FASTA | FASTA, tab delimited, SQN, GenBank file, GFF3 | http://www.vicbioinformatics.com/software.prokka.shtml
    Virulence
        Web based
            VirulenceFinder | Detects virulence genes in WGS data using the BLAST algorithm | Raw sequences, contigs | FASTQ, FASTA | Tab-delimited summary, FASTA | https://cge.cbs.dtu.dk/services/VirulenceFinder/
            VFDB (138) | Source of virulence information, including Web-based service to perform BLAST to detect virulence genes | Contigs | FASTA | Online, tab delimited | http://www.mgc.ac.cn/VFs/
    Antimicrobial resistance
        Web based
            ResFinder | Detects resistance genes in WGS data | Raw sequences, contigs | FASTQ, FASTA | Tab-delimited summary, FASTA | https://cge.cbs.dtu.dk/services/ResFinder/
            RGI/CARD (144–146) | Web-based as well as command line versions available to perform resistance gene detection using the CARD database | Contigs, GenBank accession no. | FASTA, GenBank accession no. (nucleotide or protein) | JSON, tab-delimited summary, FASTA, heat map PDF | https://card.mcmaster.ca/analyze/rgi
            PlasmidFinder | Tool to detect plasmids in WGS data | Raw sequences, contigs | FASTQ, FASTA | Tab-delimited summary, FASTA | https://cge.cbs.dtu.dk/services/PlasmidFinder/
            CGE BAP (107) | Web-based suite for automated genomic characterization; if raw sequence reads are provided, performs assembly; a set of tools (ResFinder, VirulenceFinder, and PlasmidFinder) is then applied to the contigs | Raw sequences, contigs | FASTQ, FASTA | Tab-delimited summaries, FASTA | https://cge.cbs.dtu.dk/services/cge/

    [a] ND, no data; NA, not applicable; EMBL, sequence file format; JSON, JavaScript Object Notation; SQN, GenBank submission file; GFF3, General Feature Format 3.

    [b] Also available as a command line tool and as GUI via prfectBLAST (124).

  • TABLE 4

    Performance analysis of comparative genomics tools [a]

    Analysis tool (reference[s]) | Concept | Method | Run time (h) | Topology score (%) | Web address(es) | Input type(s) | Input format(s) | Output format(s)
    Web based
        PubMLST (158) | Web-accessible database where it is possible to run cgMLST and wgMLST analyses | cgMLST/wgMLST | NA | NA | https://pubmlst.org/ | Contigs | FASTA | cgMLST/wgMLST profile
        CSI Phylogeny 1.4 (161) | High-quality SNP method using reference mapping of reads and mapping and SNP calling assessments | Reference-based SNP | ND | ND | https://cge.cbs.dtu.dk/services/CSIPhylogeny/ | Raw sequences, contigs | FASTA, FASTQ | ND
        NDtree 1.2 (161) | Creates k-mers of reads and maps them to a reference; performs simple model to determine no. of SNPs | Statistical method | 3–3.5 [b] | ND | https://cge.cbs.dtu.dk/services/NDtree/ | Raw sequences | FASTQ | Newick
    Command line
        kSNP3 (154, 155) | Uses k-mer analyses to detect SNPs between strains without using either multiple-sequence alignment or a reference genome | Non-reference-based SNP | 0.5 [c] | 91.80–95.80 [c, e] | https://sourceforge.net/projects/ksnp/ | Raw sequences, contigs | FASTA | Newick, MSA
        Roary (169) | Tool for constructing pangenomes from contigs | Pangenome | 4.30 [d] | 100 [d] | https://sanger-pathogens.github.io/Roary/ | Contigs | GFF3 | FASTA, TXT, CSV, Rtab
        Pan-Seq [f] (175) | Pangenome assembler with additional locus finder for core/accessory gene allele profiles (a Web-based version is also available) | Pangenome | ND | ND | https://github.com/chadlaing/Panseq, https://lfz.corefacility.ca/panseq/ | Contigs | FASTA | TXT, FASTA
        Lyve-SET (179) | High-quality SNP method using reference mapping of reads and mapping and SNP calling assessments | Reference-based SNP | 6.25 [c] | 85 [c] | https://github.com/lskatz/lyve-SET | Raw sequences, contigs [g] | FASTA, FASTQ | Matrix, FASTA, Newick, VCF
        SPANDx (182) | Complete workflow for creating SNP/indel matrixes as well as locus presence/absence matrixes from raw sequencing reads from a range of NGS technologies | Reference-based SNP | 3.1 [c] | 100 [c] | https://sourceforge.net/projects/spandx/ | Raw sequences | FASTA, FASTQ | NEXUS

    [a] All quantitative performance measures were taken from previously reported data, as indicated. ND, no data; NA, not applicable; MSA, multiple-sequence alignment; GFF3, General Feature Format 3; VCF, variant call format.

    [b] Based on 46 VTEC genomes (20).

    [c] Based on 21 E. coli genomes (167).

    [d] Wall time for 1,000 S. enterica serovar Typhi genomes (169).

    [e] Using core.

    [f] A Web-based version is also available.

    [g] Reads are simulated from contigs.

  • TABLE 5

    Performance analysis of phylogeny tools [a]

    Command line analysis tool (reference) | Concept | Run time (h) | Accuracy (%) | Input format | Output format
    RAxML (191) | Maximum likelihood phylogenetic tree estimator tool; slow but very accurate | 612 [b] | 84.47 [c] | PHYLIP or FASTA | Newick
    FastTree (163) | Approximately maximum likelihood phylogenetic tree estimator; fast but slightly less accurate | 2.63 [b] | 83.6 [c] | PHYLIP or FASTA | Newick
    MrBayes (198) | Bayesian-based phylogenetic tree; complex to define models and not user-friendly | ND | ND | NEXUS | NEXUS

    [a] All quantitative performance measures were taken from previously reported data, as indicated. The input type for all of these tools is aligned reads/SNPs. ND, no data.

    [b] Averages for 3 large biological data sets aligned via 3 different methods (TrueAln, PartTree, and Quicktree) (197).

    [c] Accuracy = 100% − missing branch rates (%) for 3 large biological data sets aligned via 3 different methods (TrueAln, PartTree, and Quicktree) (197).

  • TABLE 6

    Overview of complete analysis software suites [a]

    Software suite | Concept | RAM compatibility (Gb) | Run time (h) | No. of schemes | Price ($) | Source or Web address | Input format | Output format(s)
    Commercial
        BioNumerics 7.6.2 | Suite containing multiple modules, thereby having many functionalities; able to perform wgMLST | ND | ND | 14 | Request quote [b] | Applied Maths | FASTQ | Depends on module
        Ridom SeqSphere+ | Suite dedicated to outbreak analyses; customizable automation flows for processing raw reads to phylogeny using either cgMLST or wgMLST; for cgMLST, it includes CT definitions | 16–32 [c] | ND | 7 | 2,500 [d] | Ridom Bioinformatics | FASTQ | CT, phylogeny
    Free
        NCBI Pathogen Detection (beta) | NCBI-provided Web service with main focus on detection of foodborne pathogens; automated flow from raw sequences to phylogeny inference | NA | NA | 19 [e] | Free | https://www.ncbi.nlm.nih.gov/pathogens/ | FASTQ | Web-accessible SNP tree, AMR data

    [a] ND, no data; NA, not applicable.

    [b] Cost needs to be requested and is dependent on the number of modules.

    [c] According to the manufacturer.

    [d] One-year, 2-user accounts (academic/governmental).

    [e] Numbers of species and groups of species, as no schemes apply.

  • TABLE 7

    Pros and cons of sequencing platforms

    Platform | Pros | Cons
    Sequencing by synthesis
        Illumina | Technology used widely by the WGS industry; lowest per-Gb sequencing cost range; highest confirmed output; wide range of Illumina machines suited for a wealth of applications and demands; lowest error rates | Rehybridization of template strands and low-copy-number yields during bridge amplification; use of potentially biased DNA polymerases during bridge amplification; incomplete base extension (phasing, prephasing); shortest read lengths; long sequence runs; high instrument costs; no real-time data access
    Single-molecule real-time sequencing
        Pacific Biosciences | Fast sequence runs; long reads suitable for assembly of draft genomes and completion of genome assemblies; possibility of obtaining epigenetic sequence information; real-time measurement of base incorporation | Possibility of false detection of unincorporated nucleotides during sequencing; largest instrument footprint; low output per run; high error rates
        Oxford Nanopore Technologies | Fast sequencing; longest confirmed reads; smallest instrument footprint; lowest instrument and consumables costs; real-time measurement of base incorporation; real-time data output | Sensitivity of biological nanopores to changes in experimental environment; highest error rate of all platforms; the performance of the PromethION machine is not experimentally validated
  • TABLE 8

    Pros and cons of analysis tools

    Algorithm | Interface type(s) | Pro(s) | Con(s)
    Assembly
        Velvet | Web based | Designed for repeat-rich reads; automated parameter tuning for quality control; detailed tutorial; Web-based accessibility | Small N50 contig size; technology specific; coverage cutoff excludes potentially correct low-coverage vertices; high memory usage; suitable for short reads only
        IDBA-UD | Command line | Designed for repeat-rich short reads with various sequencing depths; among the lowest memory usages; error correction after each iteration for quality control | Technology specific; no tutorial; suitable for short reads only
        RAY | Command line | Hybrid assembly of multiple sequencing platform reads; heuristics for contig length determination that increase quality of sequence accuracy; automated parameter calculation; detailed tutorial | Small N50 contig size; poor performance with lower-quality reads; suitable for short reads only
        SPAdes/hybridSPAdes | Web based | Hybrid assembly of multiple sequencing platform reads; suited for short and long reads; among the lowest memory usages; largest N50 contig size; closing of gaps and resolution of repeats in assembly graph for quality control; option to merge contigs from other assemblers; detailed tutorial; Web-based accessibility | Longest computing time
        Minimap/miniasm | Command line | Shortest computing time; compatibility with other overlapping workflows when converted to PAF format; detailed tutorial | Technology specific; no sequencing error correction; missing overlaps and misassemblies during graph cleaning; suitable for long reads only
        Canu | Command line | Large N50 contig size; detailed tutorial; initial read correction to remove noise for quality control | Long computing time; high memory usage; suitable for long reads only
    Genome characterization
        Identification
            KmerFinder | Web based | No bioinformatics skills required; easy to use; easy to interpret output; raw sequence or contig input; possible to detect contamination | Method should be set properly; no assembly is performed
            NCBI BLAST | Web based | Largest database; multiple databases; multiple tools available | Interpretation of results can be difficult; some BLAST knowledge is advised
            MLST Web server | Web based | Simple online workflow; no bioinformatics skills required | Suitable for samples of single species only; accepts short reads only from Illumina, Roche 454, Ion Torrent, and SOLiD
            PathoScope 2.0 | Command line | Able to detect contamination; quality control of raw sequencing reads; complete workflow that minimizes the need for intense computational background; detailed and understandable tutorial | When testing samples with multiple strains of one species, parsimony can lead to missing of strains due to reassignment; for nearly identical strains, a coverage of >20% is necessary to distinguish between them; long computing time
        Annotation
            RAST | Web based | Web accessible; KEGG connection; graph presentation | Long waiting times; must send data to server
            PROKKA | Command line | Short computing time; parallel annotation with 5 tools in a single workflow; detailed tutorial | Decreased annotation performance with understudied or draft genomes; suitable only for samples of single species
        Virulence
            VirulenceFinder | Web based | Easy to use; fast results; parameter control; raw sequence or contig input | Not able to detect SNP-related virulence; available for only limited groups of species/genera
            VFDB | Web based | Extended wealth of information; more markers associated with virulence than in VirulenceFinder | Function to detect virulence markers is not easy to use; not able to detect SNP-related virulence
        AMR
            ResFinder | Web based | Fast results; parameter control; raw sequence or contig input | Not able to detect SNP-related resistance; not able to detect ampC
            RGI/CARD | Web based | Able to detect SNP-related resistance; accession no. input possible; raw sequence or contig input; access to antibiotic resistance ontology; BLAST present; graphical views | Limited contig upload size (<20 Mb); no raw sequence data input possible
            PlasmidFinder | Web based | Raw sequence or contig input | Limited database; detects only plasmids and does not include the presence of AMR
            CGE BAP | Web based | Complete suite for genome characterization; easy to use | Need for subscription for access; long computing times; no annotation performed
    Comparative genomics
        PubMLST | Web based | Creates source for both MLST and cgMLST as other sets of genes used for typing; built on BIGSdb, which makes it locally installable; all databases can be downloaded; user is able to contribute to the database | Finding correct data can be difficult; built to share data publicly
        CSI Phylogeny 1.4 | Web based | Raw read and contig input possible; hqSNPs by selecting SNPs based on strict criteria; many parameters can be set | Only reference-based comparison; need to provide reference sequence; number of parameters could be confusing for clinician without bioinformatics knowledge
        NDtree 1.2 | Web based | Raw read input, which makes it able to skip assembly; easy to use; automatic selection of best reference using KmerFinder | Method is not comparable to others; fixed parameters; lack of documentation; only reference-based comparison
        kSNP3 | Command line | Very fast method; automatically skips regions with high mutation frequency; easily scalable; all-to-all comparison possible; works with raw sequence data and/or contigs as input | Compared to other comparative genomics tools, overall accuracy is slightly low; no hqSNP method; bioinformatics knowledge needed
        Roary | Command line | Protein misprediction control; detailed manual; construction of pangenome | Input has to be contigs; slow computation with larger sample sizes; relies fully on annotation accuracy
        Pan-Seq | Command line and Web based | Minimal user interaction needed; construction of pangenome | Input has to be contigs; no experimental data on computing speed and accuracy
        Lyve-SET | Command line | Extensive SNP filtering (hqSNP); implementation for running on a computing cluster is present | Can be too conservative in SNP calling; only reference-based comparison; bioinformatics knowledge needed
        SPANDx | Command line | Extensive error checking, filtering, and variant identification steps during quality control (hqSNP); complete workflow from raw reads to comparative analysis; quick variant visualization through automatically generated presence/absence matrixes and error-corrected SNP and indel matrixes; works with raw sequence data as input | Only reference-based comparison; bioinformatics know-how needed
    Phylogeny
        RAxML | Command line | Enables standard nonparametric bootstrapping, rapid bootstrapping, bootstopping, and calculation of SH-like support values for quality control; CAT and Shimodaira-Hasegawa test for quality control; comprehensive workflow; detailed manual; GTR model available | Longest computing time; highest accuracy; computationally expensive
        FastTree | Command line | Shortest computing time; CAT and Shimodaira-Hasegawa test for quality control; GTR model available; detailed manual | Lowest accuracy due to limited initial tree improvement
        MrBayes | Command line | Possible to optimize a model; most models available for all phylogeny methods; detailed manual; GTR model available | Input and output formats in NEXUS; complex to use
    Complete outbreak analysis software suites
        BioNumerics 7.6.2 | Local suite | Easy to use; custom schemes possible; scheme modification; wgMLST; cgMLST; rMLST; most schemes present | Separate modules needed; no cluster types
        Ridom SeqSphere+ | Local suite | Easy to use; use of cluster types; ad hoc schemes possible; cgMLST; wgMLST | Database can be slow with many samples; fewer schemes available than with BioNumerics
        NCBI Pathogen Detection (beta) | Web-based suite | Free to use; direct link to foodborne pathogen outbreaks; data sharing; uses collection of strains | Registration needed; focus on foodborne pathogens; data are publicly available; time-consuming to register new samples; not suitable for real-time hospital-acquired outbreaks
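
The footnotes to TABLE 1 and TABLE 5 describe three derived quantities: coverage for 96 samples of a 2,902,619-bp genome, an estimated cost per Gb based on consumables, and accuracy as 100% minus the missing branch rate. The Python sketch below shows one way these figures can be reproduced. It rests on assumptions that are not stated in the tables: 1 Gb is treated as 10^9 bases, coverage is computed from the upper end of each output range, and cost per Gb is taken as consumables cost divided by output. These assumptions match the Illumina MiniSeq Mid Output row but are not a confirmed description of the authors' calculation. All function names are hypothetical.

```python
# Values quoted from TABLE 1, footnote b.
GENOME_SIZE_BP = 2_902_619   # S. aureus strain MRSA252
N_SAMPLES = 96


def coverage(output_gb, n_samples=N_SAMPLES, genome_size_bp=GENOME_SIZE_BP):
    """Average fold coverage per sample, assuming 1 Gb = 1e9 bases
    and output split evenly across samples (an assumption)."""
    return output_gb * 1e9 / (n_samples * genome_size_bp)


def cost_per_gb(consumables_cost, output_gb):
    """Estimated consumables cost per Gb of output (assumed formula)."""
    return consumables_cost / output_gb


def accuracy_from_missing_branch_rate(missing_branch_rate_pct):
    """TABLE 5, footnote c: accuracy = 100% - missing branch rate (%)."""
    return 100.0 - missing_branch_rate_pct


if __name__ == "__main__":
    # Illumina MiniSeq Mid Output: output 2.1-2.4 Gb, consumables $6,201.
    print(round(coverage(2.4), 1))          # 8.6
    print(round(cost_per_gb(6201, 2.4)))    # 2584
    print(round(cost_per_gb(6201, 2.1)))    # 2953
    # A missing branch rate of 15.53% corresponds to 84.47% accuracy.
    print(round(accuracy_from_missing_branch_rate(15.53), 2))  # 84.47
```

Because output is divided by the number of multiplexed samples, the coverage column scales inversely with how many isolates share a run; halving the sample count under the same assumptions would double the per-sample coverage.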
Clinical Microbiology Reviews Aug 2017, 30 (4) 1015-1063; DOI: 10.1128/CMR.00016-17

KEYWORDS

bioinformatics
intensive care units
next-generation sequencing
nosocomial infections
outbreak analysis
outbreak management
pathogen surveillance
point of care
whole-genome sequencing

Copyright © 2019 American Society for Microbiology

Print ISSN: 0893-8512; Online ISSN: 1098-6618