TABLE 8

Pros and cons of analysis tools

Algorithm | Interface type(s) | Pro(s) | Con(s)
Assembly
    Velvet | Web based | Designed for repeat-rich reads; automated parameter tuning for quality control; detailed tutorial; Web-based accessibility | Small N50 contig size; technology specific; coverage cutoff excludes potentially correct low-coverage vertices; high memory usage; suitable for short reads only
    IDBA-UD | Command line | Designed for repeat-rich short reads with various sequencing depths; among the lowest memory usages; error correction after each iteration for quality control | Technology specific; no tutorial; suitable for short reads only
    RAY | Command line | Hybrid assembly of multiple sequencing platform reads; heuristics for contig length determination that improve sequence accuracy; automated parameter calculation; detailed tutorial | Small N50 contig size; poor performance with lower-quality reads; suitable for short reads only
    SPAdes/hybridSPAdes | Web based | Hybrid assembly of multiple sequencing platform reads; suited for short and long reads; among the lowest memory usages; largest N50 contig size; closing of gaps and resolution of repeats in assembly graph for quality control; option to merge contigs from other assemblers; detailed tutorial; Web-based accessibility | Longest computing time
    Minimap/miniasm | Command line | Shortest computing time; compatibility with other overlapping workflows when converted to PAF format; detailed tutorial | Technology specific; no sequencing error correction; missing overlaps and misassemblies during graph cleaning; suitable for long reads only
    Canu | Command line | Large N50 contig size; detailed tutorial; initial read correction to remove noise for quality control | Long computing time; high memory usage; suitable for long reads only
Genome characterization
    Identification
        KmerFinder | Web based | No bioinformatics skills required; easy to use; easy to interpret output; raw sequence or contig input; possible to detect contamination | Method should be set properly; no assembly is performed
        NCBI BLAST | Web based | Largest database; multiple databases; multiple tools available | Interpretation of results can be difficult; some BLAST knowledge is advised
        MLST Web server | Web based | Simple online workflow; no bioinformatics skills required | Suitable for samples of single species only; accepts short reads only from Illumina, Roche 454, Ion Torrent, and SOLiD
        PathoScope 2.0 | Command line | Able to detect contamination; quality control of raw sequencing reads; complete workflow that minimizes the need for intense computational background; detailed and understandable tutorial | When testing samples with multiple strains of one species, parsimony can cause strains to be missed because of reassignment; for nearly identical strains, a coverage of >20% is necessary to distinguish between them; long computing time
    Annotation
        RAST | Web based | Web accessible; KEGG connection; graph presentation | Long waiting times; must send data to server
        PROKKA | Command line | Short computing time; parallel annotation with 5 tools in a single workflow; detailed tutorial | Decreased annotation performance with understudied or draft genomes; suitable only for samples of single species
    Virulence
        VirulenceFinder | Web based | Easy to use; fast results; parameter control; raw sequence or contig input | Not able to detect SNP-related virulence; available for only limited groups of species/genera
        VFDB | Web based | Extended wealth of information; more markers associated with virulence than in VirulenceFinder | Function to detect virulence markers is not easy to use; not able to detect SNP-related virulence
    AMR
        ResFinder | Web based | Fast results; parameter control; raw sequence or contig input | Not able to detect SNP-related resistance; not able to detect ampC
        RGI/CARD | Web based | Able to detect SNP-related resistance; accession number input possible; raw sequence or contig input; access to antibiotic resistance ontology; BLAST present; graphical views | Limited contig upload size (<20 Mb); no raw sequence data input possible
        PlasmidFinder | Web based | Raw sequence or contig input | Limited database; detects only plasmids and does not report the presence of AMR
        CGE BAP | Web based | Complete suite for genome characterization; easy to use | Subscription needed for access; long computing times; no annotation performed
Comparative genomics
    PubMLST | Web based | Creates source for both MLST and cgMLST as well as other sets of genes used for typing; built on BIGSdb, which makes it locally installable; all databases can be downloaded; user is able to contribute to the database | Finding correct data can be difficult; built to share data publicly
    CSI Phylogeny 1.4 | Web based | Raw read and contig input possible; hqSNPs by selecting SNPs based on strict criteria; many parameters can be set | Only reference-based comparison; need to provide reference sequence; number of parameters could be confusing for clinician without bioinformatics knowledge
    NDtree 1.2 | Web based | Raw read input, which makes it able to skip assembly; easy to use; automatic selection of best reference using KmerFinder | Method is not comparable to others; fixed parameters; lack of documentation; only reference-based comparison
    kSNP3 | Command line | Very fast method; automatically skips regions with high mutation frequency; easily scalable; all-to-all comparison possible; works with raw sequence data and/or contigs as input | Overall accuracy is slightly lower than that of other comparative genomics tools; no hqSNP method; bioinformatics knowledge needed
    Roary | Command line | Protein misprediction control; detailed manual; construction of pangenome | Input has to be contigs; slow computation with larger sample sizes; relies fully on annotation accuracy
    Pan-Seq | Command line and Web based | Minimal user interaction needed; construction of pangenome | Input has to be contigs; no experimental data on computing speed and accuracy
    Lyve-SET | Command line | Extensive SNP filtering (hqSNP); implementation for running on a computing cluster is present | Can be too conservative in SNP calling; only reference-based comparison; bioinformatics knowledge needed
    SPANDx | Command line | Extensive error checking, filtering, and variant identification steps during quality control (hqSNP); complete workflow from raw reads to comparative analysis; quick variant visualization through automatically generated presence/absence matrices and error-corrected SNP and indel matrices; works with raw sequence data as input | Only reference-based comparison; bioinformatics know-how needed
Phylogeny
    RAxML | Command line | Enables standard nonparametric bootstrapping, rapid bootstrapping, bootstopping, and calculation of SH-like support values for quality control; CAT and Shimodaira-Hasegawa test for quality control; comprehensive workflow; detailed manual; GTR model available; highest accuracy | Longest computing time; computationally expensive
    FastTree | Command line | Shortest computing time; CAT and Shimodaira-Hasegawa test for quality control; GTR model available; detailed manual | Lowest accuracy due to limited initial tree improvement
    MrBayes | Command line | Possible to optimize a model; most models available for all phylogeny methods; detailed manual; GTR model available | Input and output formats in NEXUS; complex to use
Complete outbreak analysis software suites
    BioNumerics 7.6.2 | Local suite | Easy to use; custom schemes possible; scheme modification; wgMLST; cgMLST; rMLST; most schemes present | Separate modules needed; no cluster types
    Ridom SeqSphere+ | Local suite | Easy to use; use of cluster types; ad hoc schemes possible; cgMLST; wgMLST | Database can be slow with many samples; fewer schemes available than with BioNumerics
    NCBI Pathogen Detection (beta) | Web-based suite | Free to use; direct link to foodborne pathogen outbreaks; data sharing; uses collection of strains | Registration needed; focus on foodborne pathogens; data are publicly available; time-consuming to register new samples; not suitable for real-time hospital-acquired outbreaks
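
Note on N50: several assembler entries in the table are compared by N50 contig size. As a reminder of what that metric measures, the following is a minimal sketch under the usual definition, not code taken from any of the listed tools: N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches at least half of the total assembly length. The function name `n50` and the example contig lengths are hypothetical and chosen only for illustration.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths (in bp)."""
    # Sort contigs from longest to shortest.
    lengths = sorted((int(n) for n in contig_lengths), reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length

    # Not reachable: the full cumulative sum always satisfies the test.
    raise AssertionError("unreachable")


# Hypothetical contig lengths (bp), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
example_n50 = n50(example_lengths)  # total = 300, so N50 = 70
```

A larger N50 means that half of the assembly is contained in longer contigs, which is why it appears in the table as a measure of assembly contiguity. It says nothing about correctness, so it is best read alongside the misassembly and error-correction remarks in the same rows.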