TABLE 4

Performance analysis of comparative genomics tools^a

| Analysis tool (reference[s]) | Concept | Method | Run time (h) | Topology score (%) | Web address(es) | Input type(s) | Input format(s) | Output format(s) |
|---|---|---|---|---|---|---|---|---|
| Web based | | | | | | | | |
| PubMLST (158) | Web-accessible database where it is possible to run cgMLST and wgMLST analyses | cgMLST/wgMLST | NA | NA | https://pubmlst.org/ | Contigs | FASTA | cgMLST/wgMLST profile |
| CSI Phylogeny 1.4 (161) | High-quality SNP method using reference mapping of reads and mapping and SNP calling assessments | Reference-based SNP | ND | ND | https://cge.cbs.dtu.dk/services/CSIPhylogeny/ | Raw sequences, contigs | FASTA, FASTQ | ND |
| NDtree 1.2 (161) | Creates k-mers of reads and maps them to a reference; performs simple model to determine no. of SNPs | Statistical method | 3–3.5^b | ND | https://cge.cbs.dtu.dk/services/NDtree/ | Raw sequences | FASTQ | Newick |
| Command line | | | | | | | | |
| kSNP3 (154, 155) | Uses k-mer analyses to detect SNPs between strains without using either multiple-sequence alignment or a reference genome | Non-reference-based SNP | 0.5^c | 91.80–95.80^c,e | https://sourceforge.net/projects/ksnp/ | Raw sequences, contigs | FASTA | Newick, MSA |
| Roary (169) | Tool for constructing pangenomes from contigs | Pangenome | 4.30^d | 100^d | https://sanger-pathogens.github.io/Roary/ | Contigs | GFF3 | FASTA, TXT, CSV, Rtab |
| Pan-Seq^f (175) | Pangenome assembler with additional locus finder for core/accessory gene allele profiles (a Web-based version is also available) | Pangenome | ND | ND | https://github.com/chadlaing/Panseq, https://lfz.corefacility.ca/panseq/ | Contigs | FASTA | TXT, FASTA |
| Lyve-SET (179) | High-quality SNP method using reference mapping of reads and mapping and SNP calling assessments | Reference-based SNP | 6.25^c | 85^c | https://github.com/lskatz/lyve-SET | Raw sequences, contigs^g | FASTA, FASTQ | Matrix, FASTA, Newick, VCF |
| SPANDx (182) | Complete workflow for creating SNP/indel matrixes as well as locus presence/absence matrixes from raw sequencing reads from a range of NGS technologies | Reference-based SNP | 3.1^c | 100^c | https://sourceforge.net/projects/spandx/ | Raw sequences | FASTA, FASTQ | NEXUS |
  • a All quantitative performance measures were taken from previously reported data, as indicated. ND, no data; NA, not applicable; MSA, multiple-sequence alignment; GFF3, General Feature Format 3; VCF, variant call format.

  • b Based on 46 VTEC genomes (20).

  • c Based on 21 E. coli genomes (167).

  • d Wall time for 1,000 S. enterica serovar Typhi genomes (169).

  • e Using core.

  • f A Web-based version is also available.

  • g Contigs are simulated to reads.
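
The non-reference-based entry in the table (kSNP3) is described as detecting SNPs from k-mers without an alignment or a reference genome. The sketch below illustrates that general idea only and is not kSNP3's implementation: for an odd k, k-mers from different strains that share both flanks but differ at the central base are reported as candidate SNP loci. The function names `central_base_kmers` and `kmer_snps`, the default `k=5`, and the toy sequences are all assumptions made for illustration. Reverse complements and sequencing errors are ignored.

```python
from collections import defaultdict


def central_base_kmers(seq, k):
    """Map (left flank, right flank) -> set of central bases seen in seq."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that each k-mer has a central base")
    half = k // 2
    flanks = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        flanks[(kmer[:half], kmer[half + 1:])].add(kmer[half])
    return flanks


def kmer_snps(genomes, k=5):
    """Return {flank pair: {strain: central base}} for candidate SNP loci.

    A flank pair is used for a strain only if it occurs with exactly one
    central base in that strain. A locus is reported when at least two
    different central bases are seen across strains.
    """
    loci = defaultdict(dict)
    for name, seq in genomes.items():
        for flank, bases in central_base_kmers(seq, k).items():
            if len(bases) == 1:  # skip flanks that are ambiguous within a strain
                loci[flank][name] = next(iter(bases))
    return {
        flank: calls
        for flank, calls in loci.items()
        if len(set(calls.values())) > 1
    }


if __name__ == "__main__":
    # Two hypothetical strains that differ at a single position.
    toy = {"A": "ACGTACGGA", "B": "ACGTTCGGA"}
    for (left, right), calls in kmer_snps(toy, k=5).items():
        print(f"{left}[{'/'.join(sorted(set(calls.values())))}]{right}", calls)
```

Real tools use much longer k-mers on whole genomes and also handle both strands. The short k here only keeps the toy example readable.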