Performance analysis of assembly tools^a

| Analysis tool (reference[s]) | Concept | Computational requirement | Speed | Assembly quality | Preferred sequencing technology(ies) | Web address(es) | Input format | Output format(s) |
|---|---|---|---|---|---|---|---|---|
| Web based | | | | | | | | |
| Velvet (103, 126) | de Bruijn graph-based assembly that resolves repeat-rich regions; can be used for de novo or reference-guided assembly; requires paired reads with 20- to 25-fold coverage | Mid* | Medium* | Low* | Illumina | | …, FASTQ, SAM, or BAM | AMOS, modified FASTA |
| SPAdes/hybridSPAdes (112) | de Bruijn graph-based assembler for de novo assembly of short and long reads | Low** | Low** | Mid*/** | Mixed input (Illumina, Ion Torrent, PacBio CLR, Oxford Nanopore) | | …, FASTQ, or BAM | FASTA, FASTQ, FASTG |
| Command line | | | | | | | | |
| IDBA-UD (108) | de Bruijn graph-based assembly designed for assembly of repeat-rich reads of various sequencing depths | Low* | Medium* | Mid* | Illumina | | | |
| RAY (96) | de Bruijn graph-based assembly that uses seeds instead of Eulerian walks; used for de novo assembly; designed for short reads | Low*** | Fast*** | Low*** | Mixed input (454, Illumina, Ion Torrent) | | …, FASTQ, or SFF | FASTA, TXT |
| Minimap/miniasm (116) | OLC framework that computes overlaps and performs read trims and unitig construction; can be used for de novo or reference-guided assembly | Low** | High** | High*/** | PacBio, Oxford Nanopore | | …, PAF | |
| Canu (118) | OLC framework that computes overlaps and performs read correction, read trims, and unitig construction; used for de novo assembly | Mid** | Low** | High*/** | PacBio, Oxford Nanopore | | … or FASTQ | FASTA |
^a All quantitative performance measures were taken from data reported previously, as indicated. CLR, continuous long reads; GFA, graphical fragment assembly; PAF, pairwise mapping format; SFF, standard flowgram format (454 data format); *, E. coli K-12 MG1655 data set (110); **, Enterobacter kobei data set (233); ***, Illumina data from E. coli (SRA accession number SRX000429) (234). Note that for SPAdes, only the nonhybrid tool is accessible as a Web-based tool.
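
The table lists a 20- to 25-fold coverage requirement for Velvet. As a rough illustration of how such a requirement might be checked before assembly, the sketch below estimates average fold coverage as total sequenced bases divided by genome length. The function names, the read count, the read length, and the genome size are all hypothetical values chosen for illustration; none of them come from the cited data sets, and the 20-fold default is simply the lower bound given in the table.

```python
def fold_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Estimate average fold coverage: total sequenced bases / genome length.

    Assumes all reads have the same length; for paired-end data,
    num_reads should count both mates of every pair.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def meets_minimum_coverage(coverage: float, minimum: float = 20.0) -> bool:
    """Return True if coverage reaches the minimum (default: 20-fold,
    the lower bound listed for Velvet in the table)."""
    return coverage >= minimum


# Hypothetical run: 1,000,000 reads of 100 bp against a 5-Mb genome.
coverage = fold_coverage(num_reads=1_000_000, read_length=100, genome_size=5_000_000)
print(f"Estimated coverage: {coverage:.1f}-fold")
print("Meets 20-fold minimum:", meets_minimum_coverage(coverage))
```

This is only an average; real coverage varies along the genome, so a run that passes this check can still have regions with too little depth.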