Genomics Tools

BUSCO

Provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs.

Barrnap

Predicts the location of ribosomal RNA genes in genomes. It supports bacteria (5S, 23S, 16S), archaea (5S, 5.8S, 23S, 16S), mitochondria (12S, 16S) and eukaryotes (5S, 5.8S, 28S, 18S).

Model Organisms

Bismark

Bismark is a tool to map bisulfite-treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion.

Epigenetics
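To illustrate what methylation calling means for bisulfite-treated reads such as those Bismark handles, here is a minimal sketch. It is not Bismark's implementation. It assumes a forward-strand read already aligned to the reference without gaps, and the function name `call_methylation` is hypothetical. The idea is that bisulfite treatment converts unmethylated cytosines so they are read as T, while methylated cytosines remain C.

```python
def call_methylation(reference, read):
    """Classify each reference cytosine covered by a forward-strand read.

    Assumption: `reference` and `read` are an ungapped, equal-length alignment.
    Returns (position, call) pairs using 0-based positions.
    """
    if len(reference) != len(read):
        raise ValueError("this sketch expects an ungapped, equal-length alignment")
    calls = []
    for pos, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != "C":
            continue  # only reference cytosines receive a call
        if read_base == "C":
            calls.append((pos, "methylated"))    # C protected from conversion
        elif read_base == "T":
            calls.append((pos, "unmethylated"))  # C converted and read as T
        # any other base is treated as a mismatch and gets no call
    return calls


calls = call_methylation("ACGTCCGA", "ATGTCTGA")
print(calls)
```

A real mapper also has to handle reverse-strand reads, gaps, and sequence context (CpG, CHG, CHH), none of which this sketch attempts.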

Blockbuster

Detects blocks of overlapping reads using a Gaussian-distribution approach.

CANU

De novo assembly tool for long-read data such as Nanopore and PacBio reads.

Cactus

Cactus is a reference-free whole-genome multiple alignment program.

Ensembl VEP

Determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Function analysis

Freebayes

Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs, indels, multi-nucleotide polymorphisms, and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.

Rare diseases

Hapo-G

Hapo-G is a tool that aims to improve the quality of genome assemblies by polishing the consensus with accurate reads. It is capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.

HyPo

HyPo, a Hybrid Polisher, uses both short and long reads within a single run to polish a long-read assembly of small and large genomes.

ISEScan

Automated identification of insertion sequence elements in prokaryotic genomes.

DNA Structural variation

LASTZ

A tool for (1) aligning two DNA sequences, and (2) inferring appropriate scoring parameters automatically.

MAGeCK

Computational tool to identify important genes from genome-scale CRISPR-Cas9 knockout screens.

Genetics

MUMmer

MUMmer is a modular system for the rapid whole-genome alignment of finished or draft sequences. It provides ultra-fast alignment of large-scale DNA and protein sequences.

Mapping

Sequencing

Maker

Portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases.

DNA

MinCED

MinCED is a program to find Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) in full genomes or environmental datasets such as assembled contigs from metagenomes.

Sequence composition, complexity and repeats

Metagenomics

Nextclade

Nextclade is an open-source project for viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement.

Cladistics

ODGI

Optimized dynamic genome graph implementation: a toolkit for understanding pangenome graphs

DNA

Prokka

Software tool to annotate bacterial, archaeal and viral genomes quickly and produce standards-compliant output files.

Model Organisms

Roary

A high-speed, standalone pan-genome pipeline that takes annotated assemblies in GFF3 format (as produced by Prokka (Seemann, 2014)) and calculates the pan-genome.

DNA

Mapping

Shasta

De novo assembly from Oxford Nanopore reads.

TransDecoder

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using TopHat and Cufflinks.

Gene transcripts

RNA-Seq

Vcflib

API and command line utilities for the manipulation of VCF files.

Data management
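As an illustration of the kind of VCF manipulation that Vcflib automates, the following sketch parses the eight fixed columns of one VCF data line in plain Python. It does not use Vcflib's API. The function name `parse_vcf_line` and the example record are hypothetical, and the sketch ignores header lines and per-sample genotype columns.

```python
def parse_vcf_line(line):
    """Parse the eight fixed VCF columns of a single tab-separated data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),                      # 1-based position
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),                # multiple ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


record = parse_vcf_line("chr1\t12345\t.\tA\tG,T\t50.0\tPASS\tDP=14;DB")
print(record["chrom"], record["pos"], record["alt"], record["info"])
```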

abacas

ABACAS is intended to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. It uses MUMmer to find alignment positions and identify syntenies of assembly contigs against the reference. The output is then processed to generate a pseudomolecule, taking overlapping contigs and gaps into account. MUMmer's alignment-generating programs, Nucmer and Promer, are used, followed by the 'delta-filter' utility function. Users can also run tblastx on contigs that are not used to generate the pseudomolecule.

abismal

Abismal is a mapper of FASTQ bisulfite-converted short reads (between 50 and 1000 bases) to a FASTA reference genome.

Epigenetics

abpoa

abPOA: an SIMD-based C library for fast partial order alignment using adaptive band. abPOA can perform multiple sequence alignment (MSA) on a set of input sequences and generate a consensus sequence by applying the heaviest bundling algorithm to the final alignment graph.

abricate

Mass screening of contigs for antimicrobial resistance or virulence genes.

Microbiology

actc

ACTC (Align subreads to CCS reads) is developed by Pacific Biosciences and provides a one-click solution for aligning individual subreads to the corresponding circular consensus (CCS) reads. It is useful in workflows involving HiFi/CCS read analysis from PacBio sequencing.

Sequence alignment

adapterremoval

AdapterRemoval searches for and removes adapter sequences from High-Throughput Sequencing (HTS) data and (optionally) trims low-quality bases from the 3' end of reads following adapter removal. AdapterRemoval can analyze both single-end and paired-end data, and can be used to merge overlapping paired-end reads into (longer) consensus sequences. Additionally, AdapterRemoval can construct a consensus adapter sequence for paired-end reads if this information is not available.
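The adapter-removal step described for AdapterRemoval can be sketched in a few lines. This is an illustrative simplification, not AdapterRemoval's algorithm: it uses exact string matching with no tolerance for mismatches or quality scores. The function name `trim_adapter`, the `min_overlap` parameter, and the example adapter sequence are assumptions made for the example.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Remove an adapter found in full inside the read, or partially at its 3' end.

    Assumption: exact matching only; real tools score alignments with mismatches.
    """
    # Case 1: the complete adapter occurs somewhere in the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]

    # Case 2: the read ends with a prefix of the adapter (try the longest first).
    longest = min(len(adapter), len(read))
    for length in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:length]):
            return read[: len(read) - length]

    # No adapter evidence: return the read unchanged.
    return read


adapter = "AGATCGGAAGAGC"  # example adapter sequence, chosen for illustration
full = trim_adapter("ACGTAGATCGGAAGAGCTTT", adapter)
partial = trim_adapter("ACGTACGTAGATCGGAAG", adapter)
untouched = trim_adapter("ACGTACGT", adapter)
print(full, partial, untouched)
```

The `min_overlap` threshold guards against trimming a read just because its last base or two happen to match the start of the adapter.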

advntr

adVNTR is a tool for genotyping Variable Number Tandem Repeats (VNTRs) from sequence data. It works with both NGS short reads (Illumina HiSeq) and SMRT reads (PacBio), finds diploid repeat counts for VNTRs, and identifies possible mutations in the VNTR sequences.

Variant detection

afterqc

Automatic Filtering, Trimming, Error Removing and Quality Control for FASTQ data. AfterQC can go through all FASTQ files in a folder and output three folders (good, bad and QC), which contain the good reads, the bad reads and the QC results of each FASTQ file or pair.

Data quality management

agat

Another Gff Analysis Toolkit (AGAT): a suite of tools to handle gene annotations in any GTF/GFF format.

Sequence annotation
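For readers unfamiliar with the GFF format that AGAT operates on, the sketch below parses one GFF3 feature line into its nine tab-separated columns. It is written in plain Python and does not reflect how AGAT itself works. The function name `parse_gff3_line` and the example feature are hypothetical.

```python
def parse_gff3_line(line):
    """Split one GFF3 feature line into its nine columns and parse the attributes."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GFF3 feature line has exactly 9 tab-separated columns")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    # Column 9 holds semicolon-separated key=value pairs.
    attributes = {}
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value

    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # GFF coordinates are 1-based and inclusive
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


feature = parse_gff3_line("chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=demo")
length = feature["end"] - feature["start"] + 1  # inclusive coordinates
print(feature["type"], feature["attributes"]["ID"], length)
```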

agfusion

AGFusion (pronounced 'A G Fusion') is a Python package for annotating gene fusions in the human or mouse genomes.

Transcriptomics

RNA-Seq

alien_hunter

Alien_hunter is an application for the prediction of putative Horizontal Gene Transfer (HGT) events with the implementation of Interpolated Variable Order Motifs (IVOMs).

alignstats

AlignStats produces various alignment, whole genome coverage, and capture coverage metrics for sequence alignment files in SAM, BAM, and CRAM format. This program is designed to serve reporting and quality control purposes in sequencing analysis pipelines at the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC).

Sequence alignment

amptk

AMPtk is a series of scripts to process NGS amplicon data using USEARCH and VSEARCH. It can be used to process any NGS amplicon data and includes databases set up for the analysis of fungal ITS, fungal LSU, bacterial 16S, and insect COI amplicons. It can handle Ion Torrent, MiSeq, and 454 data.

Next-generation Sequencing

anchorwave

AnchorWave (Anchored Wavefront Alignment) identifies collinear regions via conserved anchors (full-length CDS and full-length exons are currently implemented) and breaks collinear regions into shorter fragments, i.e., anchor and inter-anchor intervals. By performing sensitive sequence alignment for each shorter interval via a 2-piece affine gap cost strategy and merging the results, AnchorWave generates a whole-genome alignment for each collinear block. AnchorWave implements commands to guide collinear block identification with or without chromosomal rearrangements and provides options to use known polyploidy levels or whole-genome duplications to inform alignment.

Computational Biology

any2fasta

Convert various sequence formats to FASTA

Data quality management
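One conversion of the kind any2fasta performs, FASTQ to FASTA, can be sketched as follows. This is not any2fasta's implementation. It assumes simple four-line FASTQ records with no line wrapping, and the function name `fastq_to_fasta` is hypothetical.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA, dropping the quality strings."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")

    out = []
    for i in range(0, len(lines), 4):
        header, sequence = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError("FASTQ header must start with '@': " + header)
        out.append(">" + header[1:])  # '@name' becomes '>name'
        out.append(sequence)          # separator and quality lines are discarded
    return "\n".join(out) + "\n"


fastq = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n"
fasta = fastq_to_fasta(fastq)
print(fasta, end="")
```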

arcs

Scaffolding genome assemblies using 10X Genomics Chromium data or stLFR linked reads

bandage

GUI program that allows users to interact with the assembly graphs made by de novo assemblers such as Velvet, SPAdes, MEGAHIT and others. It visualises assembly graphs, with connections, using graph layout algorithms.

biobambam

Tools for early-stage NGS alignment file processing, including fast sorting and duplicate marking.

blasr

Software for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error.

Mapping

Sequencing

funannotate

funannotate is a pipeline for genome annotation (built specifically for fungi, but will also work with higher eukaryotes).

kofamscan

KofamScan is a gene function annotation tool based on KEGG Orthology and hidden Markov models. The KOfam database is required to use this tool.

Structure analysis

lofreq

LoFreq* (i.e. LoFreq version 2) is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. It makes full use of base-call qualities and other sources of errors inherent in sequencing (e.g. mapping or base/indel alignment uncertainty), which are usually ignored by other methods or only used for filtering.