Bioinformatics Tools

A tool that finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches.

COSG

COSG is a cosine similarity-based method for more accurate and scalable marker gene identification.
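As a rough illustration of the cosine-similarity idea mentioned above, the sketch below ranks genes by how closely their expression vector matches a 0/1 indicator vector for one cell group. This is not COSG's actual scoring function or API. The function names, the toy data, and the use of a plain indicator vector are assumptions made for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_genes_for_group(expression, labels, group):
    """Rank genes by similarity to an idealized marker that is 1 inside `group` and 0 elsewhere.

    `expression` maps a gene name to per-cell values; `labels` gives each cell's group.
    Both structures are hypothetical and chosen only to keep the example small.
    """
    indicator = [1.0 if label == group else 0.0 for label in labels]
    scores = {gene: cosine_similarity(values, indicator) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (made up): four cells, two groups, three genes.
labels = ["A", "A", "B", "B"]
expression = {
    "gene1": [5.0, 5.0, 0.0, 0.0],  # expressed only in group A
    "gene2": [1.0, 1.0, 1.0, 1.0],  # expressed everywhere
    "gene3": [0.0, 0.0, 3.0, 3.0],  # expressed only in group B
}
ranking = rank_genes_for_group(expression, labels, "A")
```

In this toy case the gene expressed only in group A ranks first, the uniformly expressed gene second, and the group B gene last. Cosine similarity ignores the overall magnitude of a gene's expression, so the ranking reflects the expression pattern across cells and not the absolute level.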

abacas

ABACAS is intended to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. It uses MUMmer to find alignment positions and identify syntenies of assembly contigs against the reference. The output is then processed to generate a pseudomolecule, taking overlapping contigs and gaps into account. MUMmer's alignment generating programs, Nucmer and Promer, are used, followed by the 'delta-filter' utility function. Users can also run tblastx on contigs that are not used to generate the pseudomolecule.

Sequence assembly

abismal

Abismal is a mapper of FASTQ bisulfite-converted short reads (between 50 and 1000 bases) to a FASTA reference genome.

Epigenetics

abpoa

abPOA: an SIMD-based C library for fast partial order alignment using adaptive band. abPOA can perform multiple sequence alignment (MSA) on a set of input sequences and generate a consensus sequence by applying the heaviest bundling algorithm to the final alignment graph.

actc

ACTC (Align subreads to CCS reads) is developed by Pacific Biosciences and provides a one-click solution for aligning individual subreads to the corresponding circular consensus (CCS) reads. It is useful in workflows involving HiFi/CCS read analysis from PacBio sequencing.

Sequence alignment

adapterremoval

AdapterRemoval searches for and removes adapter sequences from High-Throughput Sequencing (HTS) data and (optionally) trims low quality bases from the 3' end of reads following adapter removal. AdapterRemoval can analyze both single end and paired end data, and can be used to merge overlapping paired-ended reads into (longer) consensus sequences. Additionally, AdapterRemoval can construct a consensus adapter sequence for paired-ended reads if this information is not available.
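The two steps described above (adapter removal, then optional 3' quality trimming) can be sketched as follows. This is a simplified illustration that assumes exact matching only, and it is not AdapterRemoval's algorithm, which tolerates mismatches and also handles paired-end data. The function names, the `min_overlap` and `min_quality` parameters, and the example sequences are all hypothetical.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Cut the read at a full adapter match, or at a partial adapter hanging off the 3' end.

    Exact matching only; `min_overlap` is an assumed guard against chance matches.
    """
    position = read.find(adapter)
    if position != -1:
        return read[:position]
    # Look for the longest adapter prefix that matches the end of the read.
    longest = min(len(adapter), len(read)) - 1
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[: len(read) - overlap]
    return read


def trim_low_quality_3prime(sequence, qualities, min_quality=20):
    """Drop trailing bases whose quality score is below `min_quality`."""
    end = len(sequence)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[:end], qualities[:end]


# Made-up adapter and reads for illustration.
adapter = "AGATCGGAAGAGC"
full_hit = trim_adapter("ACGTAGATCGGAAGAGCTT", adapter)   # adapter fully inside the read
partial_hit = trim_adapter("ACGTACGTAGATCGG", adapter)    # adapter prefix at the 3' end
no_hit = trim_adapter("ACGTACGT", adapter)                # nothing to remove
trimmed_seq, trimmed_quals = trim_low_quality_3prime("ACGTAC", [30, 30, 30, 30, 10, 5])
```

Quality trimming is applied after adapter removal here, matching the order given in the description. In a real pipeline the quality scores would be cut at the same position as the sequence during the adapter step as well.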

afterqc

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data. AfterQC can simply go through all fastq files in a folder and then output three folders: good, bad and QC folders, which contain good reads, bad reads and the QC results of each fastq file/pair.

Data quality management

agat

Another Gff Analysis Toolkit (AGAT): a suite of tools to handle gene annotations in any GTF/GFF format.

Sequence annotation

alfred

BAM Statistics, Feature Counting and Annotation

Genetics

alien_hunter

Alien_hunter is an application for the prediction of putative Horizontal Gene Transfer (HGT) events with the implementation of Interpolated Variable Order Motifs (IVOMs).

alignstats

AlignStats produces various alignment, whole genome coverage, and capture coverage metrics for sequence alignment files in SAM, BAM, and CRAM format. This program is designed to serve reporting and quality control purposes in sequencing analysis pipelines at the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC).

Sequence alignment

amptk

AMPtk is a series of scripts to process NGS amplicon data using USEARCH and VSEARCH. It can be used to process any NGS amplicon data and includes databases set up for analysis of fungal ITS, fungal LSU, bacterial 16S, and insect COI amplicons. It can handle Ion Torrent, MiSeq, and 454 data.

Next-generation Sequencing

anchorwave

AnchorWave (Anchored Wavefront Alignment) identifies collinear regions via conserved anchors (full-length CDS and full-length exon are currently implemented) and breaks collinear regions into shorter fragments, i.e., anchor and inter-anchor intervals. By performing sensitive sequence alignment for each shorter interval via a 2-piece affine gap cost strategy and merging the results, AnchorWave generates a whole-genome alignment for each collinear block. AnchorWave implements commands to guide collinear block identification with or without chromosomal rearrangements and provides options to use known polyploidy levels or whole-genome duplications to inform alignment.

Computational Biology

annogesic

ANNOgesic is the Swiss army knife for RNA-Seq based annotation of bacterial/archaeal genomes. It is a modular, command-line tool that can integrate different types of RNA-Seq data based on dRNA-Seq (differential RNA-Seq) or RNA-Seq protocols that include transcript fragmentation to generate high quality genome annotations. It can detect genes, CDSs/tRNAs/rRNAs, transcription starting sites (TSS) and processing sites, transcripts, terminators, untranslated regions (UTR) as well as small RNAs (sRNA), small open reading frames (sORF), circular RNAs, CRISPR related RNAs, riboswitches and RNA-thermometers. It can also predict RNA-RNA and protein-protein interactions.

Genome annotation tools

Transcriptomics Analysis

Computational Genomics

ariba

Antimicrobial Resistance Identification By Assembly

Next-generation Sequencing

Microbial genomics

ascatngs

Somatic copy number analysis using paired-end whole-genome sequencing (WGS)

Cancer Genomics

Computational Genomics

Molecular genetics

asgal

ASGAL (Alternative Splicing Graph ALigner) is a tool for detecting the alternative splicing events expressed in an RNA-Seq sample with respect to a gene annotation. The main idea behind ASGAL is that alternative splicing events can be detected by aligning the RNA-Seq reads against the splicing graph of the gene.

Transcriptomics

RNA-Seq data analysis

Computational Genomics

assembly-stats

Get assembly statistics from FASTA and FASTQ files.

Genome assembly
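To make "assembly statistics" concrete for the assembly-stats entry above, the sketch below computes contig count, total length, and N50 from FASTA text. It is not the tool's implementation or its output format. The function names and the example contigs are assumptions, and N50 is shown only as a commonly reported assembly statistic.

```python
def read_fasta_lengths(lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half of the total bases."""
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up three-contig assembly for illustration.
fasta_text = """>contig1
ACGTACGTAC
>contig2
ACGTAC
>contig3
ACGT
"""
lengths = read_fasta_lengths(fasta_text.splitlines())
summary = {"count": len(lengths), "total": sum(lengths), "n50": n50(lengths)}
```

For these three contigs of 10, 6, and 4 bases, the total is 20 and the N50 is 10, because the longest contig alone already covers half of the assembly.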

atropos

Atropos is a tool for specific, sensitive, and speedy trimming of NGS reads.

Sequencing

Sequence assembly

bcl2fastq

bcl2fastq is a Linux-based command-line tool from Illumina that converts raw base call (BCL) files from Illumina sequencers into FASTQ format, while simultaneously demultiplexing data based on sample indexes. It requires a sample sheet and produces FASTQ files, statistics, and reports.

NGS data preprocessing

Sequence data conversion

bioawk

Bioawk is an extension to Brian Kernighan's awk, adding support for several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names. It also adds a few built-in functions and a command-line option to use TAB as the input/output delimiter. When the new functionality is not used, bioawk is intended to behave exactly the same as the original BWK awk.