Awesome Bioinformatics
A curated list of awesome Bioinformatics libraries and software.
Thank you danielecook & contributors
View Topic on GitHub:
danielecook/Awesome-Bioinformatics
Package suites
Core BioPerl 1.x code
Official git repository for Biopython (originally converted from CVS)
This library provides implementations of many algorithms and data structures that are useful for bioinformatics. All provided implementations are rigorously tested via continuous integration.
The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.
A Go package for engineering organisms.
OCaml Bioinformatics Library
Downloading
The command-line interface to GGD
Web application to explore the Sequence Read Archive.
Compressing
Compressor for genomic files (FASTQ, SAM/BAM, VCF, FASTA, GVF, 23andMe...), up to 5x better than gzip and faster too
Command Line Utilities
Useful bash one-liners for bioinformatics.
Modular and universal bioinformatics
Syntax highlighting for computational biology
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
A cross-platform, efficient and practical CSV/TSV toolkit in Golang
Easily submitting multiple PBS jobs or running local jobs in parallel. Multiple input files supported.
a wee tool for random access into BGZF files.
sort genomic data
Note: tabix and bgzip binaries are now part of the HTSlib project.
Write-once-read-many table for large datasets.
Create an index on a compressed text file
Workflow Managers
BigDataScript: Scripting language for big data
Bpipe - a tool for running and managing bioinformatics pipelines
Repository for the CWL standards. Use https://cwl.discourse.group/ for support 😊
Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments
A DSL for data-driven computational pipelines
Yet another redundant workflow engine
CGAT-ruffus is a lightweight python module for running computational pipelines
Robust, flexible and resource-efficient pipelines using Go and the commandline
This is the SeqWare Project's main repo.
Workflow Description Language - Specification and Implementations
Pipelines
A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin
A flexible pipeline for complete analysis of bacterial genomes
Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and shiny app.
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
Software for intuitively doing Differential Gene Expression (DGE) analysis on Windows and GNU/Linux, based on R packages.
A pipeline for preprocessing NGS data from Illumina, Nanopore and PacBio technologies
Sequence Processing
Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data
A quality control analysis tool for high throughput sequencing data
Simple FASTQ quality assessment using Python
FASTA/FASTQ pre-processing programs
Aggregate results from bioinformatics analyses across many samples into a single report.
seqfu - Sequence Fastx Utilities
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang
An imagemagick-like frontend to Biopython SeqIO
Toolkit for processing sequences in FASTA/Q formats
Explore and analyze biological sequence data
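Several entries in this section assess FASTQ read quality. As a rough illustration of what a per-read quality summary involves, here is a minimal standard-library sketch. It is not taken from any tool listed here; it assumes four-line FASTQ records and Phred+33 quality encoding, and the function name `mean_phred_per_read` is hypothetical.

```python
import io


def mean_phred_per_read(handle, offset=33):
    """Yield (read_id, mean Phred score) for each four-line FASTQ record.

    Assumes Phred+33 encoding by default and no line-wrapped sequences.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        handle.readline()  # sequence line (not needed for this summary)
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        # Each quality character encodes one Phred score as ASCII code minus offset.
        scores = [ord(ch) - offset for ch in quality]
        mean = sum(scores) / len(scores) if scores else 0.0
        # The read ID is the first whitespace-delimited token after '@'.
        yield header[1:].split()[0], mean


# Two made-up records, used only to exercise the function.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n")
results = list(mean_phred_per_read(example))
```

For real data, the dedicated tools above handle wrapped records, alternative encodings and per-position statistics that this sketch ignores.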
Data Analysis
Scalable genomic data analysis.
Scalable gVCF merging and joint variant calling for population sequencing projects
Pairwise
A fast and sensitive gapped read aligner
Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
Wavefront alignment algorithm (WFA): Fast and exact gap-affine pairwise alignment
Pairwise Sequence Alignment Library
Mummer alignment tool
Accelerated BLAST compatible local sequence aligner.
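The short-read aligner listed above takes its name from the Burrows-Wheeler transform (BWT). As a teaching aid only, the following sketch computes the transform and its inverse by sorting rotations. This is a naive construction, not how a production aligner builds its index; the function names and the `$` sentinel are assumptions made for illustration.

```python
def bwt(text, sentinel="$"):
    """Return the Burrows-Wheeler transform of text via sorted rotations."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    # Build every cyclic rotation, sort them, and keep the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Recover the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(transformed)
    for _ in transformed:
        # Prepend the transformed column to each row, then sort the rows.
        table = sorted(ch + row for ch, row in zip(transformed, table))
    # The row ending with the sentinel is the original string plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


encoded = bwt("banana")
decoded = inverse_bwt(encoded)
```

Sorting all rotations takes memory and time far beyond what is practical for a genome, which is why real tools rely on suffix-array-based construction instead.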
Multiple Sequence Alignment
A simple Partial Order Aligner based on Lee, Grasso and Sharlow (2002), for education/demonstration purposes
Clustering
MMseqs2: ultra fast and sensitive search and clustering suite
Quantification
RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
Variant Calling
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
Bayesian haplotype-based genetic polymorphism discovery and genotyping.
Official code repository for GATK versions 1.0 through 3.7 (core engine). For GATK 4 code, see the https://github.com/broadinstitute/gatk repository
Bayesian haplotype-based mutation calling
This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
Structural variant callers
DELLY2: Structural variant discovery by integrated paired-end and split-read analysis
lumpy: a general probabilistic framework for structural variant discovery
Structural variant and indel caller for mapped sequencing data
GRIDSS: the Genomic Rearrangement IDentification Software Suite
structural variant calling and genotyping with existing tools, but, smoothly.
BAM File Utilities
C++ API & command-line toolkit for working with BAM data
A bam toolbox
Automate common sam & bam conversions
fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing
SAMStat displays various properties of next-generation sequencing reads stored in SAM/BAM format.
fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
Software for calculating telomere length
VCF File Utilities
This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
annotate a VCF with other VCFs/BEDs/tabixed files
C++ library and cmdline tools for parsing and manipulating VCF files
A set of tools written in Perl and C++ for working with VCF files, such as those generated by the 1000 Genomes Project.
GFF BED File Utilities
Another Gtf/Gff Analysis Toolkit
GFF and GTF file manipulation and interconversion
bedtools - the swiss army knife for genome arithmetic
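"Genome arithmetic" here means set-like operations on genomic intervals. A minimal sketch of one such operation, intersecting two interval lists, might look like the following. It assumes BED-style zero-based, half-open coordinates and uses a simple quadratic scan; it is an illustration of the idea and not the algorithm any listed tool uses. The function name `intersect_intervals` is hypothetical.

```python
def intersect_intervals(a, b):
    """Return the overlapping portions of intervals in a with intervals in b.

    Each interval is a (chrom, start, end) tuple with half-open coordinates,
    so (chr1, 100, 200) and (chr1, 200, 300) do not overlap.
    """
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue  # intervals on different chromosomes never overlap
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # non-empty shared region
                overlaps.append((chrom_a, start, end))
    return overlaps


# Made-up coordinates, used only to exercise the function.
features_a = [("chr1", 100, 200), ("chr1", 300, 400)]
features_b = [("chr1", 150, 350), ("chr2", 100, 200)]
shared = intersect_intervals(features_a, features_b)
```

Comparing every pair is fine for a handful of features; for whole-genome annotation files, sorted sweeps or interval trees are the usual choice.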
Variant Simulation
tools for adding mutations to existing .bam files, used for testing mutation callers
Reads simulator
Variant Prediction/Annotation
SIFT
Data
python access to UCSC genomes database
Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Access to Biological Web Services from Python.
Tools
A fast Python library for VCF files leveraging Cython for speed.
cython + htslib == fast VCF and BCF processing
Python wrapper -- and more -- for Aaron Quinlan's BEDTools (bioinformatics tools)
Efficient pythonic random access to fasta subsequences
Pysam is a Python module for reading and manipulating SAM/BAM/VCF/BCF files. It's a lightweight wrapper of the htslib C-API, the same one that powers samtools, bcftools, and tabix.
A Variant Call Format reader for Python.
Assembly
SPAdes Genome Assembler
SKESA assembler
Annotation
Rapid prokaryotic genome annotation
Rapid & standardized annotation of bacterial genomes & plasmids
Long-read Assembly
A single molecule sequence assembler for genomes large and small.
De novo assembler for single molecule sequencing reads using repeat graphs
Hifiasm: a haplotype-resolved assembler for accurate HiFi reads
Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
Genome Browsers / Gene Diagrams
📈 DNA Sequence Visualization for Humans
Interactive web-based genome browser.
🔬A library of JavaScript components to represent biological data
Flexible circular visualization of genome-associated data with BioPerl and SVG.
Horizon chart js library for DNA data.
Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations
SVG based genome viewer written in javascript using D3
A modern genome browser built with JavaScript and HTML5.
Pathogen-Host Analysis Tool - A modern Next-Generation Sequencing (NGS) analysis platform
Interactive in-browser track viewer
HTML5 canvas genomic graphics library
Circos Related
Circos is a software package for visualizing data and information. It visualizes data in a circular layout — this makes Circos ideal for exploring relationships between objects or positions.
Fuji plot: a Circos representation of multiple GWAS results
Database Access
Becoming a Bioinformatician
Path to a free self-taught education in Bioinformatics!
Bioinformatics on GitHub
Alternative splicing resource
A collection of research papers for AI-based protein design
Sequencing
[1:34:35] - Excellent (technical) overview of next-generation and third-generation sequencing technologies, along with some applications in cancer research.
RNA-Seq
Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file formats, reference genomes, gene annotation, expression, differential expression, alternative splicing, data visualization, and interpretation.
RNAseq analysis notes from Ming Tang
[46:39] - Dr. Lior Pachter shares his stories from the supplement for well-known RNA-seq analysis software CuffDiff and Cufflinks and explains some of their methodologies.
ChIP-Seq
ChIP-seq analysis notes from Ming Tang
YouTube Channels and Playlists
Blogs
Miscellaneous
The Leek group guide to genomics papers