A curated list of awesome Bioinformatics libraries and software.
Thank you danielecook & contributors
Official git repository for Biopython (originally converted from CVS)
This library provides implementations of many algorithms and data structures that are useful for bioinformatics. All provided implementations are rigorously tested via continuous integration.
The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.
The command-line interface to GGD
Web application to explore the Sequence Read Archive.
Command Line Utilities
Useful bash one-liners for bioinformatics.
Modular and universal bioinformatics
Syntax highlighting for computational biology
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
A cross-platform, efficient and practical CSV/TSV toolkit in Golang
Easily submit multiple PBS jobs or run local jobs in parallel. Multiple input files are supported.
a wee tool for random access into BGZF files.
sort genomic data
Note: tabix and bgzip binaries are now part of the HTSlib project.
Write-once-read-many table for large datasets.
Create an index on a compressed text file
BigDataScript: Scripting language for big data
Bpipe - a tool for running and managing bioinformatics pipelines
Repository for the CWL standards. Use https://cwl.discourse.group/ for support 😊
Scientific workflow engine designed for simplicity and scalability. Trivially transition from one-off use cases to massive-scale production environments.
A DSL for data-driven computational pipelines
CGAT-ruffus is a lightweight Python module for running computational pipelines
This is the SeqWare Project's main repo.
Workflow Description Language - Specification and Implementations
A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
Software for intuitively doing Differential Gene Expression (DGE) analysis on Windows and GNU/Linux, based on R packages.
Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data
A quality control analysis tool for high throughput sequencing data
Simple FASTQ quality assessment using Python
FASTA/FASTQ pre-processing programs
Aggregate results from bioinformatics analyses across many samples into a single report.
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang
An imagemagick-like frontend to Biopython SeqIO
Toolkit for processing sequences in FASTA/Q formats
Explore and analyze biological sequence data
Scalable genomic data analysis.
Scalable gVCF merging and joint variant calling for population sequencing projects
A fast and sensitive gapped read aligner
Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
Wavefront alignment algorithm (WFA): Fast and exact gap-affine pairwise alignment
Pairwise Sequence Alignment Library
Mummer alignment tool
Multiple Sequence Alignment
A simple Partial Order Aligner based on Lee, Grasso and Sharlow (2002), for education/demonstration purposes
RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
Bayesian haplotype-based genetic polymorphism discovery and genotyping.
Official code repository for GATK versions 1.0 through 3.7 (core engine). For GATK 4 code, see the https://github.com/broadinstitute/gatk repository
Tools (written in C using htslib) for manipulating next-generation sequencing data
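Several entries above are pairwise aligners (for example the wavefront alignment library and the pairwise sequence alignment library). As a rough illustration of the scoring problem they solve, here is a minimal sketch of classic Needleman-Wunsch global alignment with a linear gap penalty. It is not the WFA algorithm or any listed library's code, and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Scoring defaults are illustrative assumptions, not any tool's defaults.
    """
    # prev[j] = best score for aligning the first i-1 chars of a with b[:j];
    # row 0 aligns the empty prefix of a against b, so it is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0 aligns a[:i] against an empty prefix of b (all gaps).
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[len(b)]


print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the dynamic-programming matrix are kept, so memory grows with the length of `b` rather than with the product of both lengths; recovering the alignment itself (not just its score) would need the full matrix or a traceback scheme.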
Structural variant callers
DELLY2: Structural variant discovery by integrated paired-end and split-read analysis
lumpy: a general probabilistic framework for structural variant discovery
Structural variant and indel caller for mapped sequencing data
GRIDSS: the Genomic Rearrangement IDentification Software Suite
structural variant calling and genotyping with existing tools, but, smoothly.
BAM File Utilities
C++ API & command-line toolkit for working with BAM data
A BAM toolbox
Automate common SAM and BAM conversions
fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing
SAMStat displays various properties of next-generation sequencing reads stored in SAM/BAM format.
Software for calculating telomere length
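Depth tools such as the BAM/CRAM depth calculator listed above report how many reads cover each reference position. A minimal sketch of one common way to compute this is a difference array followed by a running sum. It assumes reads have already been reduced to 0-based half-open (start, end) intervals on a single reference; that input format and the `per_base_depth` name are hypothetical. Real tools read BAM/CRAM records and also handle CIGAR gaps, read filters, and multiple chromosomes.

```python
from typing import List, Tuple


def per_base_depth(reads: List[Tuple[int, int]], ref_length: int) -> List[int]:
    """Per-position read depth from 0-based half-open (start, end) intervals.

    Hypothetical helper: assumes reads are already reduced to intervals on
    one reference sequence of length ref_length.
    """
    # diff[p] records how much the depth changes when entering position p.
    diff = [0] * (ref_length + 1)
    for start, end in reads:
        # Clip each interval to the reference bounds.
        start, end = max(0, start), min(ref_length, end)
        if start < end:
            diff[start] += 1  # coverage begins at start
            diff[end] -= 1    # and stops just before end
    depth: List[int] = []
    running = 0
    for pos in range(ref_length):
        running += diff[pos]  # running sum of changes = depth at pos
        depth.append(running)
    return depth


print(per_base_depth([(0, 3), (2, 5)], 6))
```

Each read costs two array updates regardless of its length, and a single pass then produces the depth at every position. That makes the approach attractive for whole-genome data.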
VCF File Utilities
This is the official development repository for BCFtools. To compile, the develop branch of htslib is needed: git clone --branch=develop git://github.com/samtools/htslib.git htslib
annotate a VCF with other VCFs/BEDs/tabixed files
C++ library and cmdline tools for parsing and manipulating VCF files
A set of tools written in Perl and C++ for working with VCF files, such as those generated by the 1000 Genomes Project.
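The VCF tools above all build on the same tab-separated record layout: eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by FORMAT and per-sample columns. The sketch below parses only the fixed columns of a single data line. The `parse_vcf_record` name and its dictionary output are illustrative assumptions. It ignores header lines, FORMAT and sample columns, and the INFO type definitions declared in the header.

```python
from typing import Dict, Union


def parse_vcf_record(line: str) -> Dict[str, object]:
    """Parse the eight fixed columns of one VCF data line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]

    # INFO is a ';'-separated list of key=value pairs; flag entries have no '='.
    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True

    return {
        "CHROM": chrom,
        "POS": int(pos),  # VCF positions are 1-based
        "ID": vid,
        "REF": ref,
        "ALT": alt.split(","),  # multiple alternate alleles are comma-separated
        "QUAL": None if qual == "." else float(qual),  # '.' means missing
        "FILTER": filt,
        "INFO": info_dict,
    }


record = parse_vcf_record("20\t14370\trs6054257\tG\tA,T\t29\tPASS\tNS=3;DP=14;DB\n")
print(record["ALT"], record["INFO"])
```

For real files, a library such as Pysam (listed above) is a better choice, because it handles headers, BCF, and indexed access.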
GFF BED File Utilities
GFF and GTF file manipulation and interconversion
bedtools - the swiss army knife for genome arithmetic
tools for adding mutations to existing .bam files, used for testing mutation callers
Python access to the UCSC genome database
Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Access to Biological Web Services from Python.
A fast Python library for VCF files leveraging Cython for speed.
cython + htslib == fast VCF and BCF processing
Python wrapper -- and more -- for Aaron Quinlan's BEDTools (bioinformatics tools)
Efficient pythonic random access to fasta subsequences
Pysam is a Python module for reading and manipulating SAM/BAM/VCF/BCF files. It's a lightweight wrapper of the htslib C-API, the same one that powers samtools, bcftools, and tabix.
A Variant Call Format reader for Python.
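Genome arithmetic, as in bedtools, largely reduces to operations on intervals. Below is a minimal sketch of one such operation, intersection. It assumes BED-style 0-based half-open coordinates and small in-memory lists. The `Interval` alias and `intersect` function are hypothetical names and are not the bedtools or pybedtools API.

```python
from typing import List, Tuple

# (chrom, start, end) in BED-style 0-based half-open coordinates.
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping segment for every overlapping pair from a and b."""
    out: List[Interval] = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            # Half-open intervals overlap only if they share at least one base,
            # so intervals that merely touch (end == start) are skipped.
            if start < end:
                out.append((chrom_a, start, end))
    return out


print(intersect([("chr1", 100, 200), ("chr2", 0, 50)],
                [("chr1", 150, 250), ("chr1", 200, 300), ("chr2", 10, 20)]))
```

The nested loop is quadratic in the number of intervals. Tools built for genome-scale data avoid this cost with sorted inputs or indexing.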
Genome Browsers / Gene Diagrams
📈 DNA Sequence Visualization for Humans
Interactive web-based genome browser.
Flexible circular visualization of genome-associated data with BioPerl and SVG.
Horizon chart js library for DNA data.
Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations
Pathogen-Host Analysis Tool - A modern Next-Generation Sequencing (NGS) analysis platform
Interactive in-browser track viewer
HTML5 canvas genomic graphics library
Circos is a software package for visualizing data and information. It visualizes data in a circular layout, which makes Circos ideal for exploring relationships between objects or positions.
Fuji plot: a Circos representation of multiple GWAS results
UNIX command-line tools for accessing NCBI's databases programmatically. Installation instructions and examples are available at the linked page.
Becoming a Bioinformatician
Path to a free self-taught education in Bioinformatics!
Here is a step-by-step guide on how to convey concepts to people not involved in the field when asked the question: 'So, what do you do?'
A talk by C. Titus Brown in which he imagines looking back at bioinformatics from the year 2039. His notes for this talk are also available.
A critical view of the state of bioinformatics.
Dr. Keith Bradnam "thought it might be instructive to ask a simple series of questions to a bunch of notable bioinformaticians to assess their feelings on the current state of bioinformatics research, and maybe get any tips they have about what has been useful to their bioinformatics careers."
Rosalind is a platform for learning bioinformatics through problem solving.
This guide is aimed at bioinformaticians, and is meant to guide them towards better career development.
Bioinformatics on GitHub
Alternative splicing resource
(1:34:35) Excellent (technical) overview of next-generation and third-generation sequencing technologies, along with some applications in cancer research.
List of ~100 papers on various sequencing technologies and assays ranging from transcription to transposable element discovery.
(3456x5471) Massive infographic by Illumina illustrating how many sequencing techniques work. Techniques covered include protein-protein interactions, RNA transcription, RNA-protein interactions, low-level RNA detection, RNA modifications, RNA structure, DNA rearrangements and markers, low-level DNA detection, epigenetics, and DNA-protein interactions. References included.
Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file formats, reference genomes, gene annotation, expression, differential expression, alternative splicing, data visualization, and interpretation.
RNAseq analysis notes from Ming Tang
Includes lots of seminal papers on RNA-seq and analysis methods.
RNA-seqlopedia provides an awesome overview of RNA-seq and of the choices necessary to carry out a successful RNA-seq experiment.
Gives an awesome roadmap for RNA-seq computational analyses, including challenges and obstacles to look out for, and also how you might integrate RNA-seq data with other data types.
(46:39) Dr. Lior Pachter shares stories from the supplement of the well-known RNA-seq analysis software CuffDiff and Cufflinks and explains some of their methodologies.
ChIP-seq analysis notes from Ming Tang
YouTube Channels and Playlists
Excellent series of fourteen lectures given at NIH about current topics in genomics ranging from sequence analysis, to sequencing technologies, and even more translational topics such as genomic medicine.
"GenomeTV is NHGRI's collection of official video resources from lectures, to news documentaries, to full video collections of meetings that tackle the research, issues and clinical applications of genomic research."
Keynote lectures from Cold Spring Harbor Laboratory (CSHL) Meetings. More on The Leading Strand.
"Our seminars are dedicated to the critical intersection of GBM, delving into 'bleeding edge' technology and approaches that will deeply shape the future."
Dr. Rafael Irizarry's lectures and academic talks on statistics for genomics.
"NIH VideoCast broadcasts seminars, conferences and meetings live to a world-wide audience over the Internet as a real-time streaming video." Not exclusively genomics and bioinformatics content, but it includes many great talks on domain-specific uses of bioinformatics and genomics.
Dr. Keith Bradnam writes about his "thoughts on biology, genomics, and the ongoing threat to humanity from the bogus use of bioinformatics acronyms."
Dr. Mick Watson writes on bioinformatics, genomes, and biology.
Dr. Lior Pachter writes reviews and commentary on computational biology.
The Leek group guide to genomics papers
"This article introduces a catalog of several hundred free video courses of potential interest to those wishing to expand their knowledge of bioinformatics and computational biology. The courses are organized into eleven subject areas modeled on university departments and are accompanied by commentary and career advice."
An anecdote by Lincoln D. Stein on the importance of the Perl programming language in the Human Genome Project.
Page of links to primers and short educational articles on various methods used in computational biology and bioinformatics.
Collection of tools curated by Keith Crandall and Claus White, aimed at collating the most interesting, innovative, and relevant bioinformatics tools articles in PeerJ.
Online networking groups
a Discord server for general bioinformatics
the official Slack workspace of r/bioinformatics (send a direct message to apfejes on reddit)
A community of bioinformaticians based in Granada, Spain
A community of bioinformaticians centered in Latin America