Awesome Bioinformatics

A curated list of awesome Bioinformatics libraries and software.

Last Update: Dec. 2, 2020, 9:04 p.m.

Thank you danielecook & contributors
View Topic on GitHub:
danielecook/Awesome-Bioinformatics

Package suites

Official git repository for Biopython (originally converted from CVS)

2.41K
1.18K
91d
n/a

This library provides implementations of many algorithms and data structures that are useful for bioinformatics. All provided implementations are rigorously tested via continuous integration.

745
118
58d
MIT

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.

163
49
57d
n/a

Data Tools

The command-line interface to GGD

11
2
94d
MIT

Web application to explore the Sequence Read Archive.

81
14
102d
GPL-2.0

Command Line Utilities

Useful bash one-liners for bioinformatics.

1.14K
364
1y 5m
n/a

Modular and universal bioinformatics

290
43
1y 15d
MIT

Syntax highlighting for computational biology

158
24
9m
GPL-3.0

A suite of utilities for converting to and working with CSV, the king of tabular file formats.

4.41K
542
34d
MIT

A cross-platform, efficient and practical CSV/TSV toolkit in Golang

506
51
105d
MIT

Easily submitting multiple PBS jobs or running local jobs in parallel. Multiple input files supported.

19
7
1y 5m
MIT

a wee tool for random access into BGZF files.

78
10
2y 6m
MIT

sort genomic data

27
2
6m
MIT

Note: tabix and bgzip binaries are now part of the HTSlib project.

74
36
4m
n/a

Write-once-read-many table for large datasets.

23
4
4y 29d
LGPL-3.0

Create an index on a compressed text file

524
32
2y 8d
BSD-2-Clause

Data transformations and statistics. [ web ]

Some example scripts using GNU Parallel. [ web ]

Workflow Managers

BigDataScript: Scripting language for big data

85
21
108d
Apache-2.0

Bpipe - a tool for running and managing bioinformatics pipelines

177
47
102d
n/a

Repository for the CWL standards. Use https://cwl.discourse.group/ for support 😊

1.16K
173
96d
Apache-2.0

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments

616
224
9d
n/a

A DSL for data-driven computational pipelines

1.19K
271
92d
Apache-2.0

CGAT-ruffus is a lightweight python module for running computational pipelines

150
31
4m
MIT

This is the SeqWare Project's main repo.

27
18
2y 4m
GPL-3.0

Workflow Description Language - Specification and Implementations

20
7
1y 8m
BSD-3-Clause

A workflow management system in Python that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment. [ paper-2018 | web ]

Pipelines

A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin

3.37K
394
92d
n/a

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

790
331
6d
MIT

Software for intuitively doing Differential Gene Expression (DGE) analysis on Windows and GNU/Linux, based on R packages.

3
0
1y 50d
n/a

Sequence Processing

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

166
53
6m
MIT

A quality control analysis tool for high throughput sequencing data

127
35
92d
n/a

Simple FASTQ quality assessment using Python

90
13
2y 6m
MIT
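
As a rough illustration of what a simple FASTQ quality assessment might compute, the sketch below averages the Phred scores of each read. It is not taken from any tool listed here; the function name `mean_phred_per_read` is hypothetical, and it assumes four-line FASTQ records with Phred+33 quality encoding.

```python
import io


def mean_phred_per_read(handle, offset=33):
    """Yield (read_id, mean Phred score) for each four-line FASTQ record.

    Assumes Phred+33 encoding and unwrapped records (hypothetical helper).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        handle.readline()  # sequence line, not needed for quality
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        if not quality:
            continue  # skip records without quality values
        scores = [ord(ch) - offset for ch in quality]
        read_id = header[1:].split()[0]
        yield read_id, sum(scores) / len(scores)


# Two tiny in-memory records: 'I' encodes Phred 40, '!' encodes Phred 0.
example = io.StringIO("@read1 sample\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n")
results = list(mean_phred_per_read(example))
print(results)
```

Real files may use other encodings or be compressed, so a dedicated tool is the safer choice for production quality control.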

FASTA/FASTQ pre-processing programs

113
51
1y 11m
n/a

Aggregate results from bioinformatics analyses across many samples into a single report.

643
349
91d
GPL-3.0

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang

527
81
117d
MIT

An imagemagick-like frontend to Biopython SeqIO

86
18
100d
GPL-3.0

Toolkit for processing sequences in FASTA/Q formats

721
222
5m
MIT

Explore and analyze biological sequence data

9
1
91d
MIT

Data Analysis

Scalable genomic data analysis.

684
184
8d
MIT

Scalable gVCF merging and joint variant calling for population sequencing projects

53
21
112d
Apache-2.0

Pairwise

A fast and sensitive gapped read aligner

317
115
93d
GPL-3.0

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)

889
440
106d
GPL-3.0
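
For readers unfamiliar with the transform the aligner is named after, here is a minimal teaching sketch of the Burrows-Wheeler transform built from sorted rotations, plus a naive inverse. This is not how the aligner above implements it (production tools build a compressed index over the transform with far more efficient construction); the function names and the `$` sentinel are assumptions made for illustration.

```python
def bwt(text, sentinel="$"):
    """Return the Burrows-Wheeler transform of text using sorted rotations.

    The sentinel must not occur in text and must sort before every
    character of text (true for '$' with letters in ASCII).
    """
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Rebuild the original text by repeatedly prepending and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


print(bwt("banana"))
print(inverse_bwt(bwt("GATTACA")))
```

Sorting all rotations takes quadratic memory, so this approach is only suitable for short strings.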

Wavefront alignment algorithm (WFA): Fast and exact gap-affine pairwise alignment

129
8
69d
n/a

Pairwise Sequence Alignment Library

127
19
7m
n/a

Mummer alignment tool

218
77
57d
n/a

Multiple Sequence Alignment

A simple Partial Order Aligner based on Lee, Grasso and Sharlow (2002), for education/demonstration purposes

44
8
2y 90d
GPL-2.0

Quantification

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data

237
86
9m
GPL-3.0

Variant Calling

Bayesian haplotype-based genetic polymorphism discovery and genotyping.

486
212
106d
MIT

Official code repository for GATK versions 1.0 through 3.7 (core engine). For GATK 4 code, see the https://github.com/broadinstitute/gatk repository

275
210
2y 104d
n/a

Tools (written in C using htslib) for manipulating next-generation sequencing data

973
450
93d
n/a

Structural variant callers

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis

213
89
97d
BSD-3-Clause

lumpy: a general probabilistic framework for structural variant discovery

204
97
1y 7m
MIT

Structural variant and indel caller for mapped sequencing data

238
71
1y 4m
n/a

GRIDSS: the Genomic Rearrangement IDentification Software Suite

112
27
91d
n/a

structural variant calling and genotyping with existing tools, but, smoothly.

128
11
1y 23d
Apache-2.0

BAM File Utilities

C++ API & command-line toolkit for working with BAM data

316
131
6m
MIT

A bam toolbox

1
1
3y 48d
MIT

Automate common sam & bam conversions

6
1
7y 7m
n/a

fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing

332
58
4m
MIT

SAMStat displays various properties of next-generation sequencing reads stored in SAM/BAM format.

8
0
2y 11m
GPL-3.0

A software for calculating telomere length

31
20
2y 41d
GPL-3.0

VCF File Utilities

This is the official development repository for BCFtools. To compile, the develop branch of htslib is needed: git clone --branch=develop git://github.com/samtools/htslib.git htslib

323
165
91d
n/a

annotate a VCF with other VCFs/BEDs/tabixed files

241
42
7m
MIT

C++ library and cmdline tools for parsing and manipulating VCF files

389
182
4m
MIT

A set of tools written in Perl and C++ for working with VCF files, such as those generated by the 1000 Genomes Project.

294
120
8m
LGPL-3.0

GFF BED File Utilities

GFF and GTF file manipulation and interconversion

149
59
4m
MIT

bedtools - the swiss army knife for genome arithmetic

625
236
106d
MIT

The fast, highly scalable and easily-parallelizable genome analysis toolkit. [ paper-2012 ]

Variant Simulation

tools for adding mutations to existing .bam files, used for testing mutation callers

155
60
6m
MIT

Reads simulator

165
79
1y 63d
n/a

Variant Prediction/Annotation

SIFT

327
58
9m
MIT
96
49
106d
n/a

Data

python access to UCSC genomes database

113
41
98d
MIT

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl

231
44
97d
Apache-2.0

Access to Biological Web Services from Python.

154
44
99d
GPL-3.0

Tools

A fast Python library for VCF files leveraging Cython for speed.

51
13
2y 8m
MIT

cython + htslib == fast VCF and BCF processing

214
43
94d
MIT

Python wrapper -- and more -- for Aaron Quinlan's BEDTools (bioinformatics tools)

206
79
4m
n/a

Efficient pythonic random access to fasta subsequences

278
49
119d
n/a
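
To show the general idea behind indexed random access to FASTA subsequences, the sketch below records byte offsets once and then seeks directly to the requested bases. It does not use or reproduce the API of the library listed above; `build_index` and `fetch` are hypothetical names, and the sketch assumes every sequence is wrapped at a uniform line width (except its last line).

```python
import io


def build_index(handle):
    """Map each sequence name to its byte offset, length, and line layout."""
    index = {}
    name = None
    offset = 0
    for line in handle:
        if line.startswith(b">"):
            name = line[1:].split()[0].decode()
            index[name] = {"start": offset + len(line), "length": 0,
                           "bases": 0, "width": 0}
        elif name is not None:
            entry = index[name]
            stripped = len(line.rstrip(b"\r\n"))
            if entry["width"] == 0:
                entry["width"] = len(line)  # bytes per line, with newline
                entry["bases"] = stripped   # bases per line
            entry["length"] += stripped
        offset += len(line)
    return index


def fetch(handle, index, name, start, end):
    """Return bases [start, end) (0-based, half-open) of sequence `name`."""
    entry = index[name]
    end = min(end, entry["length"])
    if start >= end:
        return ""

    def byte_pos(pos):
        # Full lines before pos, plus the column within the current line.
        return (entry["start"] + (pos // entry["bases"]) * entry["width"]
                + pos % entry["bases"])

    handle.seek(byte_pos(start))
    chunk = handle.read(byte_pos(end) - byte_pos(start))
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


fasta = io.BytesIO(b">chr1 test\nACGTAC\nGTTTGA\nCC\n>chr2\nGGGG\n")
index = build_index(fasta)
print(fetch(fasta, index, "chr1", 4, 9))
```

Only the requested bytes are read after the index is built, which is what makes this pattern attractive for large genomes.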

Pysam is a Python module for reading and manipulating SAM/BAM/VCF/BCF files. It's a lightweight wrapper of the htslib C-API, the same one that powers samtools, bcftools, and tabix.

454
194
5m
MIT

A Variant Call Format reader for Python.

323
165
111d
n/a

Genome Browsers / Gene Diagrams

📈 DNA Sequence Visualization for Humans

28
5
6m
MIT

Interactive web-based genome browser.

208
69
1y 98d
BSD-2-Clause

🔬A library of JavaScript components to represent biological data

431
119
11m
Apache-2.0

Flexible circular visualization of genome-associated data with BioPerl and SVG.

35
5
1y 5m
Artistic-2.0

Horizon chart js library for DNA data.

57
6
4y 7m
n/a

Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations

393
197
102d
MIT

SVG based genome viewer written in javascript using D3

29
24
5y 4m
GPL-2.0

A modern genome browser built with JavaScript and HTML5.

372
190
91d
LGPL-2.1

Pathogen-Host Analysis Tool - A modern Next-Generation Sequencing (NGS) analysis platform

14
2
11m
GPL-3.0

Interactive in-browser track viewer

245
62
7m
Apache-2.0

HTML5 canvas genomic graphics library

73
17
1y 7m
n/a

Circos Related

Circos is a software package for visualizing data and information. It visualizes data in a circular layout — this makes Circos ideal for exploring relationships between objects or positions.

49
25
2y 36d
n/a

Fuji plot—a circos representation of multiple GWAS results—

31
7
5m
GPL-3.0

[ web: http://www.bioconductor.org/packages/release/bioc/html/OmicCircos.html ]

[ web: http://www.australianprostatecentre.org/research/software/jcircos ]

Database Access

UNIX command line tools to access NCBI's databases programmatically. Instructions to install and examples are found in the link.

Becoming a Bioinformatician

Path to a free self-taught education in Bioinformatics!

1.55K
326
3d
n/a

Here is a step-by-step guide on how to convey concepts to people not involved in the field when asked the question: 'So, what do you do?'

A talk by C. Titus Brown on his take of looking back at bioinformatics from the year 2039. His notes for this talk can be found here.

A critical view of the state of bioinformatics.

Dr. Keith Bradnam "thought it might be instructive to ask a simple series of questions to a bunch of notable bioinformaticians to assess their feelings on the current state of bioinformatics research, and maybe get any tips they have about what has been useful to their bioinformatics careers."

Rosalind is a platform for learning bioinformatics through problem solving.

This guide is aimed at bioinformaticians, and is meant to guide them towards better career development.

Bioinformatics on GitHub

Alternative splicing resource

19
5
2y 8m
n/a

Sequencing

[1:34:35] - Excellent (technical) overview of next-generation and third-generation sequencing technologies, along with some applications in cancer research.

List of ~100 papers on various sequencing technologies and assays ranging from transcription to transposable element discovery.

(3456x5471) - Massive infographic by Illumina illustrating how many sequencing techniques work. Techniques cover protein-protein interactions, RNA transcription, RNA-protein interactions, RNA low-level detection, RNA modifications, RNA structure, DNA rearrangements and markers, DNA low-level detection, epigenetics, and DNA-protein interactions. References included.

RNA-Seq

Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file formats, reference genomes, gene annotation, expression, differential expression, alternative splicing, data visualization, and interpretation.

1K
523
5m
CC-BY-SA-4.0

RNAseq analysis notes from Ming Tang

530
214
24d
MIT

Includes lots of seminal papers on RNA-seq and analysis methods.

RNA-seqlopedia provides an awesome overview of RNA-seq and of the choices necessary to carry out a successful RNA-seq experiment.

Gives an awesome roadmap for RNA-seq computational analyses, including challenges/obstacles and things to look out for, but also how you might integrate RNA-seq data with other data types.

[46:39] - Dr. Lior Pachter shares his stories from the supplement for well-known RNA-seq analysis software CuffDiff and Cufflinks and explains some of their methodologies.

Extensive Wikipedia list of RNA-seq bioinformatics tools covering every part of an analysis pipeline, from quality control and alignment to splice analysis and visualization.

ChIP-Seq

ChIP-seq analysis notes from Ming Tang

459
229
107d
MIT

YouTube Channels and Playlists

Excellent series of fourteen lectures given at NIH about current topics in genomics ranging from sequence analysis, to sequencing technologies, and even more translational topics such as genomic medicine.

"GenomeTV is NHGRI's collection of official video resources from lectures, to news documentaries, to full video collections of meetings that tackle the research, issues and clinical applications of genomic research."

Keynote lectures from Cold Spring Harbor Laboratory (CSHL) Meetings. More on The Leading Strand.

"Our seminars are dedicated to the critical intersection of GBM, delving into 'bleeding edge' technology and approaches that will deeply shape the future."

Dr. Rafael Irizarry's lectures and academic talks on statistics for genomics.

"NIH VideoCast broadcasts seminars, conferences and meetings live to a world-wide audience over the Internet as a real-time streaming video." Not exclusively genomics and bioinformatics videos, but many great talks on domain-specific uses of bioinformatics and genomics.

Blogs

Dr. Keith Bradnam writes about his "thoughts on biology, genomics, and the ongoing threat to humanity from the bogus use of bioinformatics acronyms."

Dr. Mick Watson writes about bioinformatics, genomes, and biology.

Dr. Lior Pachter writes reviews and commentary on computational biology.

Dr. Michael Eisen writes "a blog about genomes, DNA, evolution, open science, baseball and other important things"

Miscellaneous

The Leek group guide to genomics papers

363
158
2y 29d
n/a

"This article introduces a catalog of several hundred free video courses of potential interest to those wishing to expand their knowledge of bioinformatics and computational biology. The courses are organized into eleven subject areas modeled on university departments and are accompanied by commentary and career advice."

An anecdote by Lincoln D. Stein on the importance of the Perl programming language in the Human Genome Project.

Page of links to primers and short educational articles on various methods used in computational biology and bioinformatics.

Collection of tools curated by Keith Crandall and Claus White, aimed at collating the most interesting, innovative, and relevant bioinformatics tools articles in PeerJ.

Online networking groups

A Discord server for general bioinformatics

A community of bioinformaticians based in Granada, Spain

A community of bioinformaticians centered in Latin America

An Australian group for bioinformatics students