Awesome Bioinformatics

A curated list of awesome Bioinformatics libraries and software.

Thank you danielecook & contributors
View Topic on GitHub:
danielecook/Awesome-Bioinformatics

Package suites

Official git repository for Biopython (originally converted from CVS)

2.41K
1.18K
8m
n/a

This library provides implementations of many algorithms and data structures that are useful for bioinformatics. All provided implementations are rigorously tested via continuous integration.

745
118
7m
MIT

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.

163
49
7m
n/a

Data Tools

The command-line interface to GGD

11
2
8m
MIT

Web application to explore the Sequence Read Archive.

81
14
8m
GPL-2.0

Command Line Utilities

Useful bash one-liners for bioinformatics.

1.14K
364
1y 11m
n/a

Modular and universal bioinformatics

290
43
1y 5m
MIT

Syntax highlighting for computational biology

158
24
1y 92d
GPL-3.0

A suite of utilities for converting to and working with CSV, the king of tabular file formats.

4.48K
540
6m
MIT

A cross-platform, efficient and practical CSV/TSV toolkit in Golang

506
51
8m
MIT

Easily submitting multiple PBS jobs or running local jobs in parallel. Multiple input files supported.

19
7
1y 10m
MIT

a wee tool for random access into BGZF files.

78
10
3y 3d
MIT

sort genomic data

27
2
11m
MIT

Note: tabix and bgzip binaries are now part of the HTSlib project.

74
36
9m
n/a

Write-once-read-many table for large datasets.

23
4
4y 6m
LGPL-3.0

Create an index on a compressed text file

524
32
2y 5m
BSD-2-Clause

Data transformations and statistics. [ web ]

Example scripts using GNU Parallel. [ web ]

Workflow Managers

BigDataScript: Scripting language for big data

85
21
8m
Apache-2.0

Bpipe - a tool for running and managing bioinformatics pipelines

177
47
8m
n/a

Repository for the CWL standards. Use https://cwl.discourse.group/ for support 😊

1.16K
173
8m
Apache-2.0

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one-off use cases and massive-scale production environments.

645
232
82d
n/a

A DSL for data-driven computational pipelines

1.19K
271
8m
Apache-2.0

CGAT-ruffus is a lightweight python module for running computational pipelines

150
31
9m
MIT

This is the SeqWare Project's main repo.

27
18
2y 10m
GPL-3.0

Workflow Description Language - Specification and Implementations

20
7
2y 55d
BSD-3-Clause

A workflow management system in Python that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment. [ paper-2018 | web ]

Pipelines

A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin

3.37K
394
8m
n/a

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

799
334
78d
MIT

Software for intuitively doing Differential Gene Expression (DGE) analysis on Windows and GNU/Linux, based on R packages.

3
0
1y 7m
n/a

Sequence Processing

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

166
53
12m
MIT
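
As a rough illustration of what quality filtering on FASTQ data involves, here is a minimal sketch in plain Python. It is not taken from any tool listed here. It assumes four-line FASTQ records and Phred+33 quality encoding, and the function names (`read_fastq`, `mean_phred`, `filter_reads`) and the default threshold of 20 are hypothetical choices made for this example.

```python
import io
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record: " + header.strip())
        yield header[1:].strip(), seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed by default)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(handle: TextIO, min_mean_quality: float = 20.0) -> List[Record]:
    """Keep reads whose mean Phred score reaches the threshold."""
    return [rec for rec in read_fastq(handle) if mean_phred(rec[2]) >= min_mean_quality]


# Two toy reads: 'I' encodes Phred 40, '!' encodes Phred 0.
SAMPLE = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n"
kept = filter_reads(io.StringIO(SAMPLE))
```

Real tools typically also trim adapters and low-quality tails, which this sketch leaves out.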

A quality control analysis tool for high throughput sequencing data

127
35
8m
n/a

Simple FASTQ quality assessment using Python

90
13
2y 11m
MIT

FASTA/FASTQ pre-processing programs

113
51
2y 4m
n/a

Aggregate results from bioinformatics analyses across many samples into a single report.

643
349
8m
GPL-3.0

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang

527
81
9m
MIT

An imagemagick-like frontend to Biopython SeqIO

86
18
8m
GPL-3.0

Toolkit for processing sequences in FASTA/Q formats

721
222
11m
MIT

Explore and analyze biological sequence data

9
1
8m
MIT

Data Analysis

Scalable genomic data analysis.

700
188
82d
MIT

Scalable gVCF merging and joint variant calling for population sequencing projects

53
21
9m
Apache-2.0

Pairwise

A fast and sensitive gapped read aligner

317
115
8m
GPL-3.0

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)

889
440
8m
GPL-3.0
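
For readers unfamiliar with the transform the aligner above is named after, the following is a naive sketch of the Burrows-Wheeler transform and its inverse. It is written for readability only and is not how any aligner implements it: production tools build the transform through suffix arrays and pair it with an index for searching. The function names and the `$` sentinel are assumptions of this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, for illustration)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Sort every rotation of the string and keep the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(c + row for c, row in zip(transformed, table))
    # The row that ends with the sentinel is the original string.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


example = bwt("banana")  # "annb$aa"
```

The transform groups identical characters that share a following context, which is what makes compressed full-text indexes over a reference genome practical.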

Wavefront alignment algorithm (WFA): Fast and exact gap-affine pairwise alignment

129
8
7m
n/a

Pairwise Sequence Alignment Library

127
19
1y 22d
n/a
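
To give a sense of what pairwise alignment computes, here is a minimal sketch of global alignment scoring with the classic Needleman-Wunsch dynamic program. It is a generic textbook method, not the approach of any library in this section. It uses a linear gap penalty (not the gap-affine model mentioned above), and the scoring values and function name are arbitrary choices for this example.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score of a and b under a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty string
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]
```

Keeping only two rows keeps memory linear in the length of `b`, at the cost of not being able to trace back the alignment itself.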

Mummer alignment tool

218
77
7m
n/a

Multiple Sequence Alignment

A simple Partial Order Aligner based on Lee, Grasso and Sharlow (2002), for education/demonstration purposes

44
8
2y 8m
GPL-2.0

Quantification

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data

237
86
1y 88d
GPL-3.0

Variant Calling

Bayesian haplotype-based genetic polymorphism discovery and genotyping.

486
212
8m
MIT

Official code repository for GATK versions 1.0 through 3.7 (core engine). For GATK 4 code, see the https://github.com/broadinstitute/gatk repository

275
210
2y 8m
n/a

Tools (written in C using htslib) for manipulating next-generation sequencing data

973
450
8m
n/a

Structural variant callers

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis

213
89
8m
BSD-3-Clause

lumpy: a general probabilistic framework for structural variant discovery

204
97
2y 27d
MIT

Structural variant and indel caller for mapped sequencing data

238
71
1y 10m
n/a

GRIDSS: the Genomic Rearrangement IDentification Software Suite

112
27
8m
n/a

structural variant calling and genotyping with existing tools, but, smoothly.

128
11
1y 6m
Apache-2.0

BAM File Utilities

C++ API & command-line toolkit for working with BAM data

316
131
11m
MIT

A bam toolbox

1
1
3y 6m
MIT

Automate common sam & bam conversions

6
1
8y 19d
n/a

fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing

332
58
9m
MIT

SAMStat displays various properties of next-generation sequencing reads stored in SAM/BAM format.

8
0
3y 5m
GPL-3.0

Software for calculating telomere length

31
20
2y 6m
GPL-3.0

VCF File Utilities

This is the official development repository for BCFtools. To compile, the develop branch of htslib is needed: git clone --branch=develop git://github.com/samtools/htslib.git htslib

323
165
8m
n/a

annotate a VCF with other VCFs/BEDs/tabixed files

241
42
1y 27d
MIT

C++ library and cmdline tools for parsing and manipulating VCF files

389
182
9m
MIT

A set of tools written in Perl and C++ for working with VCF files, such as those generated by the 1000 Genomes Project.

294
120
1y 63d
LGPL-3.0

GFF BED File Utilities

GFF and GTF file manipulation and interconversion

149
59
9m
MIT

bedtools - the swiss army knife for genome arithmetic

625
236
8m
MIT
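
"Genome arithmetic" refers to set-like operations on genomic intervals. The sketch below shows two of them, intersection and merging, in plain Python. It does not use or reproduce bedtools. It assumes BED-style 0-based, half-open coordinates, and the function names are hypothetical. The intersection is a simple quadratic scan chosen for clarity, not speed.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping segment for every overlapping pair from a and b."""
    overlaps = []
    b_sorted = sorted(b)
    for chrom, start, end in sorted(a):
        for b_chrom, b_start, b_end in b_sorted:
            if b_chrom != chrom:
                continue
            lo, hi = max(start, b_start), min(end, b_end)
            if lo < hi:  # a non-empty overlap
                overlaps.append((chrom, lo, hi))
    return overlaps


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

For example, `merge([("chr1", 1, 5), ("chr1", 4, 8)])` collapses the two features into `("chr1", 1, 8)`.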

The fast, highly scalable and easily-parallelizable genome analysis toolkit. [ paper-2012 ]

Variant Simulation

tools for adding mutations to existing .bam files, used for testing mutation callers

155
60
11m
MIT

Reads simulator

165
79
1y 7m
n/a

Variant Prediction/Annotation

SIFT

327
58
1y 91d
MIT

96
49
8m
n/a

Data

python access to UCSC genomes database

113
41
8m
MIT

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl

231
44
8m
Apache-2.0

Access to Biological Web Services from Python.

154
44
8m
GPL-3.0

Tools

A fast Python library for VCF files leveraging Cython for speed.

51
13
3y 47d
MIT

cython + htslib == fast VCF and BCF processing

214
43
8m
MIT

Python wrapper -- and more -- for Aaron Quinlan's BEDTools (bioinformatics tools)

206
79
10m
n/a

Efficient pythonic random access to fasta subsequences

278
49
9m
n/a

Pysam is a Python module for reading and manipulating SAM/BAM/VCF/BCF files. It's a lightweight wrapper of the htslib C-API, the same one that powers samtools, bcftools, and tabix.

454
194
10m
MIT

A Variant Call Format reader for Python.

323
165
9m
n/a

Genome Browsers / Gene Diagrams

📈 DNA Sequence Visualization for Humans

28
5
1y 3d
MIT

Interactive web-based genome browser.

208
69
1y 8m
BSD-2-Clause

🔬A library of JavaScript components to represent biological data

431
119
1y 4m
Apache-2.0

Flexible circular visualization of genome-associated data with BioPerl and SVG.

35
5
1y 10m
Artistic-2.0

Horizon chart js library for DNA data.

57
6
5y 13d
n/a

Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations

393
197
8m
MIT

SVG based genome viewer written in javascript using D3

29
24
5y 10m
GPL-2.0

A modern genome browser built with JavaScript and HTML5.

372
190
8m
LGPL-2.1

Pathogen-Host Analysis Tool - A modern Next-Generation Sequencing (NGS) analysis platform

14
2
1y 4m
GPL-3.0

Interactive in-browser track viewer

245
62
1y 12d
Apache-2.0

HTML5 canvas genomic graphics library

73
17
2y 7d
n/a

Circos Related

Circos is a software package for visualizing data and information. It visualizes data in a circular layout — this makes Circos ideal for exploring relationships between objects or positions.

49
25
2y 6m
n/a

Fuji plot—a circos representation of multiple GWAS results—

31
7
10m
GPL-3.0

[ web: http://www.bioconductor.org/packages/release/bioc/html/OmicCircos.html ]

[ web: http://www.australianprostatecentre.org/research/software/jcircos ]

Database Access

UNIX command line tools to access NCBI's databases programmatically. Instructions to install and examples are found in the link.

Becoming a Bioinformatician

Path to a free self-taught education in Bioinformatics!

1.58K
330
5m
n/a

Here is a step-by-step guide on how to convey concepts to people not involved in the field when asked the question: 'So, what do you do?'

A talk by C. Titus Brown that looks back at bioinformatics from the year 2039. His notes for this talk can be found here.

A critical view of the state of bioinformatics.

Dr. Keith Bradnam "thought it might be instructive to ask a simple series of questions to a bunch of notable bioinformaticians to assess their feelings on the current state of bioinformatics research, and maybe get any tips they have about what has been useful to their bioinformatics careers."

Rosalind is a platform for learning bioinformatics through problem solving.

This guide is aimed at bioinformaticians, and is meant to guide them towards better career development.

Bioinformatics on GitHub

Alternative splicing resource

19
5
3y 43d
n/a

Sequencing

[1:34:35] - Excellent (technical) overview of next-generation and third-generation sequencing technologies, along with some applications in cancer research.

List of ~100 papers on various sequencing technologies and assays ranging from transcription to transposable element discovery.

(3456x5471) - Massive infographic by Illumina illustrating how many sequencing techniques work. Techniques cover protein-protein interactions, RNA transcription, RNA-protein interactions, RNA low-level detection, RNA modifications, RNA structure, DNA rearrangements and markers, DNA low-level detection, epigenetics, and DNA-protein interactions. References included.

RNA-Seq

Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file formats, reference genomes, gene annotation, expression, differential expression, alternative splicing, data visualization, and interpretation.

1K
523
10m
CC-BY-SA-4.0

RNAseq analysis notes from Ming Tang

531
214
6m
MIT

Includes lots of seminal papers on RNA-seq and analysis methods.

RNA-seqlopedia provides an awesome overview of RNA-seq and of the choices necessary to carry out a successful RNA-seq experiment.

Gives an awesome roadmap for RNA-seq computational analyses, including challenges, obstacles, and things to look out for, as well as how you might integrate RNA-seq data with other data types.

[46:39] - Dr. Lior Pachter shares his stories from the supplement for well-known RNA-seq analysis software CuffDiff and Cufflinks and explains some of their methodologies.

Extensive list on Wikipedia of RNA-seq bioinformatics tools, covering all parts of an analysis pipeline from quality control and alignment to splice analysis and visualization.

ChIP-Seq

ChIP-seq analysis notes from Ming Tang

464
231
8m
MIT

YouTube Channels and Playlists

Excellent series of fourteen lectures given at NIH about current topics in genomics ranging from sequence analysis, to sequencing technologies, and even more translational topics such as genomic medicine.

"GenomeTV is NHGRI's collection of official video resources from lectures, to news documentaries, to full video collections of meetings that tackle the research, issues and clinical applications of genomic research."

Keynote lectures from Cold Spring Harbor Laboratory (CSHL) Meetings. More on The Leading Strand.

"Our seminars are dedicated to the critical intersection of GBM, delving into 'bleeding edge' technology and approaches that will deeply shape the future."

Dr. Rafael Irizarry's lectures and academic talks on statistics for genomics.

"NIH VideoCast broadcasts seminars, conferences and meetings live to a world-wide audience over the Internet as a real-time streaming video." Not exclusively genomics and bioinformatics videos, but many great talks on domain-specific uses of bioinformatics and genomics.

Blogs

Dr. Keith Bradnam writes about his "thoughts on biology, genomics, and the ongoing threat to humanity from the bogus use of bioinformatics acronyms."

Dr. Mick Watson writes on bioinformatics, genomes, and biology.

Dr. Lior Pachter writes reviews and commentary on computational biology.

Dr. Michael Eisen writes "a blog about genomes, DNA, evolution, open science, baseball and other important things"

Miscellaneous

The Leek group guide to genomics papers

363
158
2y 6m
n/a

"This article introduces a catalog of several hundred free video courses of potential interest to those wishing to expand their knowledge of bioinformatics and computational biology. The courses are organized into eleven subject areas modeled on university departments and are accompanied by commentary and career advice."

An anecdote by Lincoln D. Stein on the importance of the Perl programming language in the Human Genome Project.

Page of links to primers and short educational articles on various methods used in computational biology and bioinformatics.

Collection of tools curated by Keith Crandall and Claus Wilke, aimed at collating the most interesting, innovative, and relevant bioinformatics tools articles in PeerJ.

Online networking groups

A Discord server for general bioinformatics

A community of bioinformaticians based in Granada, Spain

A community of bioinformaticians centered in Latin America

An Australian group for bioinformatics students