Awesome Bioinformatics

A curated list of awesome Bioinformatics libraries and software.

Last Update: None

Thank you danielecook & contributors
View Topic on GitHub:
danielecook/Awesome-Bioinformatics

Package suites

Official git repository for Biopython (originally converted from CVS)

2.41K
1.18K
1y 50d
n/a

This library provides implementations of many algorithms and data structures that are useful for bioinformatics. All provided implementations are rigorously tested via continuous integration.

745
118
1y 17d
MIT

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.

163
49
1y 16d
n/a

A Go package for engineering organisms.

99
18
4m
MIT

OCaml Bioinformatics Library

98
18
9m
n/a

Data Tools

The command-line interface to GGD

11
2
1y 53d
MIT

Web application to explore the Sequence Read Archive.

81
14
1y 61d
GPL-2.0

Command Line Utilities

Useful bash one-liners for bioinformatics.

1.14K
364
2y 4m
n/a

Modular and universal bioinformatics

290
43
1y 11m
MIT

Syntax highlighting for computational biology

158
24
1y 8m
GPL-3.0

A suite of utilities for converting to and working with CSV, the king of tabular file formats.

4.48K
540
11m
MIT

A cross-platform, efficient and practical CSV/TSV toolkit in Golang

506
51
1y 64d
MIT

Easily submitting multiple PBS jobs or running local jobs in parallel. Multiple input files supported.

19
7
2y 4m
MIT

a wee tool for random access into BGZF files.

78
10
3y 5m
MIT

sort genomic data

27
2
1y 5m
MIT

Note: tabix and bgzip binaries are now part of the HTSlib project.

74
36
1y 88d
n/a

Write-once-read-many table for large datasets.

23
4
4y 11m
LGPL-3.0

Create an index on a compressed text file

524
32
2y 11m
BSD-2-Clause

Data transformations and statistics. [ web ]

Example scripts using GNU Parallel. [ web ]

Workflow Managers

BigDataScript: Scripting language for big data

85
21
1y 67d
Apache-2.0

Bpipe - a tool for running and managing bioinformatics pipelines

177
47
1y 61d
n/a

Repository for the CWL standards. Use https://cwl.discourse.group/ for support 😊

1.16K
173
1y 55d
Apache-2.0

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments.

645
232
8m
n/a

A DSL for data-driven computational pipelines

1.19K
271
1y 51d
Apache-2.0

CGAT-ruffus is a lightweight python module for running computational pipelines

150
31
1y 84d
MIT

This is the SeqWare Project's main repo.

27
18
3y 107d
GPL-3.0

Workflow Description Language - Specification and Implementations

20
7
2y 7m
BSD-3-Clause

A workflow management system in Python that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment. [ paper-2018 | web ]

Pipelines

A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin

3.37K
394
1y 51d
n/a

A flexible pipeline for complete analysis of bacterial genomes

97
21
4m
MIT

Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and shiny app.

35
3
26d
GPL-3.0

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

799
334
8m
MIT

Software for intuitively doing Differential Gene Expression (DGE) analysis on Windows and GNU/Linux, based on R packages.

3
0
2y 9d
n/a

A pipeline for preprocessing NGS data

7
3
5m
GPL-3.0

Sequence Processing

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

166
53
1y 5m
MIT

A quality control analysis tool for high throughput sequencing data

127
35
1y 51d
n/a

Simple FASTQ quality assessment using Python

90
13
3y 4m
MIT

FASTA/FASTQ pre-processing programs

113
51
2y 9m
n/a

Aggregate results from bioinformatics analyses across many samples into a single report.

643
349
1y 50d
GPL-3.0

seqfu - Sequence Fastx Utilities

18
0
4m
GPL-3.0

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang

527
81
1y 76d
MIT

An imagemagick-like frontend to Biopython SeqIO

86
18
1y 59d
GPL-3.0

Toolkit for processing sequences in FASTA/Q formats

721
222
1y 4m
MIT

Explore and analyze biological sequence data

9
1
1y 50d
MIT

Data Analysis

Scalable genomic data analysis.

700
188
8m
MIT

Scalable gVCF merging and joint variant calling for population sequencing projects

53
21
1y 71d
Apache-2.0

Pairwise

A fast and sensitive gapped read aligner

317
115
1y 52d
GPL-3.0

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)

889
440
1y 65d
GPL-3.0

Wavefront alignment algorithm (WFA): Fast and exact gap-affine pairwise alignment

129
8
1y 28d
n/a

Pairwise Sequence Alignment Library

127
19
1y 6m
n/a

Mummer alignment tool

218
77
1y 16d
n/a

Multiple Sequence Alignment

A simple Partial Order Aligner based on Lee, Grasso and Sharlow (2002), for education/demonstration purposes

44
8
3y 49d
GPL-2.0

Clustering

MMseqs2: ultra fast and sensitive search and clustering suite

497
75
4m
GPL-3.0

Quantification

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data

237
86
1y 8m
GPL-3.0

Variant Calling

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

2.29K
566
4m
BSD-3-Clause

Bayesian haplotype-based genetic polymorphism discovery and genotyping.

486
212
1y 65d
MIT

Official code repository for GATK versions 1.0 through 3.7 (core engine). For GATK 4 code, see the https://github.com/broadinstitute/gatk repository

275
210
3y 63d
n/a

Bayesian haplotype-based mutation calling

213
27
4m
MIT

This is the official development repository for BCFtools. To compile, the develop branch of htslib is needed: git clone --branch=develop git://github.com/samtools/htslib.git htslib

323
165
1y 50d
n/a

Structural variant callers

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis

213
89
1y 56d
BSD-3-Clause

lumpy: a general probabilistic framework for structural variant discovery

204
97
2y 6m
MIT

Structural variant and indel caller for mapped sequencing data

238
71
2y 99d
n/a

GRIDSS: the Genomic Rearrangement IDentification Software Suite

112
27
1y 50d
n/a

structural variant calling and genotyping with existing tools, but, smoothly.

128
11
1y 11m
Apache-2.0

BAM File Utilities

C++ API & command-line toolkit for working with BAM data

316
131
1y 4m
MIT

A bam toolbox

1
1
4y 7d
MIT

Automate common sam & bam conversions

6
1
8y 6m
n/a

fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing

332
58
1y 91d
MIT

SAMStat displays various properties of next-generation sequencing reads stored in SAM/BAM format.

8
0
3y 10m
GPL-3.0

A software for calculating telomere length

31
20
3y 0d
GPL-3.0

VCF File Utilities

This is the official development repository for BCFtools. To compile, the develop branch of htslib is needed: git clone --branch=develop git://github.com/samtools/htslib.git htslib

323
165
1y 50d
n/a

annotate a VCF with other VCFs/BEDs/tabixed files

241
42
1y 6m
MIT

C++ library and cmdline tools for parsing and manipulating VCF files

389
182
1y 86d
MIT

A set of tools written in Perl and C++ for working with VCF files, such as those generated by the 1000 Genomes Project.

294
120
1y 7m
LGPL-3.0

GFF BED File Utilities

Another Gtf/Gff Analysis Toolkit

158
23
29d
GPL-3.0

GFF and GTF file manipulation and interconversion

149
59
1y 87d
MIT

bedtools - the swiss army knife for genome arithmetic

625
236
1y 65d
MIT

The fast, highly scalable and easily-parallelizable genome analysis toolkit. [ paper-2012 ]

Variant Simulation

tools for adding mutations to existing .bam files, used for testing mutation callers

155
60
1y 4m
MIT

Reads simulator

165
79
2y 22d
n/a

Variant Prediction/Annotation

SIFT

327
58
1y 8m
MIT
96
49
1y 65d
n/a

Data

python access to UCSC genomes database

113
41
1y 57d
MIT

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl

231
44
1y 56d
Apache-2.0

Access to Biological Web Services from Python.

154
44
1y 58d
GPL-3.0

Tools

A fast Python library for VCF files leveraging Cython for speed.

51
13
3y 7m
MIT

cython + htslib == fast VCF and BCF processing

214
43
1y 53d
MIT

Python wrapper -- and more -- for Aaron Quinlan's BEDTools (bioinformatics tools)

206
79
1y 107d
n/a

Efficient pythonic random access to fasta subsequences

278
49
1y 78d
n/a

Pysam is a Python module for reading and manipulating SAM/BAM/VCF/BCF files. It's a lightweight wrapper of the htslib C-API, the same one that powers samtools, bcftools, and tabix.

454
194
1y 115d
MIT

A Variant Call Format reader for Python.

323
165
1y 70d
n/a

Assembly

SPAdes Genome Assembler

358
93
92d
n/a

SKESA assembler

68
11
93d
n/a

Annotation

Rapid prokaryotic genome annotation

505
169
4m
n/a

Rapid & standardized annotation of bacterial genomes & plasmids

99
11
29d
GPL-3.0

Long-read Assembly

A single molecule sequence assembler for genomes large and small.

484
160
37d
n/a

De novo assembler for single molecule sequencing reads using repeat graphs

437
91
32d
n/a

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads

189
31
41d
MIT

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly

431
83
8m
GPL-3.0

Genome Browsers / Gene Diagrams

📈 DNA Sequence Visualization for Humans

28
5
1y 5m
MIT

Interactive web-based genome browser.

208
69
2y 57d
BSD-2-Clause

🔬 A library of JavaScript components to represent biological data

431
119
1y 9m
Apache-2.0

Flexible circular visualization of genome-associated data with BioPerl and SVG.

35
5
2y 4m
Artistic-2.0

Horizon chart js library for DNA data.

57
6
5y 5m
n/a

Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations

393
197
1y 61d
MIT

SVG based genome viewer written in javascript using D3

29
24
6y 105d
GPL-2.0

A modern genome browser built with JavaScript and HTML5.

372
190
1y 50d
LGPL-2.1

Pathogen-Host Analysis Tool - A modern Next-Generation Sequencing (NGS) analysis platform

14
2
1y 10m
GPL-3.0

Interactive in-browser track viewer

245
62
1y 5m
Apache-2.0

HTML5 canvas genomic graphics library

73
17
2y 5m
n/a

Circos Related

Circos is a software package for visualizing data and information. It visualizes data in a circular layout, which makes Circos ideal for exploring relationships between objects or positions.

49
25
2y 12m
n/a

Fuji plot: a circos representation of multiple GWAS results

31
7
1y 112d
GPL-3.0

[ web: http://www.bioconductor.org/packages/release/bioc/html/OmicCircos.html ]

[ web: http://www.australianprostatecentre.org/research/software/jcircos ]

Database Access

UNIX command line tools to access NCBI's databases programmatically. Instructions to install and examples are found in the link.

Becoming a Bioinformatician

Path to a free self-taught education in Bioinformatics!

1.58K
330
10m
n/a

Here is a step-by-step guide on how to convey concepts to people not involved in the field when asked the question: 'So, what do you do?'

A talk by C. Titus Brown on his take of looking back at bioinformatics from the year 2039. His notes for this talk can be found here.

Dr. Keith Bradnam "thought it might be instructive to ask a simple series of questions to a bunch of notable bioinformaticians to assess their feelings on the current state of bioinformatics research, and maybe get any tips they have about what has been useful to their bioinformatics careers."

Rosalind is a platform for learning bioinformatics through problem solving.

This guide is aimed at bioinformaticians, and is meant to guide them towards better career development.

Bioinformatics on GitHub

Alternative splicing resource

19
5
3y 6m
n/a

Sequencing

[1:34:35] - Excellent (technical) overview of next-generation and third-generation sequencing technologies, along with some applications in cancer research.

List of ~100 papers on various sequencing technologies and assays ranging from transcription to transposable element discovery.

(3456x5471) - Massive infographic by Illumina illustrating how many sequencing techniques work. Techniques cover protein-protein interactions, RNA transcription, RNA-protein interactions, RNA low-level detection, RNA modifications, RNA structure, DNA rearrangements and markers, DNA low-level detection, epigenetics, and DNA-protein interactions. References included.

RNA-Seq

Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file formats, reference genomes, gene annotation, expression, differential expression, alternative splicing, data visualization, and interpretation.

1K
523
1y 4m
CC-BY-SA-4.0

RNAseq analysis notes from Ming Tang

531
214
11m
MIT

Includes lots of seminal papers on RNA-seq and analysis methods.

RNA-seqlopedia provides an awesome overview of RNA-seq and of the choices necessary to carry out a successful RNA-seq experiment.

Gives an awesome roadmap for RNA-seq computational analyses, including challenges/obstacles and things to look out for, but also how you might integrate RNA-seq data with other data types.

[46:39] - Dr. Lior Pachter shares his stories from the supplement for well-known RNA-seq analysis software CuffDiff and Cufflinks and explains some of their methodologies.

Extensive list on Wikipedia of RNA-seq bioinformatics tools, covering all parts of an analysis pipeline from quality control and alignment to splice analysis and visualization.

ChIP-Seq

ChIP-seq analysis notes from Ming Tang

464
231
1y 66d
MIT

YouTube Channels and Playlists

Excellent series of fourteen lectures given at NIH about current topics in genomics ranging from sequence analysis, to sequencing technologies, and even more translational topics such as genomic medicine.

"GenomeTV is NHGRI's collection of official video resources from lectures, to news documentaries, to full video collections of meetings that tackle the research, issues and clinical applications of genomic research."

Keynote lectures from Cold Spring Harbor Laboratory (CSHL) Meetings. More on The Leading Strand.

"Our seminars are dedicated to the critical intersection of GBM, delving into 'bleeding edge' technology and approaches that will deeply shape the future."

Dr. Rafael Irizarry's lectures and academic talks on statistics for genomics.

"NIH VideoCast broadcasts seminars, conferences and meetings live to a world-wide audience over the Internet as a real-time streaming video." Not exclusively genomics and bioinformatics videos, but many great talks on domain-specific uses of bioinformatics and genomics.

Blogs

Dr. Keith Bradnam writes about his "thoughts on biology, genomics, and the ongoing threat to humanity from the bogus use of bioinformatics acronyms."

Dr. Mick Watson writes on bioinformatics, genomes, and biology.

Dr. Lior Pachter writes review and commentary on computational biology.

Dr. Michael Eisen writes "a blog about genomes, DNA, evolution, open science, baseball and other important things"

Miscellaneous

The Leek group guide to genomics papers

363
158
2y 11m
n/a

"This article introduces a catalog of several hundred free video courses of potential interest to those wishing to expand their knowledge of bioinformatics and computational biology. The courses are organized into eleven subject areas modeled on university departments and are accompanied by commentary and career advice."

An anecdote by Lincoln D. Stein on the importance of the Perl programming language in the Human Genome Project.

Page of links to primers and short educational articles on various methods used in computational biology and bioinformatics.

Collection of tools curated by Keith Crandall and Claus Wilke, aimed at collating the most interesting, innovative, and relevant bioinformatics tools articles in PeerJ.

Online networking groups

A Discord server for general bioinformatics

A community of bioinformaticians based in Granada, Spain

An Australian group for bioinformatics students