Awesome Bioinformatics

A curated list of awesome Bioinformatics libraries and software.

Last Update: June 26, 2022, 10:11 p.m.

Thank you danielecook & contributors
View Topic on GitHub:
danielecook/Awesome-Bioinformatics

Package suites

Official git repository for Biopython (originally converted from CVS)

2.89K
1.39K
7m
n/a

This library provides implementations of many algorithms and data structures that are useful for bioinformatics. All provided implementations are rigorously tested via continuous integration.

991
141
8m
MIT

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.

226
58
7m
n/a

A Go package for engineering organisms.

160
23
7m
MIT

OCaml Bioinformatics Library

107
18
11m
n/a

Downloading

The command-line interface to GGD

27
2
11m
MIT

Web application to explore the Sequence Read Archive.

119
24
12m
GPL-2.0

Compressing

Compressor for genomic files (FASTQ, SAM/BAM, VCF, FASTA, GVF, 23andMe...), up to 5x better than gzip and faster too

68
4
5m
n/a

Command Line Utilities

Useful bash one-liners for bioinformatics.

1.31K
411
3y 18d
n/a

Modular and universal bioinformatics

294
43
2y 7m
MIT

Syntax highlighting for computational biology

190
28
11m
GPL-3.0

A suite of utilities for converting to and working with CSV, the king of tabular file formats.

4.87K
565
8m
MIT

A cross-platform, efficient and practical CSV/TSV toolkit in Golang

649
65
8m
MIT

Easily submitting multiple PBS jobs or running local jobs in parallel. Multiple input files supported.

21
6
3y 10d
MIT

a wee tool for random access into BGZF files.

78
12
4y 49d
MIT

sort genomic data

30
1
2y 32d
MIT

Note: tabix and bgzip binaries are now part of the HTSlib project.

82
39
10m
n/a

Write-once-read-many table for large datasets.

25
4
5y 7m
LGPL-3.0

Create an index on a compressed text file

553
36
3y 7m
BSD-2-Clause

Workflow Managers

BigDataScript: Scripting language for big data

90
22
1y 88d
n/a

Bpipe - a tool for running and managing bioinformatics pipelines

195
53
8m
n/a

Repository for the CWL standards. Use https://cwl.discourse.group/ for support 😊

1.29K
187
9m
Apache-2.0

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one-off use cases and massive-scale production environments.

742
259
7m
n/a

A DSL for data-driven computational pipelines

1.53K
373
7m
Apache-2.0

CGAT-ruffus is a lightweight python module for running computational pipelines

163
34
11m
MIT

Robust, flexible and resource-efficient pipelines using Go and the commandline

925
67
31d
MIT

This is the SeqWare Project's main repo.

26
18
3y 11m
GPL-3.0

Workflow Description Language - Specification and Implementations

24
6
3y 101d
BSD-3-Clause

Pipelines

A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin

4.38K
482
8m
n/a

A flexible pipeline for complete analysis of bacterial genomes

105
23
7m
MIT

Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and shiny app.

39
3
8m
GPL-3.0

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

860
347
4m
MIT

Software for intuitively doing Differential Gene Expression (DGE) analysis on Windows and GNU/Linux, based on R packages.

3
0
2y 8m
n/a

A pipeline for preprocessing NGS data from Illumina, Nanopore and PacBio technologies

8
3
8m
GPL-3.0

Sequence Processing

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

179
51
2y 44d
MIT

A quality control analysis tool for high throughput sequencing data

197
51
10m
n/a

Simple FASTQ quality assessment using Python

94
15
1y 36d
MIT

FASTA/FASTQ pre-processing programs

130
56
3y 5m
n/a

Aggregate results from bioinformatics analyses across many samples into a single report.

772
426
7m
GPL-3.0

seqfu - Sequence Fastx Utilities

23
0
7m
GPL-3.0

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang

750
112
7m
MIT

An imagemagick-like frontend to Biopython SeqIO

93
21
1y 6m
GPL-3.0

Toolkit for processing sequences in FASTA/Q formats

881
255
8m
MIT

Explore and analyze biological sequence data

11
1
11m
MIT

Data Analysis

Scalable genomic data analysis.

756
205
7m
MIT

Scalable gVCF merging and joint variant calling for population sequencing projects

76
23
9m
Apache-2.0

Pairwise

A fast and sensitive gapped read aligner

402
135
8m
GPL-3.0

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)

1.04K
495
8m
GPL-3.0

Wavefront alignment algorithm (WFA): Fast and exact gap-affine pairwise alignment

167
14
11m
n/a

Pairwise Sequence Alignment Library

160
22
1y 75d
n/a

Mummer alignment tool

274
87
9m
n/a

Multiple Sequence Alignment

A simple Partial Order Aligner based on Lee, Grasso and Sharlow (2002), for education/demonstration purposes

52
11
9m
GPL-2.0

Clustering

MMseqs2: ultra fast and sensitive search and clustering suite

577
87
7m
GPL-3.0

Quantification

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data

290
99
11m
GPL-3.0

Variant Calling

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

2.37K
587
8m
BSD-3-Clause

Bayesian haplotype-based genetic polymorphism discovery and genotyping.

560
232
1y 104d
MIT

Official code repository for GATK versions 1.0 through 3.7 (core engine). For GATK 4 code, see the https://github.com/broadinstitute/gatk repository

277
221
3y 10m
n/a

Bayesian haplotype-based mutation calling

238
31
7m
MIT

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html

411
186
7m
n/a

Structural variant callers

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis

277
112
8m
BSD-3-Clause

lumpy: a general probabilistic framework for structural variant discovery

234
109
1y 9m
MIT

Structural variant and indel caller for mapped sequencing data

293
106
2y 11m
n/a

GRIDSS: the Genomic Rearrangement IDentification Software Suite

165
42
7m
n/a

structural variant calling and genotyping with existing tools, but, smoothly.

156
18
8m
Apache-2.0

BAM File Utilities

C++ API & command-line toolkit for working with BAM data

346
142
11m
MIT

A bam toolbox

1
1
4y 8m
MIT

Automate common sam & bam conversions

6
1
9y 65d
n/a

fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing

444
76
11m
MIT

SAMStat displays various properties of next-generation sequencing reads stored in SAM/BAM format.

11
1
4y 6m
GPL-3.0

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"

156
22
5m
MIT

A software for calculating telomere length

41
24
3y 8m
GPL-3.0

VCF File Utilities

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html

411
186
7m
n/a

annotate a VCF with other VCFs/BEDs/tabixed files

274
52
8m
MIT

C++ library and cmdline tools for parsing and manipulating VCF files

448
202
1y 31d
MIT

A set of tools written in Perl and C++ for working with VCF files, such as those generated by the 1000 Genomes Project.

354
133
1y 8m
LGPL-3.0

GFF BED File Utilities

Another Gtf/Gff Analysis Toolkit

174
25
7m
GPL-3.0

GFF and GTF file manipulation and interconversion

173
63
8m
MIT

bedtools - the swiss army knife for genome arithmetic

725
265
8m
MIT

Variant Simulation

tools for adding mutations to existing .bam files, used for testing mutation callers

178
69
11m
MIT

Reads simulator

194
87
9m
n/a

Variant Prediction/Annotation

SIFT

380
63
10m
MIT
135
60
10m
n/a

Data

python access to UCSC genomes database

126
39
1y 10m
MIT

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl

266
48
8m
Apache-2.0

Access to Biological Web Services from Python.

189
58
8m
n/a

Tools

A fast Python library for VCF files leveraging Cython for speed.

51
13
4y 93d
MIT

cython + htslib == fast VCF and BCF processing

275
49
8m
MIT

Python wrapper -- and more -- for Aaron Quinlan's BEDTools (bioinformatics tools)

235
85
8m
n/a

Efficient pythonic random access to fasta subsequences

324
55
8m
n/a

Pysam is a Python module for reading and manipulating SAM/BAM/VCF/BCF files. It's a lightweight wrapper of the htslib C-API, the same one that powers samtools, bcftools, and tabix.

539
227
7m
MIT

A Variant Call Format reader for Python.

353
186
9m
n/a

Assembly

SPAdes Genome Assembler

380
94
8m
n/a

SKESA assembler

69
12
8m
n/a

Annotation

Rapid prokaryotic genome annotation

511
176
1y 31d
n/a

Rapid & standardized annotation of bacterial genomes & plasmids

106
12
7m
GPL-3.0

Long-read Assembly

A single molecule sequence assembler for genomes large and small.

494
161
8m
n/a

De novo assembler for single molecule sequencing reads using repeat graphs

447
97
8m
n/a

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads

202
33
9m
MIT

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly

432
83
1y 4m
GPL-3.0

Genome Browsers / Gene Diagrams

📈 DNA Sequence Visualization for Humans

31
8
11m
MIT

Interactive web-based genome browser.

210
68
2y 10m
BSD-2-Clause

🔬 A library of JavaScript components to represent biological data

443
122
8m
Apache-2.0

Flexible circular visualization of genome-associated data with BioPerl and SVG.

38
6
3y 10d
Artistic-2.0

Horizon chart js library for DNA data.

58
6
6y 59d
n/a

Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations

452
214
7m
MIT

SVG based genome viewer written in javascript using D3

32
25
6y 11m
GPL-2.0

A modern genome browser built with JavaScript and HTML5.

409
197
8m
LGPL-2.1

Pathogen-Host Analysis Tool - A modern Next-Generation Sequencing (NGS) analysis platform

16
2
8m
GPL-3.0

Interactive in-browser track viewer

257
63
8m
Apache-2.0

HTML5 canvas genomic graphics library

74
17
3y 53d
n/a

Circos Related

Circos is a software package for visualizing data and information. It visualizes data in a circular layout, which makes Circos ideal for exploring relationships between objects or positions.

54
30
3y 8m
n/a

Fuji plot: a circos representation of multiple GWAS results

47
15
1y 5m
GPL-3.0

Database Access

Becoming a Bioinformatician

Bioinformatics on GitHub

Alternative splicing resource

25
6
4y 89d
n/a

Sequencing

[1:34:35] - Excellent (technical) overview of next-generation and third-generation sequencing technologies, along with some applications in cancer research.

RNA-Seq

Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file formats, reference genomes, gene annotation, expression, differential expression, alternative splicing, data visualization, and interpretation.

1.11K
569
1y 82d
CC-BY-SA-4.0

RNAseq analysis notes from Ming Tang

614
238
1y 6d
MIT

[46:39] - Dr. Lior Pachter shares his stories from the supplement for well-known RNA-seq analysis software CuffDiff and Cufflinks and explains some of their methodologies.

ChIP-Seq

ChIP-seq analysis notes from Ming Tang

526
257
1y 59d
MIT

YouTube Channels and Playlists

Blogs

Miscellaneous

Online networking groups