Awesome Bioinformatics

A curated list of awesome Bioinformatics libraries and software.

Last Update: March 20, 2023, 7 p.m.

Thank you danielecook & contributors
View Topic on GitHub:
danielecook/Awesome-Bioinformatics

Package suites

Core BioPerl 1.x code

267
177
10m
n/a

Official git repository for Biopython (originally converted from CVS)

2.89K
1.39K
1y 4m
n/a

This library provides implementations of many algorithms and data structures that are useful for bioinformatics. All provided implementations are rigorously tested via continuous integration.

991
141
1y 5m
MIT

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.

226
58
1y 4m
n/a

A Go package for engineering organisms.

160
23
1y 4m
MIT

OCaml Bioinformatics Library

107
18
1y 8m
n/a

Downloading

The command-line interface to GGD

27
2
1y 7m
MIT

Web application to explore the Sequence Read Archive.

119
24
1y 8m
GPL-2.0

Compressing

Compressor for genomic files (FASTQ, SAM/BAM, VCF, FASTA, GVF, 23andMe...), up to 5x better than gzip and faster too

68
4
1y 72d
n/a

Command Line Utilities

Useful bash one-liners for bioinformatics.

1.31K
411
3y 9m
n/a

Modular and universal bioinformatics

294
43
3y 4m
MIT

Syntax highlighting for computational biology

190
28
1y 7m
GPL-3.0

A suite of utilities for converting to and working with CSV, the king of tabular file formats.

4.87K
565
1y 5m
MIT

A cross-platform, efficient and practical CSV/TSV toolkit in Golang

649
65
1y 5m
MIT

Easily submitting multiple PBS jobs or running local jobs in parallel. Multiple input files supported.

21
6
3y 9m
MIT

a wee tool for random access into BGZF files.

78
12
4y 10m
MIT

sort genomic data

30
1
2y 9m
MIT

Note: tabix and bgzip binaries are now part of the HTSlib project.

82
39
1y 7m
n/a

Write-once-read-many table for large datasets.

25
4
6y 4m
LGPL-3.0

Create an index on a compressed text file

553
36
4y 116d
BSD-2-Clause

Workflow Managers

BigDataScript: Scripting language for big data

90
22
1y 11m
n/a

Bpipe - a tool for running and managing bioinformatics pipelines

195
53
1y 5m
n/a

Repository for the CWL standards. Use https://cwl.discourse.group/ for support 😊

1.29K
187
1y 6m
Apache-2.0

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments

742
259
1y 4m
n/a

A DSL for data-driven computational pipelines

1.53K
373
1y 4m
Apache-2.0

Yet another redundant workflow engine

357
22
5m
Apache-2.0

CGAT-ruffus is a lightweight python module for running computational pipelines

163
34
1y 8m
MIT

Robust, flexible and resource-efficient pipelines using Go and the commandline

925
67
9m
MIT

This is the SeqWare Project's main repo.

26
18
4y 8m
GPL-3.0

Workflow Description Language - Specification and Implementations

24
6
4y 3d
BSD-3-Clause

Pipelines

A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin

4.38K
482
1y 5m
n/a

A flexible pipeline for complete analysis of bacterial genomes

105
23
1y 4m
MIT

Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and shiny app.

39
3
1y 4m
GPL-3.0

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

860
347
1y 37d
MIT

Software for intuitively doing Differential Gene Expression (DGE) analysis on Windows and GNU/Linux, based on R packages.

3
0
3y 5m
n/a

A pipeline for preprocessing NGS data from Illumina, Nanopore and PacBio technologies

8
3
1y 5m
GPL-3.0

Sequence Processing

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

179
51
2y 10m
MIT

A quality control analysis tool for high throughput sequencing data

197
51
1y 7m
n/a

Simple FASTQ quality assessment using Python

94
15
1y 10m
MIT

FASTA/FASTQ pre-processing programs

130
56
4y 81d
n/a

Aggregate results from bioinformatics analyses across many samples into a single report.

772
426
1y 4m
GPL-3.0

seqfu - Sequence Fastx Utilities

23
0
1y 4m
GPL-3.0

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang

750
112
1y 4m
MIT

An imagemagick-like frontend to Biopython SeqIO

93
21
2y 96d
GPL-3.0

Toolkit for processing sequences in FASTA/Q formats

881
255
1y 5m
MIT

Explore and analyze biological sequence data

11
1
1y 8m
MIT

Data Analysis

Scalable genomic data analysis.

756
205
1y 4m
MIT

Scalable gVCF merging and joint variant calling for population sequencing projects

76
23
1y 6m
Apache-2.0

Pairwise

A fast and sensitive gapped read aligner

402
135
1y 4m
GPL-3.0

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)

1.04K
495
1y 5m
GPL-3.0

Wavefront alignment algorithm (WFA): Fast and exact gap-affine pairwise alignment

167
14
1y 8m
n/a

Pairwise Sequence Alignment Library

160
22
1y 11m
n/a

Mummer alignment tool

274
87
1y 6m
n/a

Accelerated BLAST compatible local sequence aligner.

701
148
10m
GPL-3.0

Multiple Sequence Alignment

A simple Partial Order Aligner based on Lee, Grasso and Sharlow (2002), for education/demonstration purposes

52
11
1y 6m
GPL-2.0

Clustering

MMseqs2: ultra fast and sensitive search and clustering suite

577
87
1y 4m
GPL-3.0

Quantification

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data

290
99
1y 7m
GPL-3.0

Variant Calling

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

2.37K
587
1y 4m
BSD-3-Clause

Bayesian haplotype-based genetic polymorphism discovery and genotyping.

560
232
2y 6d
MIT

Official code repository for GATK versions 1.0 through 3.7 (core engine). For GATK 4 code, see the https://github.com/broadinstitute/gatk repository

277
221
4y 7m
n/a

Bayesian haplotype-based mutation calling

238
31
1y 4m
MIT

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html

411
186
1y 4m
n/a

Structural variant callers

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis

277
112
1y 4m
BSD-3-Clause

lumpy: a general probabilistic framework for structural variant discovery

234
109
2y 6m
MIT

Structural variant and indel caller for mapped sequencing data

293
106
3y 8m
n/a

GRIDSS: the Genomic Rearrangement IDentification Software Suite

165
42
1y 4m
n/a

structural variant calling and genotyping with existing tools, but, smoothly.

156
18
1y 5m
Apache-2.0

BAM File Utilities

C++ API & command-line toolkit for working with BAM data

346
142
1y 7m
MIT

A bam toolbox

1
1
5y 5m
MIT

Automate common sam & bam conversions

6
1
9y 11m
n/a

fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing

444
76
1y 8m
MIT

SAMStat displays various properties of next-generation sequencing reads stored in SAM/BAM format.

11
1
5y 101d
GPL-3.0

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"

156
22
1y 69d
MIT

A software for calculating telomere length

41
24
4y 4m
GPL-3.0

VCF File Utilities

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html

411
186
1y 4m
n/a

annotate a VCF with other VCFs/BEDs/tabixed files

274
52
1y 4m
MIT

C++ library and cmdline tools for parsing and manipulating VCF files

448
202
1y 9m
MIT

A set of tools written in Perl and C++ for working with VCF files, such as those generated by the 1000 Genomes Project.

354
133
2y 4m
LGPL-3.0

GFF BED File Utilities

Another Gtf/Gff Analysis Toolkit

174
25
1y 4m
GPL-3.0

GFF and GTF file manipulation and interconversion

173
63
1y 5m
MIT

bedtools - the swiss army knife for genome arithmetic

725
265
1y 4m
MIT

Variant Simulation

tools for adding mutations to existing .bam files, used for testing mutation callers

178
69
1y 8m
MIT

Reads simulator

194
87
1y 6m
n/a

Variant Prediction/Annotation

Data

python access to UCSC genomes database

126
39
2y 6m
MIT

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl

266
48
1y 5m
Apache-2.0

Access to Biological Web Services from Python.

189
58
1y 5m
n/a

Tools

A fast Python library for VCF files leveraging Cython for speed.

51
13
4y 12m
MIT

cython + htslib == fast VCF and BCF processing

275
49
1y 4m
MIT

Python wrapper -- and more -- for Aaron Quinlan's BEDTools (bioinformatics tools)

235
85
1y 5m
n/a

Efficient pythonic random access to fasta subsequences

324
55
1y 4m
n/a

Pysam is a Python module for reading and manipulating SAM/BAM/VCF/BCF files. It's a lightweight wrapper of the htslib C-API, the same one that powers samtools, bcftools, and tabix.

539
227
1y 4m
MIT

A Variant Call Format reader for Python.

353
186
1y 6m
n/a

Assembly

SPAdes Genome Assembler

380
94
1y 4m
n/a

SKESA assembler

69
12
1y 5m
n/a

Annotation

Rapid prokaryotic genome annotation

511
176
1y 9m
n/a

Rapid & standardized annotation of bacterial genomes & plasmids

106
12
1y 4m
GPL-3.0

Long-read Assembly

A single molecule sequence assembler for genomes large and small.

494
161
1y 4m
n/a

De novo assembler for single molecule sequencing reads using repeat graphs

447
97
1y 5m
n/a

Hifiasm: a haplotype-resolved assembler for accurate HiFi reads

202
33
1y 6m
MIT

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly

432
83
2y 43d
GPL-3.0

Genome Browsers / Gene Diagrams

📈 DNA Sequence Visualization for Humans

31
8
1y 8m
MIT

Interactive web-based genome browser.

210
68
3y 6m
BSD-2-Clause

🔬A library of JavaScript components to represent biological data

443
122
1y 5m
Apache-2.0

Flexible circular visualization of genome-associated data with BioPerl and SVG.

38
6
3y 9m
Artistic-2.0

Horizon chart js library for DNA data.

58
6
6y 10m
n/a

Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations

452
214
1y 4m
MIT

SVG based genome viewer written in javascript using D3

32
25
7y 8m
GPL-2.0

A modern genome browser built with JavaScript and HTML5.

409
197
1y 5m
LGPL-2.1

Pathogen-Host Analysis Tool - A modern Next-Generation Sequencing (NGS) analysis platform

16
2
1y 5m
GPL-3.0

Interactive in-browser track viewer

257
63
1y 4m
Apache-2.0

HTML5 canvas genomic graphics library

74
17
3y 10m
n/a

Circos Related

Circos is a software package for visualizing data and information. It visualizes data in a circular layout — this makes Circos ideal for exploring relationships between objects or positions.

54
30
4y 4m
n/a

Fuji plot: a circos representation of multiple GWAS results

47
15
2y 65d
GPL-3.0

Database Access

Becoming a Bioinformatician

Bioinformatics on GitHub

Alternative splicing resource

25
6
4y 11m
n/a

A collection of research papers for AI-based protein design

86
4
4m
Apache-2.0

Sequencing

[1:34:35] - Excellent (technical) overview of next-generation and third-generation sequencing technologies, along with some applications in cancer research.

RNA-Seq

Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file formats, reference genomes, gene annotation, expression, differential expression, alternative splicing, data visualization, and interpretation.

1.11K
569
1y 11m
CC-BY-SA-4.0

RNAseq analysis notes from Ming Tang

614
238
1y 9m
MIT

[46:39] - Dr. Lior Pachter shares his stories from the supplement for the well-known RNA-seq analysis software CuffDiff and Cufflinks and explains some of their methodologies.

ChIP-Seq

ChIP-seq analysis notes from Ming Tang

526
257
1y 10m
MIT

YouTube Channels and Playlists

Blogs

Miscellaneous

Online networking groups