Awesome Data Science

Probably the best curated list of data science software in Python.

Last Update: Feb. 26, 2021, 9:04 a.m.

Thank you krzjoa & contributors
View Topic on GitHub:
krzjoa/awesome-python-data-science
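
Repository entries in this list carry compact figures such as 2.84K for counts and 3y 4m, 50d or 12m for the time since the last commit. As a minimal sketch, assuming K denotes thousands and y, m and d denote years, months and days, the Python helpers below convert those figures to plain numbers so entries can be sorted or filtered. The function names are hypothetical, and the 365-day year and 30-day month are approximations chosen for illustration, not something this page specifies.

```python
import re

# Approximate unit lengths in days (assumption: 365-day year, 30-day month).
_UNIT_DAYS = {"y": 365, "m": 30, "d": 1}


def parse_count(text: str) -> int:
    """Convert a compact count such as '2.84K' or '510' to an integer."""
    text = text.strip()
    if text.upper().endswith("K"):
        # 'K' is assumed to mean thousands.
        return round(float(text[:-1]) * 1000)
    return int(text)


def parse_age_days(text: str) -> int:
    """Convert an age such as '3y 4m', '50d' or '4y 109d' to approximate days."""
    parts = re.findall(r"(\d+)([ymd])", text)
    if not parts:
        raise ValueError(f"unrecognised age: {text!r}")
    return sum(int(number) * _UNIT_DAYS[unit] for number, unit in parts)
```

With these helpers, a filter such as "more than 1,000 in the first figure and a last commit within the past year" becomes `parse_count(first) > 1000 and parse_age_days(age) <= 365`.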

General Purpose Machine Learning

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

2.84K
510
12m
Apache-2.0

cuML - RAPIDS Machine Learning Library

1.95K
313
4d
Apache-2.0

A modular active learning framework for Python

1.1K
183
50d
MIT

PySpark + Scikit-learn = Sparkit-learn

1.06K
239
3y 4m
Apache-2.0

mlpack: a scalable C++ machine learning library

3.56K
1.32K
5d
n/a

A toolkit for making real world machine learning and data analysis applications in C++

9.91K
2.88K
6d
BSL-1.0

A library of extension and helper modules for Python's data analysis and machine learning libraries.

3.35K
695
4d
n/a

50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster

1.2K
108
5m
BSD-3-Clause

Machine Learning toolbox for Humans

633
134
4y 109d
n/a

A scikit-learn based module for multi-label et al. classification

632
122
1y 9m
BSD-2-Clause

Sequence learning toolkit for Python

586
98
5y 12d
MIT

Simple structured learning framework for Python

630
169
2y 4m
BSD-2-Clause

Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models

452
67
3y 6m
n/a

Python implementation of the rulefit algorithm

219
67
102d
MIT

Metric learning algorithms in Python

1.11K
214
6m
MIT

[HELP REQUESTED] Generalized Additive Models in Python

561
105
7m
Apache-2.0

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

1.16K
140
32d
GPL-3.0

Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)

495
32
15d
GPL-3.0

Uplift modeling and causal inference with machine learning algorithms

1.68K
263
7d
n/a

Machine learning in Python. sklearn

Machine learning toolbox.

Time Series

A machine learning toolkit dedicated to time-series data

1.48K
223
32d
BSD-2-Clause

Module for statistical learning, with a particular emphasis on time-dependent modelling

323
74
8m
BSD-3-Clause

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

12.28K
3.51K
30d
MIT

Open source time series library for Python

1.85K
214
2y 73d
BSD-3-Clause

Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.

86
19
11m
MIT

Anomaly Detection and Correlation library

880
182
3y 49d
Apache-2.0

Datetimes for Humans™

3.25K
208
9m
MIT

Powerful extensions to the standard datetime module

Automated Machine Learning

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

7.83K
1.38K
51d
LGPL-3.0

Automated Machine Learning with scikit-learn

5.22K
982
9d
n/a

MLBox is a powerful Automated Machine Learning python library.

1.19K
247
6m
n/a

Ensemble Methods

Stacked Generalization (Ensemble Learning)

172
67
3y 68d
MIT

Library for machine learning stacking generalization.

106
22
1y 10m
Apache-2.0

Python package for stacking (machine learning technique)

582
68
1y 4m
n/a

High performance ensemble learning. sklearn

Imbalanced Datasets

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

5.05K
1.07K
8d
MIT

Python-based implementations of algorithms for learning on imbalanced data.

178
57
1y 99d
n/a

Random Forests

It is a forest of random projection trees

195
39
1y 20d
Apache-2.0

Scikit-learn compatible wrapper of the Random Bits Forest program written by (Wang et al., 2016)

7
1
4y 7m
n/a

Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.

338
49
5d
n/a

Extreme Learning Machine

Extreme Learning Machine implementation in Python

448
233
3y 7m
n/a

Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

61
53
2y 106d
n/a

High performance implementation of Extreme Learning Machines (fast randomized neural networks).

159
55
2y 9m
n/a

Kernel Methods

Factorization machines in python

822
299
2y 10m
n/a

fastFM: A Library for Factorization Machines

900
194
11m
n/a

TensorFlow implementation of an arbitrary order Factorization Machine

757
183
9m
MIT

Support vector machines (SVMs) and related kernel-based learning algorithms are a well-known class of machine learning algorithms, for non-parametric classification and regression. liquidSVM is an implementation of SVMs whose key features are: fully integrated hyper-parameter selection, extreme speed on both small and large data sets, full flexibility for experts, and inclusion of a variety of different learning scenarios: multi-class classification, ROC, and Neyman-Pearson learning, and least-squares, quantile, and expectile regression.

48
5
1y 5m
AGPL-3.0

Relevance Vector Machine implementation using the scikit-learn API.

177
62
3y 9m
n/a

ThunderSVM: A Fast SVM Library on GPUs and CPUs

1.27K
169
16d
Apache-2.0

Gradient Boosting

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

20.61K
7.91K
4d
Apache-2.0

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

12.21K
3.21K
4d
MIT

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

5.71K
867
3d
Apache-2.0

ThunderGBM: Fast GBDTs and Random Forests on GPUs

580
71
52d
Apache-2.0

PyTorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

46.37K
12.33K
3d
n/a

Datasets, Transforms and Models specific to Computer Vision

8.42K
4.35K
3d
BSD-3-Clause

Data loaders and abstractions for text and NLP

2.65K
614
3d
BSD-3-Clause

Data manipulation and transformation for audio signal processing, powered by PyTorch

1.23K
280
4d
BSD-2-Clause

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

3.27K
438
3d
BSD-3-Clause

A simplified framework and utilities for PyTorch

452
49
15d
LGPL-3.0

A scikit-learn compatible neural network library that wraps PyTorch

3.79K
281
19d
BSD-3-Clause

Simple tools for logging and visualizing, loading and training

1.28K
190
52d
BSD-3-Clause

Geometric Deep Learning Extension Library for PyTorch

10.27K
1.76K
3d
MIT

Accelerated deep learning R&D

2.44K
280
36d
Apache-2.0

A Temporal Extension Library for PyTorch Geometric

349
40
9d
MIT

TensorFlow

An Open Source Machine Learning Framework for Everyone

153.46K
84.06K
3d
Apache-2.0

Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥

6.5K
1.46K
4d
n/a

Deep learning library featuring a higher-level API for TensorFlow.

9.52K
2.43K
88d
n/a

TensorFlow-based neural network library

8.77K
1.26K
16d
Apache-2.0

A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility

5.93K
1.77K
9d
Apache-2.0

Machine Learning Platform for Kubernetes

2.73K
265
4d
Apache-2.0

NeuPy is a TensorFlow-based Python library for prototyping and building neural networks

665
148
1y 5m
MIT

Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running NumPy.

346
39
49d
BSD-3-Clause

TensorFlow ROCm port

543
65
4d
Apache-2.0

Deep learning with dynamic computation graphs in TensorFlow

1.8K
279
3y 119d
Apache-2.0

📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

62
30
2y 10m
MIT

TensorLight - A high-level framework for TensorFlow

9
2
3y 9m
MIT

Mesh TensorFlow: Model Parallelism Made Easier

873
156
21d
Apache-2.0

Ludwig is a toolbox that allows you to train and evaluate deep learning models without the need to write code.

7.57K
889
5d
Apache-2.0

Keras community contributions

1.49K
615
1y 70d
MIT

Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization

2.08K
301
66d
MIT

Distributed Deep learning with Keras & Spark

1.45K
288
13d
MIT

Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.

496
51
3y 9m
MIT

Graph Neural Networks with Keras and Tensorflow 2.

1.63K
198
10d
MIT

QKeras: a quantization deep learning library for Tensorflow Keras

246
50
6d
Apache-2.0

A high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. Keras compatible

MXNet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

19.28K
6.8K
4d
Apache-2.0

A clear, concise, simple yet powerful and efficient API for deep learning.

2.32K
226
3y 0d
Apache-2.0

Simple, efficient and flexible vision toolbox for mxnet framework.

31
9
3y 91d
BSD-3-Clause

Gluon CV Toolkit

4.56K
1.05K
4d
Apache-2.0

NLP made easy

2.23K
510
4d
Apache-2.0

Transfer Learning library for Deep Neural Networks.

174
42
8m
Apache-2.0

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

30
9
1y 63d
Apache-2.0

Others

Source-to-Source Debuggable Derivatives in Pure Python

2.15K
352
2y 6m
Apache-2.0

Efficiently computes derivatives of numpy code.

5.14K
733
1y 101d
MIT

Myia prototyping

367
39
10d
MIT

Neural Network Libraries

2.42K
299
9d
Apache-2.0

Caffe: a fast open framework for deep learning.

31.4K
18.77K
1y 14d
n/a

hipCaffe: the HIP port of Caffe

123
26
2y 5m
n/a

Probably the best curated list of data science software in Python.

786
142
56d
CC-BY-4.0

Web Scraping

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

7.8K
1.54K
10m
BSD-3-Clause

Scrape Twitter for Tweets

1.63K
497
7m
MIT

Providing Pythonic idioms for iterating, searching, and modifying HTML or XML.

A fast high-level screen scraping and web crawling framework.

Use Selenium Python API to access all functionalities of Selenium WebDriver in an intuitive way like a real user.

Data Containers

Create HTML profiling reports from pandas DataFrame objects

6.82K
1.04K
5d
MIT

cuDF - GPU DataFrame Library

3.68K
490
3d
Apache-2.0

NumPy and Pandas interface to Big Data

2.93K
378
1y 6m
n/a

sqldf for pandas

942
148
4y 26d
MIT

Pandas Google BigQuery

242
85
38d
BSD-3-Clause

Universal 1d/2d data containers with Transformers functionality for data analysis.

24
4
2y 4m
n/a

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

231
44
4d
n/a

High performance datastore for time series and tick data

2.18K
447
31d
LGPL-2.1

A Python package for manipulating 2-dimensional tabular data structures

1.14K
99
3d
MPL-2.0

Koalas: pandas API on Apache Spark

2.66K
298
3d
Apache-2.0

Modin: Speed up your Pandas workflows by changing a single line of code

5.73K
392
4d
n/a

A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner

1.55K
72
69d
MIT

The easy way to write your own flavor of Pandas

194
14
1y 90d
MIT

The goal of pandas-log is to provide feedback about basic pandas operations. It provides simple wrapper functions for the most common functions that add additional logs.

169
11
74d
MIT

Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀

5.82K
448
21d
MIT

Powerful Python data analysis toolkit.

Pipelines

Easy pipelines for pandas DataFrames.

580
29
38d
n/a

functional data manipulation for pandas

175
24
5y 6m
n/a

dplyr for python

732
55
4y 59d
MIT

Pandas integration with sklearn

2.34K
389
6d
n/a

BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.

154
36
35d
Apache-2.0

Clean APIs for data cleaning. Python implementation of R package Janitor

628
121
5d
MIT

A Python toolkit for processing tabular data

374
24
6m
MIT

Build, test, deploy, iterate - Dev and prod tool for data science pipelines

46
2
1y 7m
n/a

Directions overlay for working with pandas in an analysis environment

417
21
6m
BSD-3-Clause

Python pipe (|) operator with support for DataFrames, NumPy and PyTorch.

Automates your software builds, tests, and deployments.

General

An open source python library for automated feature engineering

5.41K
709
7d
BSD-3-Clause

scikit-learn addon to operate on set/"group"-based features

40
8
4y 6m
BSD-3-Clause

A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

377
83
3y 63d
n/a

a feature engineering wrapper for sklearn

43
19
2y 99d
GPL-3.0

A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.

113
19
3y 90d
MIT

Automatic extraction of relevant features from time series:

5.42K
852
29d
MIT

Feature Selection

open-source feature selection repository in python

1.05K
350
1y 111d
GPL-2.0

Python implementations of the Boruta all-relevant feature selection method.

905
183
12d
BSD-3-Clause

A fast xgboost feature selection algorithm

163
28
3y 36d
MIT

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

309
58
11d
MIT

General Purposes

matplotlib: plotting with Python

13.17K
5.67K
6d
n/a

Statistical data visualization using matplotlib

8.14K
1.38K
9d
BSD-3-Clause

Painlessly create beautiful matplotlib plots.

1.56K
144
6y 4m
MIT

Ternary plotting library for python with matplotlib

392
111
9d
MIT

Missing data visualization module for Python.

2.65K
348
60d
MIT

Python library that makes it easy for data scientists to create charts.

2.82K
254
116d
Apache-2.0

Python histogram library - histograms as updateable, fully semantic objects with visualization tools. [P]ython [HYST]ograms.

107
17
35d
MIT

Interactive plots

A Python package for animating plots, built on matplotlib.

357
32
4m
MIT

Interactive Data Visualization in the browser, from Python

14.7K
3.64K
3d
BSD-3-Clause

Plotting library for IPython/Jupyter notebooks

3.01K
434
3d
Apache-2.0

A Python library that makes interactive and publication-quality graphs.

Declarative statistical visualization library for Python. Many data transformations can be done directly in the code that creates the graph.

Map

A Python package for interactive mapping with Google Earth Engine, ipyleaflet, and folium

924
394
5d
MIT

Makes it easy to visualize data on an interactive open street map

Automatic Plotting

With Holoviews, your data visualizes itself.

1.81K
305
4d
BSD-3-Clause

Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

305
60
67d
Apache-2.0

Visualize and compare datasets, target values and associations, with one line of code.

1.27K
145
7d
MIT

NLP

Python library for interactive topic model visualization. Port of the R LDAvis package.

1.41K
302
5d
BSD-3-Clause

Deployment

A collection of APIs to turn scripts and notebooks into interactive reports.

Enables sharing and executing Jupyter Notebooks

Modern, fast (high-performance), web framework for building APIs with Python

Makes it easy to deploy machine learning models

Model Explanation

A data-driven approach to quantify the value of classifiers in a machine learning ensemble.

15
2
57d
MIT

Algorithms for monitoring and explaining machine learning models

897
110
8d
Apache-2.0

Code for "High-Precision Model-Agnostic Explanations" paper

622
91
5m
BSD-2-Clause

Bias and Fairness Audit Toolkit

359
67
23d
MIT

Contrastive Explanation (Foil Trees), developed at TNO/Utrecht University

33
4
10d
BSD-3-Clause

Visual analysis and diagnostic tools to facilitate machine learning model selection.

3.11K
477
13d
Apache-2.0

An intuitive library to add plotting functionality to scikit-learn objects.

2.03K
249
2y 6m
MIT

A game theoretic approach to explain the output of any machine learning model.

11.77K
1.73K
16d
MIT

A library for debugging/inspecting machine learning classifiers and explaining their predictions

2.3K
298
1y 36d
MIT

Lime: Explaining the predictions of any machine learning classifier

8.46K
1.37K
45d
BSD-2-Clause

278
51
2y 4m
n/a

78
26
1y 10m
n/a

python partial dependence plot toolbox

533
85
2y 81d
MIT

Python implementation of R package breakDown

38
4
2y 118d
n/a

⬛ Python Individual Conditional Expectation Plot Toolbox

100
25
3y 33d
MIT

Python Library for Model Interpretation/Explanations

969
162
8m
UPL-1.0

Model analysis tools for TensorFlow

1.04K
212
7d
Apache-2.0

A library that implements fairness-aware machine learning algorithms

69
17
2y 7m
MIT

601
123
8m
BSD-3-Clause

Interpretability and explainability of data and machine learning models

770
174
84d
Apache-2.0

Auralisation of learned features in CNN (for audio)

37
10
3y 11m
n/a

🎆 A visualization of the CapsNet layers to better understand how it works

369
91
9m
MIT

A collection of infrastructure and tools for research in neural network interpretability.

4.07K
584
31d
Apache-2.0

Visualizer for neural network, deep learning, and machine learning models

13.37K
1.62K
3d
MIT

Exploration tool for your NeuralNetwork

31
2
2y 111d
MIT

tensorboard for pytorch (and chainer, mxnet, numpy, ...)

6.82K
788
7m
MIT

Logging MXNet data for visualization in TensorBoard.

330
50
1y 34d
Apache-2.0

Reinforcement Learning

A toolkit for developing and comparing reinforcement learning algorithms.

23.5K
6.69K
10d
n/a

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms

1.92K
379
17d
Apache-2.0

A toolkit for reproducible reinforcement learning research.

1.08K
203
4d
MIT

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

11.18K
3.89K
1y 27d
MIT

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

2.92K
583
65d
MIT

A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)

2.81K
398
3d
BSD-3-Clause

TF-Agents is a library for Reinforcement Learning in TensorFlow

1.8K
471
3d
Apache-2.0

Tensorforce: a TensorFlow library for applied reinforcement learning

2.88K
489
18d
Apache-2.0

TensorFlow Reinforcement Learning

3.06K
371
10m
Apache-2.0

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

9.32K
1.24K
14d
Apache-2.0

Deep Reinforcement Learning for Keras.

4.96K
1.3K
1y 108d
MIT

ChainerRL is a deep reinforcement learning library built on top of Chainer.

928
212
79d
MIT

Probabilistic Methods

Fast, flexible and easy to use probabilistic modelling in Python.

2.6K
473
14d
MIT

Deep universal probabilistic programming with Python and PyTorch

6.74K
826
4d
Apache-2.0

THIS IS THE OLD PYMC PROJECT. PLEASE USE PYMC3 INSTEAD:

888
232
7m
AFL-3.0

Decorator for PyMC3

45
5
3y 49d
n/a

InferPy: Deep Probabilistic Modeling with Tensorflow Made Easy

130
15
7m
Apache-2.0

PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

38
11
7d
ISC

Bayesian dessert for Lasagne

84
6
3y 8m
MIT

Python package for Bayesian Machine Learning with scikit-learn API

420
108
1y 56d
MIT

Scikit-learn compatible estimation of general graphical models

176
31
64d
MIT

Python Library for learning (Structure and Parameter) and inference (Statistical and Causal) in Bayesian Networks.

1.71K
563
4d
MIT

Supervised domain-agnostic prediction framework for probabilistic modelling

107
15
2y 9d
n/a

A bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation

126
9
2y 5m
Apache-2.0

Probabilistic Programming and Statistical Inference in PyTorch

108
14
3y 7m
MIT

Python package facilitating the use of Bayesian Deep Learning methods with Variational Inference for PyTorch

315
46
2y 76d
MIT

The Python ensemble sampling toolkit for affine-invariant MCMC

1.11K
389
8d
MIT

A library for hidden semi-Markov models with explicit durations

46
11
3y 45d
GPL-3.0

465
146
6m
MIT

A highly efficient and modular implementation of Gaussian Processes in PyTorch

2.3K
328
4d
MIT

Modular Probabilistic Programming on MXNet

95
26
1y 9m
Apache-2.0

scikit-learn inspired API for CRFsuite

360
154
1y 84d
n/a

Python package for Bayesian statistical modeling and Probabilistic Machine Learning. Theano compatible

A library for probabilistic modeling, inference, and criticism. sklearn

Genetic Programming

Genetic Programming in Python, with a scikit-learn inspired API

907
156
1y 12d
BSD-3-Clause

Distributed Evolutionary Algorithms in Python

4.09K
868
63d
LGPL-3.0

A Genetic Programming platform for Python with TensorFlow for wicked-fast CPU and GPU support.

128
60
31d
n/a

A strongly-typed genetic programming framework for Python

98
11
2y 8m
n/a

Genetic feature selection module for scikit-learn

141
42
93d
n/a

Optimization

Spearmint Bayesian optimization codebase

1.41K
302
1y 11m
n/a

Bayesian optimization in PyTorch

1.84K
185
4d
MIT

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP (Traveling Salesman)

1.98K
463
24d
MIT

Sequential Model-based Algorithm Configuration

561
156
4m
n/a

optimization routines for hyperparameter tuning

363
74
9m
n/a

Distributed Asynchronous Hyperparameter Optimization in Python

5.45K
868
17d
n/a

Hyper-parameter optimization for sklearn

1.18K
224
4m
n/a

Use evolutionary algorithms instead of gridsearch in scikit-learn

628
112
1y 86d
MIT

SigOpt wrappers for scikit-learn methods

69
12
11m
MIT

A Python implementation of global optimization with gaussian processes.

4.89K
1.1K
5m
MIT

Safe Bayesian Optimization

90
35
10m
MIT

Sequential model-based optimization with a scipy.optimize interface

2.04K
390
57d
BSD-3-Clause

🎯 A comprehensive gradient-free optimization framework written in Python

545
55
3y 5m
MIT

A research toolkit for particle swarm optimization in Python

730
234
54d
MIT

A Free and Open Source Python Library for Multiobjective Optimization

309
116
8m
GPL-3.0

Bayesian Optimization using GPflow

227
55
86d
Apache-2.0

POT : Python Optimal Transport

912
199
66d
MIT

Hyperparameter Optimization for TensorFlow, Keras and PyTorch

1.38K
229
97d
MIT

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

938
295
84d
n/a

Natural Language Processing

NLTK Source

9.65K
2.44K
16d
Apache-2.0

The Classical Language Toolkit

630
300
4d
MIT

scikit-learn wrappers for Python fastText.

210
21
5m
n/a

Simple text to phones converter for multiple languages

410
84
42d
GPL-3.0

A very simple framework for state-of-the-art Natural Language Processing (NLP)

9.97K
1.51K
5d
n/a

Topic Modelling for Humans.

A natural language processing toolkit.

A library for industrial-strength natural language processing in Python and Cython.

Computer Audition

Python library for audio and music analysis

4.31K
701
10d
ISC

Audio features extraction

186
42
1y 6m
LGPL-3.0

a library for audio and music analysis

2.05K
301
38d
GPL-3.0

C++ library for audio and music analysis, description and synthesis, including Python bindings

1.74K
403
3d
AGPL-3.0

LibXtract is a simple, portable, lightweight library of audio feature extraction functions.

201
46
1y 7m
MIT

Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals

330
92
4m
GPL-2.0

A library for augmenting annotated audio data

174
31
7m
ISC

Python audio and music signal processing library

718
122
1y 70d
n/a

Computer Vision

Open Source Computer Vision Library

52.52K
43.35K
4d
n/a

Image processing in Python

4.2K
1.76K
4d
n/a

Image augmentation for machine learning experiments.

10.72K
2K
9m
MIT

Image augmentation library in Python for machine learning.

4.32K
806
11m
MIT

Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125

7.35K
960
10d
MIT

Statistics

An extension to the pandas DataFrame describe function.

358
36
1y 6m
MIT

Create HTML profiling reports from pandas DataFrame objects

6.82K
1.04K
5d
MIT

Statsmodels: statistical modeling and econometrics in Python

6.04K
2.17K
7d
n/a

Supplies a wrapper, StockDataFrame, based on pandas.DataFrame with inline stock statistics/indicators support.

724
201
4m
BSD-3-Clause

Pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.

83
9
1y 73d
MIT

Multiple Pairwise Comparisons (Post Hoc) Tests in Python

178
19
8d
MIT

Performance analysis of predictive (alpha) stock factors

1.78K
671
10m
Apache-2.0

Distributed Computing

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

10.85K
1.75K
3d
n/a

Distributed machine learning platform

890
185
4y 34d
n/a

Framework and Library for Distributed Online Machine Learning

701
149
1y 9m
LGPL-2.1

Microsoft Distributed Machine Learning Toolkit

2.77K
593
3y 7m
MIT

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)

14.34K
3.57K
3d
Apache-2.0

Scalable Machine Learning with Dask

685
189
11d
BSD-3-Clause

A distributed task scheduler for Dask

1.15K
512
3d
n/a

Exposes the Spark programming model to Python. Apache Spark based

Experimentation

Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.

3.32K
318
8d
MIT

A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.

1.25K
109
3y 6m
Apache-2.0

A visual dataflow programming language for sklearn

175
36
3y 8m
MIT

Adaptive Experimentation Platform

1.41K
143
6d
MIT

A lightweight ML experiment tracking, results visualization and management tool.

Evaluation

A library of metrics for evaluating recommender systems

236
56
85d
MIT

Machine learning evaluation metrics, implemented in Python, R, Haskell, and MATLAB / Octave

1.39K
426
5y 5m
n/a

Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.

286
26
60d
MIT

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.

1.23K
409
4d
Apache-2.0

Computations

Parallel computing with task scheduling

7.95K
1.23K
3d
BSD-3-Clause

Fast NumPy array functions written in C

577
71
33d
BSD-2-Clause

A NumPy-compatible array library accelerated by CUDA

4.84K
435
3d
MIT

Python library for multilinear algebra and tensor factorizations

379
109
4y 4m
GPL-3.0

Solve automatic numerical differentiation problems in one or more variables.

136
31
72d
BSD-3-Clause

Add built-in support for quaternions to numpy

378
59
116d
MIT

Adaptive: parallel active learning of mathematical functions

642
36
4m
BSD-3-Clause

A fundamental package for scientific computing with Python.

Spatial Analysis

Python tools for geographic data

2.48K
556
3d
BSD-3-Clause

PySAL: Python Spatial Analysis Library Meta-Package

822
258
25d
BSD-3-Clause

Quantum Computing

PennyLane is a cross-platform Python library for differentiable programming of quantum computers. Train a quantum computer the same way as a neural network.

773
222
6d
Apache-2.0

QML: Quantum Machine Learning

139
54
2y 5m
MIT

Conversion

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

996
129
1y 71d
MIT

Open standard for machine learning interoperability

9.8K
1.82K
4d
Apache-2.0

MMdnn is a set of tools to help users inter-operate among different deep learning frameworks. E.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, Tensorflow, CNTK, PyTorch Onnx and CoreML.

5.22K
949
6m
MIT