Awesome Data Science

Probably the best curated list of data science software in Python.

Last Update: Dec. 3, 2020, 9:02 a.m.

Thank you krzjoa & contributors
View Topic on GitHub:
krzjoa/awesome-python-data-science

General Purpose Machine Learning

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

2.8K
497
9m
Apache-2.0

cuML - RAPIDS Machine Learning Library

1.81K
294
0d
Apache-2.0

A modular active learning framework for Python

959
159
32d
MIT

PySpark + Scikit-learn = Sparkit-learn

1.05K
237
3y 41d
Apache-2.0

mlpack: a scalable C++ machine learning library

3.45K
1.27K
2d
n/a

A toolkit for making real world machine learning and data analysis applications in C++

9.66K
2.83K
4d
BSL-1.0

A library of extension and helper modules for Python's data analysis and machine learning libraries.

3.23K
687
7d
n/a

50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster

1.19K
109
76d
BSD-3-Clause

Machine Learning toolbox for Humans

633
135
4y 24d
n/a

A scikit-learn based module for multi-label et. al. classification

604
120
1y 6m
BSD-2-Clause

Sequence learning toolkit for Python

573
97
4y 9m
MIT

Simple structured learning framework for python

628
167
2y 63d
BSD-2-Clause

Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models

448
69
3y 115d
n/a

Python implementation of the rulefit algorithm

212
68
17d
MIT

Metric learning algorithms in Python

1.08K
212
4m
MIT

[HELP REQUESTED] Generalized Additive Models in Python

543
101
4m
Apache-2.0

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

1.07K
128
2d
GPL-3.0

Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)

472
30
0d
GPL-3.0

Uplift modeling and causal inference with machine learning algorithms

1.46K
222
14d
n/a

Machine learning in Python. sklearn compatible

Machine learning toolbox.

Time Series

A machine learning toolkit dedicated to time-series data

1.39K
210
42d
BSD-2-Clause

Module for statistical learning, with a particular emphasis on time-dependent modelling

316
73
5m
BSD-3-Clause

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

11.88K
3.37K
23d
MIT

Open source time series library for Python

1.83K
210
1y 11m
BSD-3-Clause

Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.

83
20
8m
MIT

Anomaly Detection and Correlation library

857
177
2y 10m
Apache-2.0

Datetimes for Humans™

3.21K
203
6m
MIT

Powerful extensions to the standard datetime module

Automated Machine Learning

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

7.66K
1.34K
3d
LGPL-3.0

Automated Machine Learning with scikit-learn

5.04K
959
22d
n/a

MLBox is a powerful Automated Machine Learning python library.

1.15K
240
100d
n/a

Ensemble Methods

Stacked Generalization (Ensemble Learning)

165
65
2y 11m
MIT

Library for machine learning stacking generalization.

106
21
1y 7m
Apache-2.0

Python package for stacking (machine learning technique)

570
68
1y 35d
n/a

High performance ensemble learning. sklearn compatible

Imbalanced Datasets

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

4.87K
1.04K
30d
MIT

Python-based implementations of algorithms for learning on imbalanced data.

170
54
1y 14d
n/a
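
The entries in this category address class imbalance. As a rough illustration of one common technique, random oversampling, here is a minimal sketch in plain Python. The function name `random_oversample` and the toy data are hypothetical and are not taken from any of the listed packages.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw with replacement to fill the gap up to the majority size.
        extra = rng.choices(pool, k=target - n)
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


# Toy data: four "neg" samples and one "pos" sample.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["neg", "neg", "neg", "neg", "pos"]
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

After the call, both classes have four samples each. Plain duplication is the simplest option; the packages above implement more elaborate resampling strategies.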

Random Forests

It is a forest of random projection trees

193
39
10m
Apache-2.0

Scikit-learn compatible wrapper of the Random Bits Forest program written by (Wang et al., 2016)

7
2
4y 4m
n/a

Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.

328
47
67d
n/a

Extreme Learning Machine

Extreme Learning Machine implementation in Python

445
229
3y 4m
n/a

Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

60
54
2y 21d
n/a

High performance implementation of Extreme Learning Machines (fast randomized neural networks).

154
55
2y 6m
n/a

Kernel Methods

Factorization machines in python

806
298
2y 7m
n/a

fastFM: A Library for Factorization Machines

880
193
9m
n/a

TensorFlow implementation of an arbitrary order Factorization Machine

751
181
6m
MIT

Support vector machines (SVMs) and related kernel-based learning algorithms are a well-known class of machine learning algorithms, for non-parametric classification and regression. liquidSVM is an implementation of SVMs whose key features are: fully integrated hyper-parameter selection, extreme speed on both small and large data sets, full flexibility for experts, and inclusion of a variety of different learning scenarios: multi-class classification, ROC, and Neyman-Pearson learning, and least-squares, quantile, and expectile regression.

47
5
1y 92d
AGPL-3.0

Relevance Vector Machine implementation using the scikit-learn API.

174
61
3y 6m
n/a

ThunderSVM: A Fast SVM Library on GPUs and CPUs

1.24K
166
5m
Apache-2.0

Gradient Boosting

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

20.22K
7.83K
0d
Apache-2.0

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

11.89K
3.13K
1d
MIT

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

5.55K
846
0d
Apache-2.0

ThunderGBM: Fast GBDTs and Random Forests on GPUs

567
69
4m
Apache-2.0

PyTorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

44.41K
11.73K
0d
n/a

Datasets, Transforms and Models specific to Computer Vision

7.85K
4.05K
0d
BSD-3-Clause

Data loaders and abstractions for text and NLP

2.57K
583
13d
BSD-3-Clause

Data manipulation and transformation for audio signal processing, powered by PyTorch

1.15K
260
1d
BSD-2-Clause

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

3.17K
412
0d
BSD-3-Clause

A simplified framework and utilities for PyTorch

409
46
19d
LGPL-3.0

A scikit-learn compatible neural network library that wraps PyTorch

3.63K
270
32d
BSD-3-Clause

Simple tools for logging and visualizing, loading and training

1.25K
180
10m
BSD-3-Clause

Geometric Deep Learning Extension Library for PyTorch

9.57K
1.62K
0d
MIT

Accelerated deep learning R&D

2.31K
262
2d
Apache-2.0

A Temporal Extension Library for PyTorch Geometric

271
27
3d
MIT

TensorFlow

An Open Source Machine Learning Framework for Everyone

151.07K
83.4K
0d
Apache-2.0

Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥

6.39K
1.44K
35d
n/a

Deep learning library featuring a higher-level API for TensorFlow.

9.47K
2.43K
3d
n/a

TensorFlow-based neural network library

8.63K
1.25K
56d
Apache-2.0

A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility

5.82K
1.74K
32d
Apache-2.0

Machine Learning Platform for Kubernetes

2.64K
256
1d
Apache-2.0

NeuPy is a Tensorflow based python library for prototyping and building neural networks

652
146
1y 93d
MIT

Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy.

348
39
3y 8m
MIT

TensorFlow ROCm port

520
62
1d
Apache-2.0

Deep learning with dynamic computation graphs in TensorFlow

1.8K
279
3y 34d
Apache-2.0

📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

62
30
2y 7m
MIT

TensorLight - A high-level framework for TensorFlow

9
2
3y 7m
MIT

Mesh TensorFlow: Model Parallelism Made Easier

591
109
1d
Apache-2.0

Ludwig is a toolbox that allows users to train and evaluate deep learning models without the need to write code.

7.31K
881
0d
Apache-2.0

Keras community contributions

1.46K
603
11m
MIT

Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization

2.05K
297
62d
MIT

Distributed Deep learning with Keras & Spark

1.43K
284
62d
MIT

Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.

497
50
3y 6m
MIT

Graph Neural Networks with Keras and Tensorflow 2.

1.51K
170
1d
MIT

QKeras: a quantization deep learning library for Tensorflow Keras

228
39
8d
Apache-2.0

A high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. Keras compatible

MXNet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

19.14K
6.79K
0d
Apache-2.0

A clear, concise, simple yet powerful and efficient API for deep learning.

2.33K
227
2y 9m
Apache-2.0

Simple, efficient and flexible vision toolbox for mxnet framework.

31
10
3y 6d
BSD-3-Clause

Gluon CV Toolkit

4.36K
1.01K
1d
Apache-2.0

NLP made easy

2.18K
502
56d
Apache-2.0

Transfer Learning library for Deep Neural Networks.

169
38
5m
Apache-2.0

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

29
9
11m
Apache-2.0

Others

Source-to-Source Debuggable Derivatives in Pure Python

2.14K
345
2y 118d
Apache-2.0

Efficiently computes derivatives of numpy code.

5.02K
726
1y 16d
MIT

Myia prototyping

361
38
6d
MIT

Neural Network Libraries

2.38K
302
1d
Apache-2.0

Caffe: a fast open framework for deep learning.

31.13K
18.64K
9m
n/a

hipCaffe: the HIP port of Caffe

123
25
2y 71d
n/a

Probably the best curated list of data science software in Python.

710
133
51d
CC-BY-4.0

Web Scraping

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

7.68K
1.53K
7m
BSD-3-Clause

Scrape Twitter for Tweets

1.54K
474
4m
MIT

Providing Pythonic idioms for iterating, searching, and modifying HTML or XML.

A fast high-level screen scraping and web crawling framework.

Use Selenium Python API to access all functionalities of Selenium WebDriver in an intuitive way like a real user.

Data Containers

Create HTML profiling reports from pandas DataFrame objects

6.37K
966
3d
MIT

cuDF - GPU DataFrame Library

3.49K
466
0d
Apache-2.0

NumPy and Pandas interface to Big Data

2.91K
376
1y 111d
n/a

sqldf for pandas

916
143
3y 10m
MIT

Pandas Google BigQuery

226
82
23d
BSD-3-Clause

Universal 1d/2d data containers with Transformers functionality for data analysis.

22
3
2y 50d
n/a

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

232
42
31d
n/a

High performance datastore for time series and tick data

2.09K
428
1d
LGPL-2.1

A Python package for manipulating 2-dimensional tabular data structures

1.05K
89
1d
MPL-2.0

Koalas: pandas API on Apache Spark

2.51K
290
1d
Apache-2.0

Modin: Speed up your Pandas workflows by changing a single line of code

5.5K
381
0d
n/a

A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner

1.47K
69
53d
MIT

The easy way to write your own flavor of Pandas

190
14
1y 5d
MIT

The goal of pandas-log is to provide feedback about basic pandas operations. It provides simple wrapper functions for the most common functions that add additional logs

161
9
7m
MIT

Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀

5.42K
417
1d
MIT

Powerful Python data analysis toolkit.

Pipelines

Easy pipelines for pandas DataFrames.

572
28
16d
n/a

functional data manipulation for pandas

174
25
5y 100d
n/a

dplyr for python

726
56
3y 11m
MIT

Pandas integration with sklearn

2.27K
384
0d
n/a

BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.

153
33
1d
Apache-2.0

Clean APIs for data cleaning. Python implementation of R package Janitor

596
122
10d
MIT

A Python toolkit for processing tabular data

374
24
111d
MIT

Build, test, deploy, iterate - Dev and prod tool for data science pipelines

45
2
1y 4m
n/a

Directions overlay for working with pandas in an analysis environment

410
20
100d
BSD-3-Clause

Python pipe (|) operator with support for DataFrames and Numpy and Pytorch.

Automates your software builds, tests, and deployments.
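
One entry in this category offers a pipe (`|`) operator for chaining data transformations. A minimal sketch of how such an operator can be built in plain Python follows. The `Pipe` class and the helper functions `take` and `scale` are hypothetical names chosen for illustration and do not reflect the API of any listed package.

```python
class Pipe:
    """Wrap a function so it can be applied as `value | Pipe(func, ...)`."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __ror__(self, left):
        # Python falls back to the right operand's __ror__ when the left
        # operand does not implement `|` for this type.
        return self.func(left, *self.args, **self.kwargs)


def take(items, n):
    """Return the first n items."""
    return list(items)[:n]


def scale(items, factor):
    """Multiply every item by factor."""
    return [x * factor for x in items]


# Each stage receives the output of the previous stage as its first argument.
result = [3, 1, 2] | Pipe(sorted) | Pipe(take, 2) | Pipe(scale, factor=10)
```

Here `result` is `[10, 20]`: the list is sorted, truncated to two items, then scaled. This sketch assumes the left operand does not define its own `|` behavior for arbitrary objects; types that do would need extra handling.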

General

An open source python library for automated feature engineering

5.24K
679
1d
BSD-3-Clause

scikit-learn addon to operate on set/"group"-based features

40
7
4y 118d
BSD-3-Clause

A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

377
83
2y 11m
n/a

a feature engineering wrapper for sklearn

42
19
2y 14d
GPL-3.0

A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.

112
19
3y 5d
MIT

Automatic extraction of relevant features from time series

5.2K
822
2d
MIT

Feature Selection

open-source feature selection repository in python

1.01K
338
1y 26d
GPL-2.0

Python implementations of the Boruta all-relevant feature selection method.

827
173
43d
BSD-3-Clause

A fast xgboost feature selection algorithm

149
28
2y 10m
MIT

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

302
54
4m
MIT

General Purposes

matplotlib: plotting with Python

12.71K
5.53K
0d
n/a

Statistical data visualization using matplotlib

7.87K
1.34K
2d
BSD-3-Clause

Painlessly create beautiful matplotlib plots.

1.55K
143
6y 60d
MIT

Ternary plotting library for python with matplotlib

378
108
48d
MIT

Missing data visualization module for Python.

2.55K
334
10d
MIT

Python library that makes it easy for data scientists to create charts.

2.78K
243
31d
Apache-2.0

P(i/y)thon h(i/y)stograms.

107
17
18d
MIT

Interactive plots

A Python package for animating plots, built on matplotlib.

349
33
59d
MIT

Interactive Data Visualization in the browser, from Python

14.32K
3.58K
1d
BSD-3-Clause

Plotting library for IPython/Jupyter notebooks

2.95K
429
28d
Apache-2.0

A Python library that makes interactive and publication-quality graphs.

Declarative statistical visualization library for Python. Many data transformations can be performed directly in the code that creates the graph.

Map

A Python package for interactive mapping with Google Earth Engine, ipyleaflet, and ipywidgets

764
338
1d
MIT

Makes it easy to visualize data on an interactive open street map

Automatic Plotting

With Holoviews, your data visualizes itself.

1.7K
288
0d
BSD-3-Clause

Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

257
53
13d
Apache-2.0

Visualize and compare datasets, target values and associations, with one line of code.

1.14K
129
3d
MIT

NLP

Python library for interactive topic model visualization. Port of the R LDAvis package.

1.35K
293
1d
BSD-3-Clause

Deployment

A collection of APIs to turn scripts and notebooks into interactive reports.

Enables sharing and executing Jupyter Notebooks

Modern, fast (high-performance), web framework for building APIs with Python

Makes it easy to deploy machine learning models

Model Explanation

Algorithms for monitoring and explaining machine learning models

776
99
1d
Apache-2.0

Code for "High-Precision Model-Agnostic Explanations" paper

602
87
84d
BSD-2-Clause

Bias and Fairness Audit Toolkit

323
60
6d
n/a

Contrastive Explanation (Foil Trees), developed at TNO/Utrecht University

32
4
70d
BSD-3-Clause

Visual analysis and diagnostic tools to facilitate machine learning model selection.

3.02K
469
32d
Apache-2.0

An intuitive library to add plotting functionality to scikit-learn objects.

1.97K
241
2y 107d
MIT

A game theoretic approach to explain the output of any machine learning model.

11.02K
1.6K
9d
MIT

A library for debugging/inspecting machine learning classifiers and explaining their predictions

2.21K
291
10m
MIT

Lime: Explaining the predictions of any machine learning classifier

8.17K
1.33K
84d
BSD-2-Clause
278
51
2y 61d
n/a
78
26
1y 7m
n/a

python partial dependence plot toolbox

512
82
1y 12m
MIT

Python implementation of R package breakDown

38
4
2y 33d
n/a

⬛ Python Individual Conditional Expectation Plot Toolbox

99
24
2y 10m
MIT

Python Library for Model Interpretation/Explanations

949
161
5m
UPL-1.0

Model analysis tools for TensorFlow

1.01K
202
0d
Apache-2.0

A library that implements fairness-aware machine learning algorithms

67
17
2y 4m
MIT
601
123
5m
BSD-3-Clause

Interpretability and explainability of data and machine learning models

721
158
2d
Apache-2.0

Auralisation of learned features in CNN (for audio)

37
10
3y 8m
n/a

🎆 A visualization of the CapsNet layers to better understand how it works

364
91
7m
MIT

A collection of infrastructure and tools for research in neural network interpretability.

3.98K
566
9d
Apache-2.0

Visualizer for neural network, deep learning, and machine learning models

12.25K
1.5K
0d
MIT

Exploration tool for your NeuralNetwork

31
2
2y 26d
MIT

tensorboard for pytorch (and chainer, mxnet, numpy, ...)

6.67K
775
5m
MIT

Logging MXNet data for visualization in TensorBoard.

329
49
10m
Apache-2.0

Reinforcement Learning

A toolkit for developing and comparing reinforcement learning algorithms.

22.83K
6.49K
24d
n/a

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms

1.87K
378
24d
Apache-2.0

A toolkit for reproducible reinforcement learning research.

973
182
1d
MIT

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

10.84K
3.76K
10m
MIT

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

2.67K
534
52d
MIT

A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)

2.69K
377
0d
BSD-3-Clause

TF-Agents is a library for Reinforcement Learning in TensorFlow

1.7K
440
0d
Apache-2.0

Tensorforce: a TensorFlow library for applied reinforcement learning

2.81K
483
4d
Apache-2.0

TensorFlow Reinforcement Learning

3.05K
372
7m
Apache-2.0

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

9.23K
1.23K
9d
Apache-2.0

Deep Reinforcement Learning for Keras.

4.87K
1.29K
1y 23d
MIT

ChainerRL is a deep reinforcement learning library built on top of Chainer.

911
212
8d
MIT

Probabilistic Methods

Fast, flexible and easy to use probabilistic modelling in Python.

2.5K
462
4d
MIT

Deep universal probabilistic programming with Python and PyTorch

6.62K
795
4d
Apache-2.0

THIS IS THE OLD PYMC PROJECT. PLEASE USE PYMC3 INSTEAD.

888
228
4m
AFL-3.0

Decorator for PyMC3

46
5
2y 10m
n/a

InferPy: Deep Probabilistic Modeling with Tensorflow Made Easy

129
15
5m
Apache-2.0

PyStan, the Python interface to Stan

885
192
12d
GPL-3.0

Bayesian dessert for Lasagne

84
6
3y 5m
MIT

Python package for Bayesian Machine Learning with scikit-learn API

418
107
11m
MIT

Scikit-learn compatible estimation of general graphical models

173
29
1y 9m
MIT

Python Library for learning (Structure and Parameter) and inference (Statistical and Causal) in Bayesian Networks.

1.64K
551
0d
MIT

Supervised domain-agnostic prediction framework for probabilistic modelling

103
15
1y 9m
n/a

A bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation

126
9
2y 75d
Apache-2.0

Probabilistic Programming and Statistical Inference in PyTorch

108
14
3y 5m
MIT

Python package facilitating the use of Bayesian Deep Learning methods with Variational Inference for PyTorch

311
43
1y 11m
MIT

The Python ensemble sampling toolkit for affine-invariant MCMC

1.08K
384
75d
MIT

A library for hidden semi-Markov models with explicit durations

45
9
2y 10m
GPL-3.0
465
146
101d
MIT

A highly efficient and modular implementation of Gaussian Processes in PyTorch

2.19K
308
3d
MIT

Modular Probabilistic Programming on MXNet

95
25
1y 6m
Apache-2.0

scikit-learn inspired API for CRFsuite

347
151
12m
n/a

Python package for Bayesian statistical modeling and Probabilistic Machine Learning. Theano compatible

A library for probabilistic modeling, inference, and criticism. sklearn

Genetic Programming

Genetic Programming in Python, with a scikit-learn inspired API

890
148
9m
BSD-3-Clause

Distributed Evolutionary Algorithms in Python

3.98K
836
60d
LGPL-3.0

A Genetic Programming platform for Python with TensorFlow for wicked-fast CPU and GPU support.

123
56
1y 5m
n/a

A strongly-typed genetic programming framework for Python

95
11
2y 5m
n/a

Genetic feature selection module for scikit-learn

132
43
8d
n/a

Optimization

Spearmint Bayesian optimization codebase

1.39K
298
1y 8m
n/a

Bayesian optimization in PyTorch

1.77K
170
1d
MIT

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP (Traveling Salesman)

1.56K
380
13d
MIT

Sequential Model-based Algorithm Configuration

540
153
35d
n/a

optimization routines for hyperparameter tuning

359
72
6m
n/a

Distributed Asynchronous Hyperparameter Optimization in Python

5.23K
841
15d
n/a

Hyper-parameter optimization for sklearn

1.13K
224
39d
n/a

Use evolutionary algorithms instead of gridsearch in scikit-learn

613
108
1y 1d
MIT

SigOpt wrappers for scikit-learn methods

69
12
8m
MIT

A Python implementation of global optimization with gaussian processes.

4.67K
1.04K
4m
MIT

Safe Bayesian Optimization

83
35
7m
MIT

Sequential model-based optimization with a scipy.optimize interface

1.97K
374
65d
BSD-3-Clause

🎯 A comprehensive gradient-free optimization framework written in Python

541
56
3y 94d
MIT

A research toolkit for particle swarm optimization in Python

689
229
16d
MIT

A Free and Open Source Python Library for Multiobjective Optimization

287
108
6m
GPL-3.0

Bayesian Optimization using GPflow

218
53
2y 83d
Apache-2.0

POT : Python Optimal Transport

832
188
21d
MIT

Hyperparameter Optimization for TensorFlow, Keras and PyTorch

1.34K
224
12d
MIT

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

879
282
12d
n/a

Natural Language Processing

NLTK Source

9.45K
2.41K
10d
Apache-2.0

The Classical Language Toolkit

620
295
7d
MIT
16
3
5y 112d
MIT

scikit-learn wrappers for Python fastText.

209
22
87d
n/a

Simple text to phones converter for multiple languages

372
77
71d
GPL-3.0

A very simple framework for state-of-the-art Natural Language Processing (NLP)

9.62K
1.39K
8d
n/a

Topic Modelling for Humans.

A natural language processing toolkit.

A library for industrial-strength natural language processing in Python and Cython.

Computer Audition

Python library for audio and music analysis

4.08K
670
83d
ISC

Audio features extraction

184
42
1y 117d
LGPL-3.0

a library for audio and music analysis

1.97K
293
5m
GPL-3.0

C++ library for audio and music analysis, description and synthesis, including Python bindings

1.66K
392
6d
AGPL-3.0

LibXtract is a simple, portable, lightweight library of audio feature extraction functions.

197
45
1y 4m
MIT

Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals

324
90
58d
GPL-2.0

A library for augmenting annotated audio data

172
29
4m
ISC

Python audio and music signal processing library

679
114
11m
n/a

Computer Vision

Open Source Computer Vision Library

50.48K
41.12K
3d
n/a

Image processing in Python

4.07K
1.69K
6d
n/a

Image augmentation for machine learning experiments.

10.31K
1.96K
6m
MIT

Image augmentation library in Python for machine learning.

4.21K
790
8m
MIT

Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125

6.61K
858
4d
MIT

Statistics

An extension to pandas dataframes describe function.

356
34
1y 102d
MIT

Create HTML profiling reports from pandas DataFrame objects

6.37K
966
3d
MIT

Statsmodels: statistical modeling and econometrics in Python

5.77K
2.12K
5d
n/a

Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline stock statistics/indicators support.

649
183
47d
BSD-3-Clause

Pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.

79
9
11m
MIT

Pairwise Multiple Comparisons (Post Hoc) Tests in Python

166
17
45d
MIT

Performance analysis of predictive (alpha) stock factors

1.62K
582
7m
Apache-2.0
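
One entry in this category computes weighted means and medians. To show what these quantities are, here is a minimal plain-Python sketch. The function names are hypothetical, and the median uses one possible convention (the smallest value at which the cumulative weight reaches half of the total); the listed package may define it differently.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the total weight."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total
    (an assumed convention; others interpolate between neighbors)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values must not be empty")


values = [10, 20, 30]
weights = [1, 1, 3]
mean = weighted_mean(values, weights)      # (10 + 20 + 90) / 5
median = weighted_median(values, weights)  # cumulative weights: 1, 2, 5
```

With these inputs the weighted mean is 24.0 and the weighted median is 30, because the heavily weighted value 30 carries more than half of the total weight.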

Distributed Computing

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

10.49K
1.69K
0d
n/a

Distributed machine learning platform

893
186
3y 10m
n/a

Framework and Library for Distributed Online Machine Learning

701
149
1y 6m
LGPL-2.1

Microsoft Distributed Machine Learning Toolkit

2.77K
595
3y 4m
MIT

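PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of 『飞桨』 (PaddlePaddle): high-performance single-machine and distributed training, and cross-platform deployment, for deep learning & machine learning)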

13.46K
3.33K
0d
Apache-2.0

Scalable Machine Learning with Dask

674
180
8d
BSD-3-Clause

A distributed task scheduler for Dask

1.11K
486
0d
n/a

Exposes the Spark programming model to Python. Apache Spark based

Experimentation

Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.

3.2K
314
24d
MIT

A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.

1.26K
106
3y 105d
Apache-2.0

A visual dataflow programming language for sklearn

175
34
3y 5m
MIT

Adaptive Experimentation Platform

1.34K
137
8d
MIT

A lightweight ML experiment tracking, results visualization and management tool.

Evaluation

A library of metrics for evaluating recommender systems

214
52
8m
MIT

Machine learning evaluation metrics, implemented in Python, R, Haskell, and MATLAB / Octave

1.38K
426
5y 87d
n/a

scikit-learn model evaluation made easy: plots, tables and markdown reports.

285
25
62d
MIT

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.

1.14K
379
36d
Apache-2.0

Computations

Parallel computing with task scheduling

7.58K
1.19K
0d
BSD-3-Clause

Fast NumPy array functions written in C

563
63
9d
BSD-2-Clause

A NumPy-compatible array library accelerated by CUDA

4.69K
426
3d
MIT

Python library for multilinear algebra and tensor factorizations

378
108
4y 61d
GPL-3.0

Solve automatic numerical differentiation problems in one or more variables.

124
31
30d
BSD-3-Clause

Add built-in support for quaternions to numpy

355
57
31d
MIT

Adaptive: parallel active learning of mathematical functions

639
34
37d
BSD-3-Clause

A fundamental package for scientific computing with Python.

Spatial Analysis

Python tools for geographic data

2.37K
523
6d
BSD-3-Clause

PySAL: Python Spatial Analysis Library Meta-Package

797
247
4m
BSD-3-Clause

Quantum Computing

PennyLane is a cross-platform Python library for differentiable programming of quantum computers. Train a quantum computer the same way as a neural network.

643
190
6d
Apache-2.0

QML: Quantum Machine Learning

134
52
2y 85d
MIT

Conversion

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

931
119
11m
MIT

Open standard for machine learning interoperability

9.39K
1.71K
8d
MIT

MMdnn is a set of tools to help users inter-operate among different deep learning frameworks. E.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, Tensorflow, CNTK, PyTorch Onnx and CoreML.

5.09K
939
111d
MIT