Awesome Data Science
Probably the best curated list of data science software in Python.
Thank you krzjoa & contributors
General Purpose Machine Learning
High performance, easy-to-use, and scalable machine learning (ML) package, including linear models (LR), factorization machines (FM), and field-aware factorization machines (FFM), with Python and CLI interfaces.
cuML - RAPIDS Machine Learning Library
A modular active learning framework for Python
PySpark + Scikit-learn = Sparkit-learn
mlpack: a scalable C++ machine learning library
A toolkit for making real world machine learning and data analysis applications in C++
A library of extension and helper modules for Python's data analysis and machine learning libraries.
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
Machine Learning toolbox for Humans
A scikit-learn based module for multi-label classification and related tasks
Sequence learning toolkit for Python
Simple structured learning framework for python
Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models
Python implementation of the rulefit algorithm
Metric learning algorithms in Python
[HELP REQUESTED] Generalized Additive Models in Python
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)
Uplift modeling and causal inference with machine learning algorithms
A machine learning toolkit dedicated to time-series data
Module for statistical learning, with a particular emphasis on time-dependent modelling
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Open source time series library for Python
Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
Anomaly Detection and Correlation library
Datetimes for Humans™
Automated Machine Learning
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Automated Machine Learning with scikit-learn
MLBox is a powerful Automated Machine Learning python library.
Stacked Generalization (Ensemble Learning)
Library for machine learning stacking generalization.
Python package for stacking (machine learning technique)
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
Python-based implementations of algorithms for learning on imbalanced data.
A forest of random projection trees
Scikit-learn compatible wrapper of the Random Bits Forest program written by (Wang et al., 2016)
Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.
Extreme Learning Machine
Extreme Learning Machine implementation in Python
Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.
High performance implementation of Extreme Learning Machines (fast randomized neural networks).
Factorization machines in python
fastFM: A Library for Factorization Machines
TensorFlow implementation of an arbitrary order Factorization Machine
Support vector machines (SVMs) and related kernel-based learning algorithms are a well-known class of machine learning algorithms, for non-parametric classification and regression. liquidSVM is an implementation of SVMs whose key features are: fully integrated hyper-parameter selection, extreme speed on both small and large data sets, full flexibility for experts, and inclusion of a variety of different learning scenarios: multi-class classification, ROC, and Neyman-Pearson learning, and least-squares, quantile, and expectile regression.
Relevance Vector Machine implementation using the scikit-learn API.
ThunderSVM: A Fast SVM Library on GPUs and CPUs
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
ThunderGBM: Fast GBDTs and Random Forests on GPUs
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Datasets, Transforms and Models specific to Computer Vision
Data loaders and abstractions for text and NLP
Data manipulation and transformation for audio signal processing, powered by PyTorch
High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.
A simplified framework and utilities for PyTorch
A scikit-learn compatible neural network library that wraps PyTorch
Simple tools for logging and visualizing, loading and training
Geometric Deep Learning Extension Library for PyTorch
Accelerated deep learning R&D
A Temporal Extension Library for PyTorch Geometric
An Open Source Machine Learning Framework for Everyone
Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥
Deep learning library featuring a higher-level API for TensorFlow.
TensorFlow-based neural network library
A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility
Machine Learning Platform for Kubernetes
NeuPy is a Tensorflow based python library for prototyping and building neural networks
Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy.
TensorFlow ROCm port
Deep learning with dynamic computation graphs in TensorFlow
📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow
TensorLight - A high-level framework for TensorFlow
Mesh TensorFlow: Model Parallelism Made Easier
Ludwig is a toolbox that allows you to train and evaluate deep learning models without the need to write code.
Keras community contributions
Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization
Distributed Deep learning with Keras & Spark
Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.
Graph Neural Networks with Keras and Tensorflow 2.
QKeras: a quantization deep learning library for Tensorflow Keras
A clear, concise, simple yet powerful and efficient API for deep learning.
Simple, efficient and flexible vision toolbox for mxnet framework.
Gluon CV Toolkit
NLP made easy
Transfer Learning library for Deep Neural Networks.
Source-to-Source Debuggable Derivatives in Pure Python
Efficiently computes derivatives of numpy code.
Neural Network Libraries
Caffe: a fast open framework for deep learning.
hipCaffe: the HIP port of Caffe
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Scrape Twitter for Tweets
Providing Pythonic idioms for iterating, searching, and modifying HTML or XML.
Create HTML profiling reports from pandas DataFrame objects
cuDF - GPU DataFrame Library
NumPy and Pandas interface to Big Data
sqldf for pandas
Pandas Google BigQuery
Universal 1d/2d data containers with Transformers functionality for data analysis.
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
High performance datastore for time series and tick data
A Python package for manipulating 2-dimensional tabular data structures
Koalas: pandas API on Apache Spark
Modin: Speed up your Pandas workflows by changing a single line of code
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
The easy way to write your own flavor of Pandas
The goal of pandas-log is to provide feedback about basic pandas operations. It provides simple wrapper functions for the most common functions that add additional logs
Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀
Easy pipelines for pandas DataFrames.
functional data manipulation for pandas
dplyr for python
Pandas integration with sklearn
BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.
Clean APIs for data cleaning. Python implementation of R package Janitor
A Python toolkit for processing tabular data
Build, test, deploy, iterate - Dev and prod tool for data science pipelines
Directions overlay for working with pandas in an analysis environment
Python pipe (|) operator with support for DataFrames and Numpy and Pytorch.
An open source python library for automated feature engineering
scikit-learn addon to operate on set/"group"-based features
A set of tools for creating and testing machine learning features, with a scikit-learn compatible API
a feature engineering wrapper for sklearn
A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.
Automatic extraction of relevant features from time series
open-source feature selection repository in python
Python implementations of the Boruta all-relevant feature selection method.
A fast xgboost feature selection algorithm
A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.
matplotlib: plotting with Python
Statistical data visualization using matplotlib
Painlessly create beautiful matplotlib plots.
Ternary plotting library for python with matplotlib
Missing data visualization module for Python.
Python library that makes it easy for data scientists to create charts.
Python histogram library - histograms as updateable, fully semantic objects with visualization tools. [P]ython [HYST]ograms.
A python package for animating plots built on matplotlib.
Interactive Data Visualization in the browser, from Python
Plotting library for IPython/Jupyter notebooks
🎨 Python Echarts Plotting Library
A Python library that makes interactive and publication-quality graphs.
A Python package for interactive mapping with Google Earth Engine, ipyleaflet, and folium
With Holoviews, your data visualizes itself.
Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Visualize and compare datasets, target values and associations, with one line of code.
Python library for interactive topic model visualization. Port of the R LDAvis package.
A collection of APIs to turn scripts and notebooks into interactive reports.
Modern, fast (high-performance), web framework for building APIs with Python
A data-driven approach to quantify the value of classifiers in a machine learning ensemble.
Algorithms for monitoring and explaining machine learning models
Code for "High-Precision Model-Agnostic Explanations" paper
Bias and Fairness Audit Toolkit
Contrastive Explanation (Foil Trees), developed at TNO/Utrecht University
Visual analysis and diagnostic tools to facilitate machine learning model selection.
An intuitive library to add plotting functionality to scikit-learn objects.
A game theoretic approach to explain the output of any machine learning model.
A library for debugging/inspecting machine learning classifiers and explaining their predictions
Lime: Explaining the predictions of any machine learning classifier
python partial dependence plot toolbox
Python implementation of R package breakDown
⬛ Python Individual Conditional Expectation Plot Toolbox
Python Library for Model Interpretation/Explanations
Model analysis tools for TensorFlow
A library that implements fairness-aware machine learning algorithms
Interpretability and explainability of data and machine learning models
Auralisation of learned features in CNN (for audio)
🎆 A visualization of the CapsNet layers to better understand how it works
A collection of infrastructure and tools for research in neural network interpretability.
Visualizer for neural network, deep learning, and machine learning models
Exploration tool for your NeuralNetwork
tensorboard for pytorch (and chainer, mxnet, numpy, ...)
Logging MXNet data for visualization in TensorBoard.
A toolkit for developing and comparing reinforcement learning algorithms.
Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
A toolkit for reproducible reinforcement learning research.
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)
TF-Agents is a library for Reinforcement Learning in TensorFlow
Tensorforce: a TensorFlow library for applied reinforcement learning
TensorFlow Reinforcement Learning
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
Deep Reinforcement Learning for Keras.
ChainerRL is a deep reinforcement learning library built on top of Chainer.
Fast, flexible and easy to use probabilistic modelling in Python.
Deep universal probabilistic programming with Python and PyTorch
THIS IS THE OLD PYMC PROJECT. PLEASE USE PYMC3 INSTEAD.
Decorator for PyMC3
InferPy: Deep Probabilistic Modeling with Tensorflow Made Easy
PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io
Bayesian dessert for Lasagne
Python package for Bayesian Machine Learning with scikit-learn API
Scikit-learn compatible estimation of general graphical models
Python Library for learning (Structure and Parameter) and inference (Statistical and Causal) in Bayesian Networks.
Supervised domain-agnostic prediction framework for probabilistic modelling
A bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation
Probabilistic Programming and Statistical Inference in PyTorch
Python package facilitating the use of Bayesian Deep Learning methods with Variational Inference for PyTorch
The Python ensemble sampling toolkit for affine-invariant MCMC
A library for hidden semi-Markov models with explicit durations
A highly efficient and modular implementation of Gaussian Processes in PyTorch
Modular Probabilistic Programming on MXNet
scikit-learn inspired API for CRFsuite
Python package for Bayesian statistical modeling and Probabilistic Machine Learning.
Genetic Programming in Python, with a scikit-learn inspired API
Distributed Evolutionary Algorithms in Python
A Genetic Programming platform for Python with TensorFlow for wicked-fast CPU and GPU support.
A strongly-typed genetic programming framework for Python
Genetic feature selection module for scikit-learn
Spearmint Bayesian optimization codebase
Bayesian optimization in PyTorch
Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP (Traveling Salesman)
Sequential Model-based Algorithm Configuration
optimization routines for hyperparameter tuning
Distributed Asynchronous Hyperparameter Optimization in Python
Hyper-parameter optimization for sklearn
Use evolutionary algorithms instead of gridsearch in scikit-learn
SigOpt wrappers for scikit-learn methods
A Python implementation of global optimization with gaussian processes.
Safe Bayesian Optimization
Sequential model-based optimization with a
🎯 A comprehensive gradient-free optimization framework written in Python
A research toolkit for particle swarm optimization in Python
A Free and Open Source Python Library for Multiobjective Optimization
Bayesian Optimization using GPflow
POT : Python Optimal Transport
Hyperparameter Optimization for TensorFlow, Keras and PyTorch
library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization
Natural Language Processing
The Classical Language Toolkit
scikit-learn wrappers for Python fastText.
Simple text to phones converter for multiple languages
A very simple framework for state-of-the-art Natural Language Processing (NLP)
Python library for audio and music analysis
Audio features extraction
a library for audio and music analysis
C++ library for audio and music analysis, description and synthesis, including Python bindings
LibXtract is a simple, portable, lightweight library of audio feature extraction functions.
Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals
A library for augmenting annotated audio data
Python audio and music signal processing library
Open Source Computer Vision Library
Image processing in Python
Image augmentation for machine learning experiments.
Image augmentation library in Python for machine learning.
Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125
An extension to pandas dataframes describe function.
Create HTML profiling reports from pandas DataFrame objects
Statsmodels: statistical modeling and econometrics in Python
Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline stock statistics/indicators support.
Pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
Multiple Pairwise Comparisons (Post Hoc) Tests in Python
Performance analysis of predictive (alpha) stock factors
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Distributed machine learning platform
Framework and Library for Distributed Online Machine Learning
Microsoft Distributed Machine Learning Toolkit
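PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)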
Scalable Machine Learning with Dask
A distributed task scheduler for Dask
Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.
A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.
A visual dataflow programming language for sklearn
Adaptive Experimentation Platform
A library of metrics for evaluating recommender systems
Machine learning evaluation metrics, implemented in Python, R, Haskell, and MATLAB / Octave
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.
A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
Parallel computing with task scheduling
Fast NumPy array functions written in C
A NumPy-compatible array library accelerated by CUDA
Python library for multilinear algebra and tensor factorizations
Solve automatic numerical differentiation problems in one or more variables.
Add built-in support for quaternions to numpy
Adaptive: parallel active learning of mathematical functions
Python tools for geographic data
PySAL: Python Spatial Analysis Library Meta-Package
PennyLane is a cross-platform Python library for differentiable programming of quantum computers. Train a quantum computer the same way as a neural network.
QML: Quantum Machine Learning
Open standard for machine learning interoperability
MMdnn is a set of tools to help users inter-operate among different deep learning frameworks, e.g. model conversion and visualization. It converts models between Caffe, Keras, MXNet, Tensorflow, CNTK, PyTorch, ONNX and CoreML.