Awesome Data Science
Probably the best curated list of data science software in Python.
Thank you krzjoa & contributors
View Topic on GitHub:
krzjoa/awesome-python-data-science
General Purpose Machine Learning
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
cuML - RAPIDS Machine Learning Library
A modular active learning framework for Python
PySpark + Scikit-learn = Sparkit-learn
mlpack: a scalable C++ machine learning library
A toolkit for making real world machine learning and data analysis applications in C++
A library of extension and helper modules for Python's data analysis and machine learning libraries.
Waiting hours for a future prediction is unacceptable. Hyperlearn makes AI and ML algorithms 50% faster, uses 90% less memory, and doesn't require new hardware. ML algorithms such as PCA, Linear Regression, and NMF are all faster.
Machine Learning toolbox for Humans
A scikit-learn based module for multi-label et al. classification
Sequence learning toolkit for Python
Simple structured learning framework for python
Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models
Python implementation of the rulefit algorithm
Metric learning algorithms in Python
[HELP REQUESTED] Generalized Additive Models in Python
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)
Uplift modeling and causal inference with machine learning algorithms
Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort. See our docs: https://docs.deepchecks.com
Automated Machine Learning
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Automated Machine Learning with scikit-learn
MLBox is a powerful Automated Machine Learning python library.
Ensemble Methods
Stacked Generalization (Ensemble Learning)
Library for machine learning stacking generalization.
Python package for stacking (machine learning technique)
Imbalanced Datasets
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
Python-based implementations of algorithms for learning on imbalanced data.
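One common baseline for imbalanced data is random oversampling: rows of the smaller classes are duplicated until every class is as large as the biggest one. The sketch below is a minimal standard-library illustration of that idea only. The function name `random_oversample` and the toy data are hypothetical, and it is not the implementation used by any package listed here.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    target = max(Counter(labels).values())

    # Group row indices by class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    out_features = list(features)
    out_labels = list(labels)
    for label, indices in by_class.items():
        # Draw with replacement to fill the gap to the majority count.
        for _ in range(target - len(indices)):
            picked = rng.choice(indices)
            out_features.append(features[picked])
            out_labels.append(label)
    return out_features, out_labels


# Toy data: four rows of class 0 and a single row of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [5.0]]
y = [0, 0, 0, 0, 1]
X_balanced, y_balanced = random_oversample(X, y)
print(Counter(y_balanced))
```

Oversampling should be applied to the training split only. Duplicating rows before splitting leaks copies of the same row into the evaluation data.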
Random Forests
It is a forest of random projection trees
Scikit-learn compatible wrapper of the Random Bits Forest program written by (Wang et al., 2016)
Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.
Extreme Learning Machine
Extreme Learning Machine implementation in Python
Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.
High performance implementation of Extreme Learning Machines (fast randomized neural networks).
Kernel Methods
Factorization machines in python
fastFM: A Library for Factorization Machines
TensorFlow implementation of an arbitrary order Factorization Machine
Support vector machines (SVMs) and related kernel-based learning algorithms are a well-known class of machine learning algorithms, for non-parametric classification and regression. liquidSVM is an implementation of SVMs whose key features are: fully integrated hyper-parameter selection, extreme speed on both small and large data sets, full flexibility for experts, and inclusion of a variety of different learning scenarios: multi-class classification, ROC, and Neyman-Pearson learning, and least-squares, quantile, and expectile regression.
Relevance Vector Machine implementation using the scikit-learn API.
ThunderSVM: A Fast SVM Library on GPUs and CPUs
Gradient Boosting
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
ThunderGBM: Fast GBDTs and Random Forests on GPUs
PyTorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Datasets, Transforms and Models specific to Computer Vision
Data loaders and abstractions for text and NLP
Data manipulation and transformation for audio signal processing, powered by PyTorch
High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.
A simplified framework and utilities for PyTorch
A scikit-learn compatible neural network library that wraps PyTorch
Simple tools for logging and visualizing, loading and training
Graph Neural Network Library for PyTorch
Accelerated deep learning R&D
PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models (CIKM 2021)
A PyTorch based deep learning library for drug pair scoring.
TensorFlow
An Open Source Machine Learning Framework for Everyone
Deep Learning and Reinforcement Learning Library for Scientists and Engineers
Deep learning library featuring a higher-level API for TensorFlow.
TensorFlow-based neural network library
A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility
Machine Learning Management & Orchestration Platform (Monorepo for Polyaxon's MLOps Tools)
NeuPy is a Tensorflow based python library for prototyping and building neural networks
Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy.
TensorFlow ROCm port
Deep learning with dynamic computation graphs in TensorFlow
📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow
TensorLight - A high-level framework for TensorFlow
Mesh TensorFlow: Model Parallelism Made Easier
Data-centric declarative deep learning framework
Keras community contributions
Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization
Distributed Deep learning with Keras & Spark
Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.
Graph Neural Networks with Keras and Tensorflow 2.
QKeras: a quantization deep learning library for Tensorflow Keras
MXNet
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
A clear, concise, simple yet powerful and efficient API for deep learning.
Simple, efficient and flexible vision toolbox for mxnet framework.
Gluon CV Toolkit
NLP made easy
Transfer Learning library for Deep Neural Networks.
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
Others
Source-to-Source Debuggable Derivatives in Pure Python
Efficiently computes derivatives of numpy code.
Myia prototyping
Neural Network Libraries
Caffe: a fast open framework for deep learning.
hipCaffe: the HIP port of Caffe
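Some entries in this group compute derivatives of ordinary Python code. As a rough illustration of what automatic differentiation means, the sketch below uses forward-mode dual numbers, a deliberately simple technique that is not taken from any of the listed projects. The `Dual` class and the `derivative` helper are hypothetical names, and only addition, multiplication and `sin` are supported.

```python
import math


class Dual:
    """A number a + b*eps with eps**2 == 0; `deriv` carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (derivative 0).
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    x = Dual.lift(x)
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x):
    """Evaluate df/dx at x by seeding the dual part with 1."""
    return f(Dual(x, 1.0)).deriv


def f(x):
    return 3 * x * x + sin(x)


# The analytic derivative is 6*x + cos(x).
print(derivative(f, 2.0))
```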
Web Scraping
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Scrape Twitter for Tweets
Data Containers
Create HTML profiling reports from pandas DataFrame objects
cuDF - GPU DataFrame Library
NumPy and Pandas interface to Big Data
sqldf for pandas
Google BigQuery connector for pandas
Universal 1d/2d data containers with Transformers functionality for data analysis.
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
High performance datastore for time series and tick data
A Python package for manipulating 2-dimensional tabular data structures
Koalas: pandas API on Apache Spark
Modin: Speed up your Pandas workflows by changing a single line of code
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
The easy way to write your own flavor of Pandas
The goal of pandas-log is to provide feedback about basic pandas operations. It provides simple wrapper functions for the most common functions that add additional logs
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
Pipelines
Easy pipelines for pandas DataFrames.
functional data manipulation for pandas
dplyr for python
Pandas integration with sklearn
BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.
Clean APIs for data cleaning. Python implementation of R package Janitor
A Python toolkit for processing tabular data
Build, test, deploy, iterate - Dev and prod tool for data science pipelines
Directions overlay for working with pandas in an analysis environment
General
An open source python library for automated feature engineering
scikit-learn addon to operate on set/"group"-based features
A set of tools for creating and testing machine learning features, with a scikit-learn compatible API
a feature engineering wrapper for sklearn
A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.
Automatic extraction of relevant features from time series
Feature Selection
open-source feature selection repository in python
Python implementations of the Boruta all-relevant feature selection method.
A fast xgboost feature selection algorithm
A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.
General Purposes
matplotlib: plotting with Python
Statistical data visualization in Python
Painlessly create beautiful matplotlib plots.
Ternary plotting library for python with matplotlib
Missing data visualization module for Python.
Python library that makes it easy for data scientists to create charts.
Python histogram library - histograms as updateable, fully semantic objects with visualization tools. [P]ython [HYST]ograms.
Interactive plots
A python package for animating plots built on matplotlib.
Interactive Data Visualization in the browser, from Python
Plotting library for IPython/Jupyter notebooks
🎨 Python Echarts Plotting Library
Map
A Python package for interactive mapping with Google Earth Engine, ipyleaflet, and ipywidgets.
Automatic Plotting
With Holoviews, your data visualizes itself.
Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Visualize and compare datasets, target values and associations, with one line of code.
NLP
Python library for interactive topic model visualization. Port of the R LDAvis package.
Deployment
Model Explanation
The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021).
Algorithms for explaining machine learning models
Code for "High-Precision Model-Agnostic Explanations" paper
Bias and Fairness Audit Toolkit
Contrastive Explanation (Foil Trees), developed at TNO/Utrecht University
Visual analysis and diagnostic tools to facilitate machine learning model selection.
An intuitive library to add plotting functionality to scikit-learn objects.
A game theoretic approach to explain the output of any machine learning model.
A library for debugging/inspecting machine learning classifiers and explaining their predictions
Lime: Explaining the predictions of any machine learning classifier
python partial dependence plot toolbox
Python implementation of R package breakDown
⬛ Python Individual Conditional Expectation Plot Toolbox
Python Library for Model Interpretation/Explanations
Model analysis tools for TensorFlow
A library that implements fairness-aware machine learning algorithms
Interpretability and explainability of data and machine learning models
Auralisation of learned features in CNN (for audio)
🎆 A visualization of the CapsNet layers to better understand how it works
A collection of infrastructure and tools for research in neural network interpretability.
Visualizer for neural network, deep learning, and machine learning models
tensorboard for pytorch (and chainer, mxnet, numpy, ...)
Logging MXNet data for visualization in TensorBoard.
Reinforcement Learning
A toolkit for developing and comparing reinforcement learning algorithms.
Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
A toolkit for reproducible reinforcement learning research.
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Tensorforce: a TensorFlow library for applied reinforcement learning
TensorFlow Reinforcement Learning
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
Deep Reinforcement Learning for Keras.
ChainerRL is a deep reinforcement learning library built on top of Chainer.
Probabilistic Methods
Deep universal probabilistic programming with Python and PyTorch
Fast, flexible and easy to use probabilistic modelling in Python.
Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Aesara
InferPy: Deep Probabilistic Modeling with Tensorflow Made Easy
PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io
Python package for Bayesian Machine Learning with scikit-learn API
Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.
Supervised domain-agnostic prediction framework for probabilistic modelling
Probabilistic Programming and Statistical Inference in PyTorch
Python package facilitating the use of Bayesian Deep Learning methods with Variational Inference for PyTorch
The Python ensemble sampling toolkit for affine-invariant MCMC
A library for hidden semi-Markov models with explicit durations
A highly efficient and modular implementation of Gaussian Processes in PyTorch
Modular Probabilistic Programming on MXNet
scikit-learn inspired API for CRFsuite
Genetic Programming
Genetic Programming in Python, with a scikit-learn inspired API
Distributed Evolutionary Algorithms in Python
A Genetic Programming platform for Python with TensorFlow for wicked-fast CPU and GPU support.
A strongly-typed genetic programming framework for Python
Genetic feature selection module for scikit-learn
Optimization
Spearmint Bayesian optimization codebase
Bayesian optimization in PyTorch
Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP (Traveling Salesman)
Sequential Model-based Algorithm Configuration
optimization routines for hyperparameter tuning
Distributed Asynchronous Hyperparameter Optimization in Python
Hyper-parameter optimization for sklearn
Use evolutionary algorithms instead of gridsearch in scikit-learn
SigOpt wrappers for scikit-learn methods
A Python implementation of global optimization with gaussian processes.
Safe Bayesian Optimization
Sequential model-based optimization with a scipy.optimize interface
🎯 A comprehensive gradient-free optimization framework written in Python
A research toolkit for particle swarm optimization in Python
A Free and Open Source Python Library for Multiobjective Optimization
Bayesian Optimization using GPflow
POT : Python Optimal Transport
Hyperparameter Optimization for TensorFlow, Keras and PyTorch
library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization
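Many of the hyperparameter tools above improve on a simple baseline: random search, which samples configurations independently and keeps the best one. The sketch below shows that baseline with the standard library only. The `random_search` function, the `toy_loss` objective and the parameter names are hypothetical stand-ins for a real validation loss, and this is not the API of any listed package.

```python
import random


def random_search(objective, space, n_trials=200, seed=0):
    """Sample each parameter uniformly from its (low, high) range and
    keep the configuration with the lowest objective value."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical objective standing in for a validation loss;
# its minimum is at learning_rate=0.1, reg=1.0.
def toy_loss(params):
    return (params["learning_rate"] - 0.1) ** 2 + (params["reg"] - 1.0) ** 2


space = {"learning_rate": (0.0, 1.0), "reg": (0.0, 2.0)}
best_params, best_score = random_search(toy_loss, space)
print(best_params, best_score)
```

Bayesian and evolutionary methods differ mainly in how they choose the next configuration to try. The evaluate-and-keep-the-best loop stays the same.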
Time Series
A unified framework for machine learning with time series
A python library for easy manipulation and forecasting of time series.
Lightning ⚡️ fast forecasting with statistical and econometric models.
Scalable machine learning based time series forecasting.
Scalable and user friendly neural forecasting algorithms for time series data.
A machine learning toolkit dedicated to time-series data
Module for statistical learning, with a particular emphasis on time-dependent modelling
A flexible, intuitive and fast forecasting library
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Open source time series library for Python
Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
Anomaly Detection and Correlation library
Datetimes for Humans™
ML powered analytics engine for outlier detection and root cause analysis.
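For readers new to forecasting, simple exponential smoothing is one of the classical statistical baselines that libraries of this kind typically provide. The sketch below is a plain-Python illustration of the recurrence only. The function name and the sample series are hypothetical, and no listed library's API is used.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, with level_0 = y_0.
    The last level serves as the one-step-ahead forecast.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    levels = [series[0]]
    for value in series[1:]:
        levels.append(alpha * value + (1 - alpha) * levels[-1])
    return levels


history = [10.0, 12.0, 11.0, 13.0]
levels = simple_exponential_smoothing(history, alpha=0.5)
forecast = levels[-1]
print(forecast)
```

A larger `alpha` reacts faster to recent observations, and a smaller one smooths more heavily.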
Natural Language Processing
NLTK Source
The Classical Language Toolkit
scikit-learn wrappers for Python fastText.
Simple text to phones converter for multiple languages
A very simple framework for state-of-the-art Natural Language Processing (NLP)
Computer Audition
Python library for audio and music analysis
Audio features extraction
a library for audio and music analysis
C++ library for audio and music analysis, description and synthesis, including Python bindings
LibXtract is a simple, portable, lightweight library of audio feature extraction functions.
Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals
A library for augmenting annotated audio data
Python audio and music signal processing library
Computer Vision
Open Source Computer Vision Library
Image processing in Python
Image augmentation for machine learning experiments.
Image augmentation library in Python for machine learning.
Fast image augmentation library and an easy-to-use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
Statistics
A library for managing, validating, summarizing, and visualizing data.
Create HTML profiling reports from pandas DataFrame objects
Statsmodels: statistical modeling and econometrics in Python
Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline stock statistics/indicators support.
Pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
Multiple Pairwise Comparisons (Post Hoc) Tests in Python
Performance analysis of predictive (alpha) stock factors
Distributed Computing
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Distributed machine learning platform
Framework and Library for Distributed Online Machine Learning
Microsoft Distributed Machine Learning Toolkit
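PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)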
Scalable Machine Learning with Dask
A distributed task scheduler for Dask
Experimentation
🏕️ Development environment for machine learning
Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.
A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.
A visual dataflow programming language for sklearn
Adaptive Experimentation Platform
Evaluation
A library of metrics for evaluating recommender systems
Machine learning evaluation metrics, implemented in Python, R, Haskell, and MATLAB / Octave
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.
A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
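As a small reminder of what such evaluation libraries compute, the sketch below derives precision, recall and F1 for a binary task from raw label lists. It is a standard-library illustration with a hypothetical function name and toy labels, not the interface of any package listed here.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators by returning 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels: two true positives, one false positive, one false negative.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```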
Computations
Parallel computing with task scheduling
Fast NumPy array functions written in C
NumPy & SciPy for GPU
Python library for multilinear algebra and tensor factorizations
Solve automatic numerical differentiation problems in one or more variables.
Add built-in support for quaternions to numpy
Adaptive: parallel active learning of mathematical functions
Spatial Analysis
Python tools for geographic data
PySAL: Python Spatial Analysis Library Meta-Package
Quantum Computing
PennyLane is a cross-platform Python library for differentiable programming of quantum computers. Train a quantum computer the same way as a neural network.
QML: Quantum Machine Learning
Conversion
Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
Open standard for machine learning interoperability
MMdnn is a set of tools to help users inter-operate among different deep learning frameworks, e.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX and CoreML.