Awesome Data Science
An awesome Data Science repository to learn and apply for real world problems.
Thank you academic & contributors
What is Data Science?
A list of colleges and universities offering degrees in data science.
Course materials for the Data Science Specialization: https://www.coursera.org/specialization/jhudatascience/1
Official repo for the #tidytuesday project
Ways of doing Data Science Engineering and Machine Learning in R and Python
🐍 Quick reference guide to common patterns & functions in PySpark.
Source code from the book Genetic Algorithms with Python by Clinton Sheppard
splearn: package for signal processing and machine learning with Python. Contains tutorials on understanding and applying signal processing.
Roadmap to becoming an Artificial Intelligence Expert in 2021
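The genetic-algorithms book in this list is one entry point into evolutionary search. The sketch below is not code from that book; it is a minimal illustration of the usual loop (rank by fitness, keep the best, breed with single-point crossover, apply random mutation) on a toy problem of evolving a string toward a target. The names `evolve`, `fitness`, `crossover`, `mutate`, `GENE_SET` and the target string are hypothetical, chosen only for this example.

```python
import random
import string

# Hypothetical toy example: evolve a string toward a target. Not code from the book.
GENE_SET = string.ascii_letters + " !"


def fitness(genome, target):
    """Number of positions where the candidate already matches the target."""
    return sum(1 for g, t in zip(genome, target) if g == t)


def crossover(a, b, rng):
    """Single-point crossover: prefix from parent a, suffix from parent b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(genome, rate, rng):
    """Replace each character with a random gene with probability `rate`."""
    return "".join(rng.choice(GENE_SET) if rng.random() < rate else g for g in genome)


def evolve(target, pop_size=100, rate=0.02, max_generations=2000, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice(GENE_SET) for _ in target) for _ in range(pop_size)]
    for generation in range(max_generations):
        # Rank by fitness, best first.
        population.sort(key=lambda g: fitness(g, target), reverse=True)
        if population[0] == target:
            return population[0], generation
        # Elitism: the top fifth survives unchanged and is the only breeding pool.
        parents = population[: pop_size // 5]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return population[0], max_generations


best, generations = evolve("Hello World!")
print(best, generations)
```

Because the best fifth of each generation is copied forward unchanged, the best fitness never decreases; adapting the loop to a real problem mostly means replacing `fitness` and the gene representation.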
Toolboxes - Environment
The Data Science Lifecycle Process is a process for taking data science teams from Idea to Value repeatedly and sustainably. The process is documented in this repo.
Template repository for data science lifecycle project
A general purpose recommender metrics library for fair evaluation.
A PyTorch based deep learning library for drug pair scoring.
PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models (CIKM 2021)
Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
🛠 All-in-one web-based IDE specialized for machine learning and data science.
Lightweight Python library for fast and reproducible experimentation
Curated set of transformers that make your work with steppy faster and more effective
A GUI for Pandas DataFrames
Serverless proxy for Spark cluster
Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
High performance distributed data processing engine
Intel® Deep Learning Framework
Julia kernel for Jupyter
An open source Python library for automated feature engineering
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Fast image augmentation library and an easy-to-use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
🦉Data Version Control | Git for Data & Models | ML Experiments Management
Feature engineering and machine learning: together at last!
Feature Store for Machine Learning
Machine Learning Management & Orchestration Platform (Monorepo for Polyaxon's MLOps Tools)
ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management
Hopsworks - Data-Intensive AI platform with a Feature Store
In-Database Machine Learning
Lightwood is Legos for Machine Learning.
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and Excel).
♾️ CML - Continuous Machine Learning | CI/CD for ML
Grid studio is a web-based application for data science with full integration of open source data science frameworks and languages.
Python Data Science Handbook: full text in Jupyter Notebooks
The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021).
ML powered analytics engine for outlier detection and root cause analysis.
Towhee is a framework that helps you encode your unstructured data into embeddings.
Data engineering, simplified. LineaPy creates a frictionless path for taking your data science artifact from development to production.
🏕️ Development environment for machine learning
Machine Learning in General Purpose
A scikit-learn based module for multi-label et al. classification
Highly interpretable classifiers for scikit-learn, producing easily understood decision rules instead of black box models
Open-source feature selection repository in Python
A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.
Sequence learning toolkit for Python
Python package for Bayesian Machine Learning with scikit-learn API
scikit-learn inspired API for CRFsuite
Use evolutionary algorithms instead of grid search in scikit-learn
SigOpt wrappers for scikit-learn methods
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.
Image processing in Python
Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP (Traveling Salesman Problem)
Multiple Pairwise Comparisons (Post Hoc) Tests in Python
Simple structured learning framework for Python
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM), with Python and command-line interfaces.
cuML - RAPIDS Machine Learning Library
Uplift modeling and causal inference with machine learning algorithms
mlpack: a scalable C++ machine learning library
A library of extension and helper modules for Python's data analysis and machine learning libraries.
A modular active learning framework for Python
PySpark + Scikit-learn = Sparkit-learn
Waiting hours for a future prediction is unacceptable. Hyperlearn aims to make AI and ML algorithms 50% faster and use 90% less memory, without requiring new hardware. ML algorithms like PCA, Linear Regression and NMF are all faster!
A toolkit for making real world machine learning and data analysis applications in C++
Python implementation of the rulefit algorithm
[HELP REQUESTED] Generalized Additive Models in Python
Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort. See our docs: https://docs.deepchecks.com
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Datasets, Transforms and Models specific to Computer Vision
Data loaders and abstractions for text and NLP
Data manipulation and transformation for audio signal processing, powered by PyTorch
High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.
Simple tools for logging and visualizing, loading and training
A simplified framework and utilities for PyTorch
A scikit-learn compatible neural network library that wraps PyTorch
Python package facilitating the use of Bayesian Deep Learning methods with Variational Inference for PyTorch
Graph Neural Network Library for PyTorch
A highly efficient and modular implementation of Gaussian Processes in PyTorch
Deep universal probabilistic programming with Python and PyTorch
Accelerated deep learning R&D
A standard framework for modelling Deep Learning Models for tabular data
An Open Source Machine Learning Framework for Everyone
Deep Learning and Reinforcement Learning Library for Scientists and Engineers
Deep learning library featuring a higher-level API for TensorFlow.
TensorFlow-based neural network library
A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility
TensorFlow Reinforcement Learning
NeuPy is a TensorFlow-based Python library for prototyping and building neural networks
Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running NumPy.
TensorFlow ROCm port
Deep learning with dynamic computation graphs in TensorFlow
📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow
TensorLight - A high-level framework for TensorFlow
Mesh TensorFlow: Model Parallelism Made Easier
Data-centric declarative deep learning framework
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Tensorforce: a TensorFlow library for applied reinforcement learning
Keras community contributions
Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization
Distributed Deep learning with Keras & Spark
Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.
Graph Neural Networks with Keras and TensorFlow 2.
QKeras: a quantization deep learning library for TensorFlow Keras
Deep Reinforcement Learning for Keras.
Hyperparameter Optimization for TensorFlow, Keras and PyTorch
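The ReBATE entry in this list refers to Relief-based feature selection. As a dependency-free sketch of the idea behind that family (not ReBATE's actual API or code), the function below computes basic two-class Relief-style weights: for randomly sampled instances it finds the nearest neighbour of the same class (the "hit") and of the other class (the "miss"), and raises a feature's weight when it differs more on the miss than on the hit. The function name `relief`, the Manhattan distance and the per-feature range scaling are assumptions chosen for illustration.

```python
import random


def relief(X, y, n_iterations=None, seed=0):
    """Basic two-class Relief-style feature weights (a sketch, not ReBATE's API).

    X: list of samples, each a list of numeric features.
    y: list of labels with exactly two distinct classes (assumed).
    Returns one weight per feature; larger means more relevant.
    """
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    m = n_iterations or n_samples
    # Scale per-feature differences into [0, 1] using each feature's range.
    low = [min(row[f] for row in X) for f in range(n_features)]
    high = [max(row[f] for row in X) for f in range(n_features)]
    span = [(hi - lo) or 1.0 for lo, hi in zip(low, high)]

    def diff(f, a, b):
        return abs(a[f] - b[f]) / span[f]

    def distance(a, b):
        # Manhattan distance over range-scaled features (an assumed choice).
        return sum(diff(f, a, b) for f in range(n_features))

    weights = [0.0] * n_features
    for _ in range(m):
        i = rng.randrange(n_samples)
        xi = X[i]
        hits = [X[j] for j in range(n_samples) if j != i and y[j] == y[i]]
        misses = [X[j] for j in range(n_samples) if y[j] != y[i]]
        nearest_hit = min(hits, key=lambda r: distance(xi, r))
        nearest_miss = min(misses, key=lambda r: distance(xi, r))
        # Reward features that separate classes, penalize ones that vary within a class.
        for f in range(n_features):
            weights[f] += (diff(f, xi, nearest_miss) - diff(f, xi, nearest_hit)) / m
    return weights


# Toy data: feature 0 determines the class, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.1]]
y = [0, 0, 0, 1, 1, 1]
print(relief(X, y))
```

On this toy data the informative feature ends up with a positive weight and the noise feature with a negative one; production Relief variants such as ReliefF add multiple neighbours per class and multi-class handling on top of this basic update.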
Visualization Tools - Environments
Visualizer for neural network, deep learning, and machine learning models
Library for animated data visualizations and data stories.
Debugging, monitoring and visualization for Python Machine Learning and Data Science
Journals, Publications and Magazines
The Leek group guide to data sharing
Exercise solutions. Authors: Garrett Grolemund and Hadley Wickham.