
Awesome Data Science

An awesome Data Science repository for learning data science and applying it to real-world problems.


Last Update: Aug. 7, 2022, 6:13 p.m.

Thanks to academic & contributors.
View topic on GitHub: academic/awesome-datascience


What is Data Science?

Colleges

Intensive Programs

MOOCs

Course materials for the Data Science Specialization: https://www.coursera.org/specialization/jhudatascience/1

Stars: 3.83K | Forks: 31.09K | Last commit: 1y 4m ago | License: n/a

Tutorials

Official repo for the #tidytuesday project

Stars: 4.46K | Forks: 1.79K | Last commit: 5m ago | License: CC0-1.0

Ways of doing Data Science Engineering and Machine Learning in R and Python

Stars: 547 | Forks: 252 | Last commit: 1y 105d ago | License: n/a

🐍 Quick reference guide to common patterns & functions in PySpark.

Stars: 148 | Forks: 51 | Last commit: 1y 73d ago | License: MIT

Source code from the book Genetic Algorithms with Python by Clinton Sheppard

Stars: 880 | Forks: 374 | Last commit: 2y 32d ago | License: Apache-2.0
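To illustrate the technique the book covers, here is a minimal genetic algorithm sketch. It is not code from the book: it is a generic GA (tournament selection, one-point crossover, bit-flip mutation, elitism) on the toy "OneMax" problem, where fitness is simply the number of 1 bits. All function names and parameter values are illustrative assumptions.

```python
import random

def fitness(genes):
    """OneMax fitness: the number of 1 bits (higher is better)."""
    return sum(genes)

def tournament(population, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    return max(rng.sample(population, k), key=fitness)

def crossover(a, b, rng):
    """One-point crossover producing a single child."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(genes, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genes]

def run_ga(length=16, pop_size=30, generations=1000, seed=0):
    # Illustrative parameters; a real problem needs its own tuning.
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        if fitness(best) == length:  # optimum reached: every bit is 1
            break
        children = [best]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            parent_a = tournament(population, rng)
            parent_b = tournament(population, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng, 1.0 / length))
        population = children
        best = max(population, key=fitness)
    return best

best = run_ga()
print(best, fitness(best))
```

Elitism makes the best fitness non-decreasing across generations; swapping in a different `fitness` function adapts the same loop to other bit-string problems.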

splearn: package for signal processing and machine learning with Python. Contains tutorials on understanding and applying signal processing.

Stars: 9 | Forks: 1 | Last commit: 11m ago | License: BSD-3-Clause

Free Courses

Toolboxes - Environment

The Data Science Lifecycle Process is a process for taking data science teams from Idea to Value repeatedly and sustainably. The process is documented in this repo.

Stars: 290 | Forks: 47 | Last commit: 1y 96d ago | License: MIT

Template repository for data science lifecycle project

Stars: 89 | Forks: 26 | Last commit: 2y 37d ago | License: n/a

A general purpose recommender metrics library for fair evaluation.

Stars: 85 | Forks: 7 | Last commit: 7m ago | License: n/a
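As an example of the kind of metric a recommender evaluation library computes, here is a minimal sketch of precision@k and recall@k for one user's ranked recommendation list. The function names and sample data are hypothetical and do not reflect that library's API.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

# Hypothetical ranked list and ground-truth set for a single user.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "f"}
print(precision_at_k(recommended, relevant, 3), recall_at_k(recommended, relevant, 3))
```

In a full evaluation these per-user scores are usually averaged over all users.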

A PyTorch based deep learning library for drug pair scoring.

Stars: 33 | Forks: 2 | Last commit: 7m ago | License: Apache-2.0

PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models (CIKM 2021)

Stars: 1.33K | Forks: 194 | Last commit: 5m ago | License: MIT

Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)

Stars: 581 | Forks: 40 | Last commit: 6m ago | License: GPL-3.0
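Graph sampling libraries offer many samplers. The sketch below shows only one basic idea, random-walk node sampling with an induced subgraph, in plain Python. It is not Little Ball of Fur's API; the function name and the adjacency-dict input format are assumptions.

```python
import random

def random_walk_sample(graph, num_nodes, start, seed=0, max_steps=100_000):
    """Collect `num_nodes` distinct nodes by a simple random walk from `start`.

    `graph` is an adjacency dict {node: [neighbors]} (assumed format). The walk
    must be able to reach at least `num_nodes` nodes from `start`.
    """
    rng = random.Random(seed)
    visited = {start}
    current = start
    for _ in range(max_steps):
        if len(visited) >= num_nodes:
            break
        current = rng.choice(graph[current])
        visited.add(current)
    # Induced subgraph: keep only edges whose endpoints were both sampled.
    return {node: [nb for nb in graph[node] if nb in visited] for node in visited}

# Example: a 20-node ring with one chord between nodes 0 and 10.
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
ring[0].append(10)
ring[10].append(0)
sample = random_walk_sample(ring, num_nodes=8, start=0)
print(sorted(sample))
```

Because consecutive walk steps follow edges, the sampled subgraph is always connected.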

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Stars: 1.51K | Forks: 184 | Last commit: 6m ago | License: GPL-3.0

🛠 All-in-one web-based IDE specialized for machine learning and data science.

Stars: 2.26K | Forks: 309 | Last commit: 10m ago | License: Apache-2.0

Lightweight Python library for fast and reproducible experimentation

Stars: 132 | Forks: 33 | Last commit: 3y 8m ago | License: MIT

Curated set of transformers that make your work with steppy faster and more effective

Stars: 21 | Forks: 8 | Last commit: 3y 8m ago | License: MIT

A GUI for Pandas DataFrames

Stars: 2.56K | Forks: 168 | Last commit: 6m ago | License: MIT-0

Serverless proxy for Spark cluster

Stars: 316 | Forks: 68 | Last commit: 1y 9m ago | License: Apache-2.0

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware

Stars: 3.85K | Forks: 832 | Last commit: 1y 7m ago | License: Apache-2.0

High performance distributed data processing engine

Stars: 397 | Forks: 55 | Last commit: 1y 71d ago | License: Apache-2.0

Intel® Deep Learning Framework

Stars: 313 | Forks: 90 | Last commit: 6y 54d ago | License: n/a

Julia kernel for Jupyter

Stars: 2.34K | Forks: 374 | Last commit: 9m ago | License: MIT

An open source python library for automated feature engineering

Stars: 5.99K | Forks: 786 | Last commit: 5m ago | License: BSD-3-Clause
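A common idea behind automated feature engineering is to apply aggregation functions (count, sum, mean) to the child rows related to each parent row, for example orders per customer. Here is a minimal plain-Python sketch of that idea. The function name, the table layout (lists of dicts with an `id` field) and the sample data are assumptions, not that library's API.

```python
from collections import defaultdict
from statistics import mean

def aggregate_features(parents, children, parent_key, value_field):
    """Add COUNT, SUM and MEAN of a child column to each parent row.

    `parents` and `children` are lists of dicts (assumed layout); `parent_key`
    is the field in `children` that refers to a parent's "id".
    """
    grouped = defaultdict(list)
    for row in children:
        grouped[row[parent_key]].append(row[value_field])
    features = []
    for parent in parents:
        values = grouped.get(parent["id"], [])
        features.append({
            **parent,
            f"COUNT({value_field})": len(values),
            f"SUM({value_field})": sum(values),
            f"MEAN({value_field})": mean(values) if values else None,
        })
    return features

# Hypothetical customers and orders tables.
customers = [{"id": 1}, {"id": 2}, {"id": 3}]
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]
for row in aggregate_features(customers, orders, "customer_id", "amount"):
    print(row)
```

Stacking such aggregations across several related tables is what lets these tools generate many candidate features automatically.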

Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Stars: 1.18K | Forks: 212 | Last commit: 5m ago | License: Apache-2.0

Fast image augmentation library and an easy-to-use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about the library: https://www.mdpi.com/2078-2489/11/2/125

Stars: 9.68K | Forks: 1.24K | Last commit: 5m ago | License: MIT

🦉Data Version Control | Git for Data & Models | ML Experiments Management

Stars: 8.81K | Forks: 850 | Last commit: 9m ago | License: Apache-2.0

Feature engineering and machine learning: together at last!

Stars: 9 | Forks: 0 | Last commit: 1y 7m ago | License: MIT

Feature Store for Machine Learning

Stars: 2.42K | Forks: 420 | Last commit: 9m ago | License: Apache-2.0

Machine Learning Management & Orchestration Platform (Monorepo for Polyaxon's MLOps Tools)

Stars: 3.01K | Forks: 301 | Last commit: 5m ago | License: Apache-2.0

ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management

Stars: 3K | Forks: 411 | Last commit: 5m ago | License: Apache-2.0

Hopsworks - Data-Intensive AI platform with a Feature Store

Stars: 603 | Forks: 92 | Last commit: 9m ago | License: n/a

In-Database Machine Learning

Stars: 5.57K | Forks: 619 | Last commit: 5m ago | License: GPL-3.0

Lightwood is Legos for Machine Learning.

Stars: 183 | Forks: 46 | Last commit: 9m ago | License: GPL-3.0

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Stars: 2.54K | Forks: 420 | Last commit: 5m ago | License: Apache-2.0

♾️ CML - Continuous Machine Learning | CI/CD for ML

Stars: 2.77K | Forks: 197 | Last commit: 9m ago | License: Apache-2.0

Grid studio is a web-based application for data science with full integration of open source data science frameworks and languages.

Stars: 8.69K | Forks: 1.5K | Last commit: 1y 10m ago | License: AGPL-3.0

Python Data Science Handbook: full text in Jupyter Notebooks

Stars: 32.59K | Forks: 14.62K | Last commit: 3y 8m ago | License: n/a

The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021).

Stars: 136 | Forks: 17 | Last commit: 6m ago | License: MIT
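The Shapley value of a classifier in an ensemble is its average marginal contribution to ensemble performance over all orders in which members could join. The sketch below computes it exactly by brute force for a tiny three-member majority-vote ensemble. The value function (majority-vote accuracy, with ties counted as errors) and the toy predictions are assumptions for illustration, and the paper's own method may differ. Exact enumeration is only feasible for very small ensembles.

```python
from itertools import permutations
from math import factorial

def majority_accuracy(coalition, predictions, labels):
    """Accuracy of a majority vote over binary predictions (assumed value function).

    An empty coalition scores 0, and a tied vote counts as an error.
    """
    if not coalition:
        return 0.0
    correct = 0
    for i, label in enumerate(labels):
        ones = sum(predictions[m][i] for m in coalition)
        zeros = len(coalition) - ones
        if ones == zeros:
            continue  # tie: treated as incorrect
        vote = 1 if ones > zeros else 0
        correct += vote == label
    return correct / len(labels)

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = []
        for p in order:
            before = value(coalition)
            coalition.append(p)
            totals[p] += value(coalition) - before
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}

# Hypothetical labels and per-classifier predictions.
labels = [1, 0, 1, 1, 0]
predictions = {"A": [1, 0, 1, 1, 0], "B": [1, 1, 1, 0, 0], "C": [0, 0, 1, 1, 1]}
value = lambda coalition: majority_accuracy(coalition, predictions, labels)
phi = shapley_values(list(predictions), value)
print(phi)
```

By construction the values sum to the full ensemble's score minus the empty coalition's score (the efficiency property), which is a useful sanity check.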

ML powered analytics engine for outlier detection and root cause analysis.

Stars: 229 | Forks: 20 | Last commit: 5m ago | License: MIT
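The project above uses ML for outlier detection. As a much simpler statistical baseline for the same task, and explicitly a swapped-in technique rather than that engine's method, here is a sketch of the modified z-score based on the median absolute deviation (MAD), using the commonly cited cutoff of 3.5. The function name and sample data are hypothetical.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median.
    """
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        return []  # no spread: the score is undefined, so flag nothing
    return [x for x in values if abs(0.6745 * (x - med) / mad) > threshold]

print(mad_outliers([10, 11, 9, 10, 12, 10, 50]))
```

The median and MAD are far less affected by the outlier itself than the mean and standard deviation, which is why this score is preferred over a plain z-score for small samples.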

Towhee is a framework that helps you encode your unstructured data into embeddings.

Stars: 383 | Forks: 82 | Last commit: 116d ago | License: Apache-2.0

Data engineering, simplified. LineaPy creates a frictionless path for taking your data science artifact from development to production.

Stars: 284 | Forks: 11 | Last commit: 47d ago | License: Apache-2.0

🏕️ Development environment for machine learning

Stars: 496 | Forks: 28 | Last commit: 47d ago | License: Apache-2.0

General-Purpose Machine Learning

A scikit-learn-based module for multi-label et al. classification

Stars: 721 | Forks: 142 | Last commit: 3y 80d ago | License: BSD-2-Clause

Highly interpretable classifiers for scikit-learn, producing easily understood decision rules instead of black-box models

Stars: 480 | Forks: 66 | Last commit: 4y 12m ago | License: n/a

Open-source feature selection repository in Python

Stars: 1.17K | Forks: 404 | Last commit: 2y 9m ago | License: GPL-2.0

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

Stars: 343 | Forks: 65 | Last commit: 1y 5m ago | License: MIT
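To show the core idea behind Relief-based feature selection, here is the original two-class Relief algorithm in plain Python. It is not ReBATE's API or any of its more elaborate variants; the function name, the range-normalized Manhattan distance, and the toy data are assumptions for illustration.

```python
def relief(X, y):
    """Basic Relief feature weights for a two-class problem.

    For every instance, find its nearest hit (same class) and nearest miss
    (other class). A feature's weight rises when it separates the miss and
    falls when it separates the hit.
    """
    n, d = len(X), len(X[0])
    # Per-feature value ranges, used to scale differences into [0, 1].
    ranges = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(d)]

    def diff(f, a, b):
        return abs(a[f] - b[f]) / ranges[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    weights = [0.0] * d
    for i, xi in enumerate(X):
        hits = [X[j] for j in range(n) if j != i and y[j] == y[i]]
        misses = [X[j] for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda r: dist(xi, r))
        miss = min(misses, key=lambda r: dist(xi, r))
        for f in range(d):
            weights[f] += (diff(f, xi, miss) - diff(f, xi, hit)) / n
    return weights

# Toy data: feature 0 determines the class, feature 1 is noise.
X = [[0, 0.1], [0, 0.9], [0, 0.5], [1, 0.2], [1, 0.8], [1, 0.4]]
y = [0, 0, 0, 1, 1, 1]
print(relief(X, y))
```

Features are then ranked by weight; a relevant feature should score clearly above an irrelevant one.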

Sequence learning toolkit for Python

Stars: 628 | Forks: 100 | Last commit: 6y 5m ago | License: MIT

Python package for Bayesian Machine Learning with scikit-learn API

Stars: 444 | Forks: 108 | Last commit: 10m ago | License: MIT

scikit-learn inspired API for CRFsuite

Stars: 390 | Forks: 178 | Last commit: 2y 8m ago | License: n/a

Use evolutionary algorithms instead of gridsearch in scikit-learn

Stars: 676 | Forks: 114 | Last commit: 1y 9d ago | License: MIT

SigOpt wrappers for scikit-learn methods

Stars: 70 | Forks: 14 | Last commit: 1y 2d ago | License: MIT

Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.

Stars: 320 | Forks: 29 | Last commit: 5m ago | License: MIT

Image processing in Python

Stars: 4.77K | Forks: 1.96K | Last commit: 5m ago | License: n/a

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP (Traveling Salesman Problem)

Stars: 2.91K | Forks: 700 | Last commit: 6m ago | License: MIT
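Of the heuristics listed, simulated annealing is one of the easiest to sketch. Below is a plain-Python version for the TSP that is independent of that library. The segment-reversal move, geometric cooling schedule, and all parameter values are illustrative assumptions.

```python
import math
import random

def tour_length(tour, cities):
    """Total length of the closed tour visiting `cities` in the given order."""
    return sum(math.dist(cities[tour[i]], cities[tour[i - 1]]) for i in range(len(tour)))

def simulated_annealing_tsp(cities, t_start=10.0, t_end=1e-3, cooling=0.995, seed=0):
    rng = random.Random(seed)
    current = list(range(len(cities)))
    rng.shuffle(current)
    current_len = tour_length(current, cities)
    best, best_len = current[:], current_len
    temperature = t_start
    while temperature > t_end:
        # Neighbor move: reverse a random segment of the tour (2-opt style).
        i, j = sorted(rng.sample(range(len(cities)), 2))
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
        cand_len = tour_length(candidate, cities)
        delta = cand_len - current_len
        # Always accept improvements; accept worse tours with probability exp(-delta / T).
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            current, current_len = candidate, cand_len
            if current_len < best_len:
                best, best_len = current[:], current_len
        temperature *= cooling  # geometric cooling (assumed schedule)
    return best, best_len

# Example: 8 cities evenly spaced on the unit circle.
cities = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
best, best_len = simulated_annealing_tsp(cities)
print(best, round(best_len, 4))
```

Accepting some worse moves while the temperature is high lets the search escape local optima; as the temperature falls, it behaves more and more like greedy local search.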

Multiple Pairwise Comparisons (Post Hoc) Tests in Python

Stars: 232 | Forks: 24 | Last commit: 8m ago | License: MIT
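Post hoc tests produce many pairwise p-values, which are then adjusted for multiple comparisons. Here is a sketch of one such correction, the Holm step-down method, in plain Python. The function is hypothetical and not that package's API, and the input p-values are made up for illustration.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment controlling the family-wise error rate.

    Returns adjusted p-values in the original input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The rank-th smallest p-value is multiplied by (m - rank); the running
        # maximum keeps adjusted values monotone in the sorted order.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Hypothetical p-values from four pairwise comparisons.
pairwise_p = [0.01, 0.04, 0.03, 0.005]
print(holm_adjust(pairwise_p))
```

A comparison is then declared significant at level alpha when its adjusted p-value is at most alpha. Holm is uniformly at least as powerful as the plain Bonferroni correction.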

Simple structured learning framework for Python

Stars: 657 | Forks: 173 | Last commit: 10m ago | License: BSD-2-Clause

High-performance, easy-to-use, and scalable machine learning (ML) package, including linear models (LR), factorization machines (FM), and field-aware factorization machines (FFM), with Python and CLI interfaces.

Stars: 2.98K | Forks: 537 | Last commit: 2y 5m ago | License: Apache-2.0
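Factorization machines (FM) model pairwise feature interactions through a latent vector per feature. The plain-Python sketch below computes a second-order FM prediction two ways to show the standard reformulation that cuts the interaction cost from O(kn²) to O(kn). It covers plain FM only (not FFM), is not that package's API, and uses random placeholder weights.

```python
import random

def fm_predict_naive(x, w0, w, V):
    """Second-order FM prediction, summing every feature pair explicitly: O(k n^2)."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y

def fm_predict_fast(x, w0, w, V):
    """Same model via the O(k n) identity:
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
    """
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y

# Random placeholder parameters: 6 features, latent dimension k = 3.
rng = random.Random(0)
n_features, k = 6, 3
x = [rng.random() for _ in range(n_features)]
w0, w = 0.1, [rng.uniform(-1, 1) for _ in range(n_features)]
V = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n_features)]
print(fm_predict_naive(x, w0, w, V), fm_predict_fast(x, w0, w, V))
```

Both functions compute the same value up to floating-point rounding. The fast form is what makes FMs practical on sparse, high-dimensional inputs.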

cuML - RAPIDS Machine Learning Library

Stars: 2.57K | Forks: 379 | Last commit: 5m ago | License: Apache-2.0

Uplift modeling and causal inference with machine learning algorithms

Stars: 2.78K | Forks: 428 | Last commit: 5m ago | License: n/a
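Uplift modeling estimates how much a treatment changes the outcome for a given unit, rather than the overall effect. As a minimal sketch, assuming a randomized experiment and discrete customer segments, uplift can be estimated within each segment as the difference in conversion rate between the treated and control groups. ML-based uplift libraries typically replace this segment lookup with learned models. The function name and data are hypothetical.

```python
from collections import defaultdict

def segment_uplift(records):
    """Estimate uplift per segment as P(convert | treated) - P(convert | control).

    `records` are (segment, treated, converted) tuples; the last two are booleans.
    """
    # segment -> {treated flag: [conversions, total]}
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, converted in records:
        stats = counts[segment][treated]
        stats[0] += int(converted)
        stats[1] += 1
    uplift = {}
    for segment, groups in counts.items():
        (conv_t, n_t), (conv_c, n_c) = groups[True], groups[False]
        if n_t and n_c:  # both arms are needed to estimate a difference
            uplift[segment] = conv_t / n_t - conv_c / n_c
    return uplift

# Hypothetical experiment log.
records = [
    ("young", True, True), ("young", True, True), ("young", True, False),
    ("young", False, False), ("young", False, True),
    ("old", True, False), ("old", True, True),
    ("old", False, True), ("old", False, True),
]
print(segment_uplift(records))
```

A negative uplift, as for the "old" segment here, flags a group where the treatment appears to hurt conversion.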

mlpack: a scalable C++ machine learning library

Stars: 3.91K | Forks: 1.41K | Last commit: 6m ago | License: n/a

A library of extension and helper modules for Python's data analysis and machine learning libraries.

Stars: 3.8K | Forks: 748 | Last commit: 6m ago | License: n/a

A modular active learning framework for Python

Stars: 1.46K | Forks: 228 | Last commit: 9m ago | License: MIT
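Active learning loops repeatedly ask an oracle to label the samples the current model is least sure about. Here is a sketch of one common query strategy, least-confidence uncertainty sampling, in plain Python. The function name and probability values are hypothetical; in practice the probabilities would come from a trained classifier's predictions on the unlabeled pool.

```python
def least_confident(probabilities, batch_size=1):
    """Return the ids of the samples whose top class probability is lowest.

    `probabilities` maps sample id -> list of class probabilities (assumed format).
    """
    uncertainty = {sid: 1.0 - max(p) for sid, p in probabilities.items()}
    return sorted(uncertainty, key=uncertainty.get, reverse=True)[:batch_size]

# Hypothetical model outputs on an unlabeled pool.
pool = {"s1": [0.95, 0.05], "s2": [0.55, 0.45], "s3": [0.70, 0.30], "s4": [0.51, 0.49]}
print(least_confident(pool, batch_size=2))
```

After the queried samples are labeled, the model is retrained and the pool is scored again, repeating until the labeling budget runs out.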

PySpark + Scikit-learn = Sparkit-learn

Stars: 1.12K | Forks: 248 | Last commit: 4y 9m ago | License: Apache-2.0

Waiting hours for a future prediction is unacceptable. Hyperlearn makes AI and ML algorithms 50% faster and lets them use 90% less memory, without requiring new hardware! ML algorithms like PCA, Linear Regression and NMF are all faster!

Stars: 1.23K | Forks: 112 | Last commit: 8m ago | License: BSD-3-Clause

A toolkit for making real world machine learning and data analysis applications in C++

Stars: 10.91K | Forks: 3.03K | Last commit: 5m ago | License: BSL-1.0

Python implementation of the rulefit algorithm

Stars: 279 | Forks: 78 | Last commit: 1y 42d ago | License: MIT

[HELP REQUESTED] Generalized Additive Models in Python

Stars: 667 | Forks: 120 | Last commit: 2y 24d ago | License: Apache-2.0

Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort. See our docs: https://docs.deepchecks.com

Stars: 621 | Forks: 39 | Last commit: 7m ago | License: n/a

PyTorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Stars: 53.96K | Forks: 14.92K | Last commit: 5m ago | License: n/a

Datasets, Transforms and Models specific to Computer Vision

Stars: 10.85K | Forks: 5.61K | Last commit: 5m ago | License: BSD-3-Clause

Data loaders and abstractions for text and NLP

Stars: 2.95K | Forks: 691 | Last commit: 5m ago | License: BSD-3-Clause

Data manipulation and transformation for audio signal processing, powered by PyTorch

Stars: 1.59K | Forks: 384 | Last commit: 5m ago | License: BSD-2-Clause

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

Stars: 3.86K | Forks: 523 | Last commit: 5m ago | License: BSD-3-Clause

Simple tools for logging and visualizing, loading and training

Stars: 1.36K | Forks: 195 | Last commit: 1y 7m ago | License: BSD-3-Clause

A simplified framework and utilities for PyTorch

Stars: 509 | Forks: 61 | Last commit: 5m ago | License: LGPL-3.0

A scikit-learn compatible neural network library that wraps PyTorch

Stars: 4.2K | Forks: 297 | Last commit: 9m ago | License: BSD-3-Clause

Python package facilitating the use of Bayesian Deep Learning methods with Variational Inference for PyTorch

Stars: 332 | Forks: 46 | Last commit: 3y 7m ago | License: MIT

Graph Neural Network Library for PyTorch

Stars: 13.81K | Forks: 2.43K | Last commit: 5m ago | License: MIT

A highly efficient and modular implementation of Gaussian Processes in PyTorch

Stars: 2.67K | Forks: 409 | Last commit: 6m ago | License: MIT

Deep universal probabilistic programming with Python and PyTorch

Stars: 7.31K | Forks: 894 | Last commit: 6m ago | License: Apache-2.0

Accelerated deep learning R&D

Stars: 2.84K | Forks: 353 | Last commit: 5m ago | License: Apache-2.0

A standard framework for modelling Deep Learning Models for tabular data

Stars: 453 | Forks: 42 | Last commit: 9m ago | License: MIT

TensorFlow

An Open Source Machine Learning Framework for Everyone

Stars: 166.87K | Forks: 87.03K | Last commit: 4d ago | License: Apache-2.0

Deep Learning and Reinforcement Learning Library for Scientists and Engineers

Stars: 6.87K | Forks: 1.53K | Last commit: 5m ago | License: n/a

Deep learning library featuring a higher-level API for TensorFlow.

Stars: 9.58K | Forks: 2.43K | Last commit: 1y 8m ago | License: n/a

TensorFlow-based neural network library

Stars: 9.19K | Forks: 1.31K | Last commit: 6m ago | License: Apache-2.0

A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility

Stars: 6.17K | Forks: 1.83K | Last commit: 6m ago | License: Apache-2.0

TensorFlow Reinforcement Learning

Stars: 3.11K | Forks: 379 | Last commit: 12m ago | License: Apache-2.0

Machine Learning Management & Orchestration Platform (Monorepo for Polyaxon's MLOps Tools)

Stars: 3.01K | Forks: 301 | Last commit: 5m ago | License: Apache-2.0

NeuPy is a TensorFlow-based Python library for prototyping and building neural networks

Stars: 702 | Forks: 153 | Last commit: 2y 11m ago | License: MIT

Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running NumPy.

Stars: 346 | Forks: 39 | Last commit: 1y 7m ago | License: BSD-3-Clause

TensorFlow ROCm port

Stars: 581 | Forks: 67 | Last commit: 5m ago | License: Apache-2.0

Deep learning with dynamic computation graphs in TensorFlow

Stars: 1.82K | Forks: 279 | Last commit: 4y 9m ago | License: Apache-2.0

📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

Stars: 63 | Forks: 30 | Last commit: 4y 108d ago | License: MIT

TensorLight - A high-level framework for TensorFlow

Stars: 10 | Forks: 4 | Last commit: 5y 96d ago | License: MIT

Mesh TensorFlow: Model Parallelism Made Easier

Stars: 1.2K | Forks: 209 | Last commit: 6m ago | License: Apache-2.0

Data-centric declarative deep learning framework

Stars: 8.12K | Forks: 966 | Last commit: 5m ago | License: Apache-2.0

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Stars: 2.18K | Forks: 598 | Last commit: 5m ago | License: Apache-2.0

Tensorforce: a TensorFlow library for applied reinforcement learning

Stars: 3.04K | Forks: 513 | Last commit: 9m ago | License: Apache-2.0
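Reinforcement learning libraries like the ones above build on value-based updates such as Q-learning. The idea can be shown with tabular Q-learning in plain Python on a toy chain environment. The environment, the epsilon-greedy exploration, and all hyperparameters are assumptions for illustration and are unrelated to any listed library's API.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning on a toy 1-D chain (assumed environment).

    The agent starts in state 0; action 0 moves left, action 1 moves right.
    Reaching the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon or Q[state][0] == Q[state][1]:
                action = rng.randrange(2)  # explore, or break ties randomly
            else:
                action = 0 if Q[state][0] > Q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Terminal transitions have no future value to bootstrap from.
            target = reward if next_state == goal else reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
            if state == goal:
                break
    return Q

Q = q_learning_chain()
policy = ["left" if q[0] > q[1] else "right" for q in Q[:-1]]
print(policy)
```

Deep RL libraries replace the table `Q` with a neural network so the same update can scale to large or continuous state spaces.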

Keras

Keras community contributions

Stars: 1.54K | Forks: 655 | Last commit: 2y 7m ago | License: MIT

Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization

Stars: 2.11K | Forks: 307 | Last commit: 8m ago | License: MIT

Distributed Deep learning with Keras & Spark

Stars: 1.53K | Forks: 296 | Last commit: 11m ago | License: MIT

Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.

Stars: 497 | Forks: 49 | Last commit: 5y 77d ago | License: MIT

Graph Neural Networks with Keras and TensorFlow 2.

Stars: 1.99K | Forks: 270 | Last commit: 5m ago | License: MIT

QKeras: a quantization deep learning library for TensorFlow Keras

Stars: 361 | Forks: 72 | Last commit: 6m ago | License: Apache-2.0
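Quantization libraries apply quantizers to layer weights and activations. The basic arithmetic of uniform symmetric quantization can be sketched in plain Python as follows. This is not QKeras's quantizer API, which offers many more options; the function name, bit width, and sample weights are assumptions.

```python
def quantize_symmetric(values, bits=8):
    """Uniform symmetric quantization to signed `bits`-bit integer codes.

    Returns (codes, scale); a value is approximately reconstructed as code * scale.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / qmax
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale

# Hypothetical layer weights quantized to 4 bits.
weights = [0.82, -1.5, 0.003, 0.4, -0.27]
codes, scale = quantize_symmetric(weights, bits=4)
restored = [c * scale for c in codes]
print(codes, restored)
```

Because the scale is set from the largest magnitude, every reconstructed value is within half a quantization step of the original. Fewer bits mean a coarser step and larger rounding error.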

Deep Reinforcement Learning for Keras.

Stars: 5.2K | Forks: 1.33K | Last commit: 2y 9m ago | License: MIT

Hyperparameter Optimization for TensorFlow, Keras and PyTorch

Stars: 1.49K | Forks: 251 | Last commit: 6m ago | License: MIT

Visualization Tools - Environments

Visualizer for neural network, deep learning, and machine learning models

Stars: 19.53K | Forks: 2.24K | Last commit: 4d ago | License: MIT

Library for animated data visualizations and data stories.

Stars: 604 | Forks: 12 | Last commit: 9m ago | License: Apache-2.0

Debugging, monitoring and visualization for Python Machine Learning and Data Science

Stars: 3.18K | Forks: 346 | Last commit: 1y 117d ago | License: MIT

Journals, Publications and Magazines

Presentations

Podcasts

Books

Exercise Solutions Authors: Garrett Grolemund and Hadley Wickham.

Bloggers

Facebook Accounts

Twitter Accounts

Newsletters

YouTube Videos & Channels