Awesome Data Science

An awesome Data Science repository to learn from and apply to real-world problems.

Last Update: Dec. 1, 2020, 9:13 p.m.

Thank you academic & contributors
View Topic on GitHub:
academic/awesome-datascience

Each repository entry below is followed by its GitHub star count, fork count, time since the last commit, and license.

What is Data Science?

COLLEGES

MOOCs

Course materials for the Data Science Specialization: https://www.coursera.org/specialization/jhudatascience/1

3.65K
30.2K
4y 8m
n/a

Data Science Harvard University Assignments Lecture Notes Readings

Tutorials

Official repo for the #tidytuesday project

2.85K
1.07K
36d
CC0-1.0

Ways of doing Data Science Engineering and Machine Learning in R and Python

521
246
2y 7m
n/a

🐍 Quick reference guide to common patterns & functions in PySpark.

93
36
8m
MIT

source code from the book Genetic Algorithms with Python by Clinton Sheppard

757
343
4m
Apache-2.0

splearn: package for signal processing and machine learning with Python. Contains tutorials on understanding and applying signal processing.

0
0
1d
BSD-3-Clause

Free Courses

Toolboxes - Environment

The Data Science Lifecycle Process is a process for taking data science teams from Idea to Value repeatedly and sustainably. The process is documented in this repo.

193
31
79d
MIT

Template repository for data science lifecycle project

46
15
5m
n/a

A Temporal Extension Library for PyTorch Geometric

266
26
15d
MIT

Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)

470
30
3d
GPL-3.0

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

1.06K
128
9d
GPL-3.0

🛠 All-in-one web-based IDE specialized for machine learning and data science.

1.54K
202
13d
Apache-2.0

Lightweight Python library for fast and reproducible experimentation

123
31
2y 10d
MIT

Curated set of transformers that make your work with steppy faster and more effective

21
8
2y 11d
MIT

A GUI for Pandas DataFrames

1.75K
92
24d
MIT

Serverless proxy for Spark cluster

303
66
1y 57d
Apache-2.0

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware

3.85K
839
1y 6m
Apache-2.0

High performance distributed data processing engine

385
51
1y 11m
Apache-2.0

Intel® Deep Learning Framework

317
90
4y 5m
n/a

Julia kernel for Jupyter

2.12K
348
3d
MIT

An open source python library for automated feature engineering

5.23K
679
9d
BSD-3-Clause

Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

958
191
13d
Apache-2.0

Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125

6.61K
858
3d
MIT

🦉Data Version Control | Git for Data & Models

6.79K
630
5d
Apache-2.0

Feature engineering and machine learning: together at last!

0
0
1y 10m
MIT

Feature Store for Machine Learning

1.21K
209
6d
Apache-2.0

Machine Learning Platform for Kubernetes

2.64K
254
5d
Apache-2.0

TRAINS - Auto-Magical Experiment Manager & Version Control for AI - NOW WITH AUTO-MAGICAL DEVOPS!

1.99K
232
3d
Apache-2.0

Hopsworks - Data-Intensive AI platform with a Feature Store

323
55
5d
n/a

Predictive AI layer for existing databases.

3.05K
398
9d
GPL-3.0

Lightwood is Legos for Machine Learning.

91
23
9d
GPL-3.0

Pandas on AWS

1.22K
203
6d
Apache-2.0

♾️ CML - Continuous Machine Learning | CI/CD for ML

2.01K
117
7d
Apache-2.0

Grid studio is a web-based application for data science with full integration of open source data science frameworks and languages.

8.07K
1.4K
61d
AGPL-3.0

Python Data Science Handbook: full text in Jupyter Notebooks

27.01K
11.88K
2y 3d
n/a

A lightweight ML experiment tracking, results visualization and management tool.

easily explore, visualize, analyze, and transform data using familiar languages, such as Python and SQL, interactively.

is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials.

The R Project for Statistical Computing.

IDE – powerful user interface for R. It’s free and open source, and works on Windows, Mac, and Linux.

Machine learning in Python. sklearn

A fundamental package for scientific computing with Python.

A Python-based ecosystem of open-source software for mathematics, science, and engineering.

Take numerical, textual, image, GIS or other data and give it the Wolfram treatment, carrying out a full spectrum of data science analysis and visualization and automatically generating rich interactive reports—all powered by the revolutionary knowledge-based Wolfram Language.

💲 Datadog is a full-stack monitoring service for large-scale cloud environments that aggregates metrics/events from servers, databases, and applications. It includes support for Docker, Kubernetes, and Mesos.

Build powerful data visualizations for the web without writing JavaScript

The Kite Software Development Kit (Apache License, Version 2.0), or Kite for short, is a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.

Run, scale, share, and deploy your models — without any infrastructure or setup.

A platform for efficient, distributed, general-purpose data processing.

Apache Hama is an Apache Top-Level open source project, allowing you to do advanced analytics beyond MapReduce.

Weka is a collection of machine learning algorithms for data mining tasks.

GNU Octave is a high-level interpreted language, primarily intended for numerical computations. (Free Matlab)

Lightning-fast cluster computing

Deep Learning Framework

Scientific computing framework with wide support for machine learning algorithms, used by Facebook, Google, and more.

A machine learning package built for humans.

An open source data visualization platform helping everyone to create simple, correct and embeddable charts. Also at github.com

TensorFlow is an Open Source Software Library for Machine Intelligence

A leading platform for building Python programs to work with human language data.

high-level, high-performance dynamic programming language for technical computing

Web-based notebook that enables data-driven,

Text Annotation Tool for teams

A Pandas-like interface, but for larger-than-memory data and "under the hood" parallelism. Very interesting, but only needed when you're getting advanced.

Topic Modelling for Humans.

A library for industrial-strength natural language processing in Python and Cython.

Machine Learning in General Purpose

A scikit-learn based module for multi-label et al. classification

604
118
1y 6m
BSD-2-Clause

Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models

448
69
3y 114d
n/a

open-source feature selection repository in python

1.01K
337
1y 25d
GPL-2.0

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

299
54
4m
MIT

Sequence learning toolkit for Python

574
97
4y 9m
MIT

Python package for Bayesian Machine Learning with scikit-learn API

419
107
11m
MIT

scikit-learn inspired API for CRFsuite

347
151
12m
n/a

Use evolutionary algorithms instead of gridsearch in scikit-learn

613
108
1y 0d
MIT

SigOpt wrappers for scikit-learn methods

69
12
8m
MIT

scikit-learn model evaluation made easy: plots, tables and markdown reports.

285
25
61d
MIT

Image processing in Python

4.07K
1.69K
5d
n/a

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP (Traveling Salesman)

1.54K
377
12d
MIT

Pairwise Multiple Comparisons (Post Hoc) Tests in Python

166
17
44d
MIT

Simple structured learning framework for python

628
167
2y 62d
BSD-2-Clause

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

2.8K
497
9m
Apache-2.0

cuML - RAPIDS Machine Learning Library

1.8K
293
8d
Apache-2.0

Uplift modeling and causal inference with machine learning algorithms

1.46K
219
13d
n/a

mlpack: a scalable C++ machine learning library

3.44K
1.27K
3d
n/a

A library of extension and helper modules for Python's data analysis and machine learning libraries.

3.23K
685
6d
n/a

A modular active learning framework for Python

948
158
31d
MIT

PySpark + Scikit-learn = Sparkit-learn

1.05K
236
3y 40d
Apache-2.0

50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster

1.19K
109
75d
BSD-3-Clause

A toolkit for making real world machine learning and data analysis applications in C++

9.65K
2.83K
3d
BSL-1.0

Python implementation of the rulefit algorithm

211
68
16d
MIT

[HELP REQUESTED] Generalized Additive Models in Python

540
101
4m
Apache-2.0

The most popular Python library for Machine Learning.

Machine learning toolbox.

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

44.32K
11.7K
2d
n/a

Datasets, Transforms and Models specific to Computer Vision

7.82K
4.04K
2d
BSD-3-Clause

Data loaders and abstractions for text and NLP

2.57K
581
12d
BSD-3-Clause

Data manipulation and transformation for audio signal processing, powered by PyTorch

1.14K
259
12d
BSD-2-Clause

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

3.16K
413
2d
BSD-3-Clause

Simple tools for logging and visualizing, loading and training

1.25K
182
10m
BSD-3-Clause

A simplified framework and utilities for PyTorch

402
47
18d
LGPL-3.0

A scikit-learn compatible neural network library that wraps PyTorch

3.63K
269
31d
BSD-3-Clause

Python package facilitating the use of Bayesian Deep Learning methods with Variational Inference for PyTorch

311
43
1y 11m
MIT

Geometric Deep Learning Extension Library for PyTorch

9.54K
1.62K
5d
MIT

A highly efficient and modular implementation of Gaussian Processes in PyTorch

2.19K
307
4d
MIT

Deep universal probabilistic programming with Python and PyTorch

6.61K
793
3d
Apache-2.0

Accelerated deep learning R&D

2.31K
260
5d
Apache-2.0

tensorflow

An Open Source Machine Learning Framework for Everyone

150.96K
83.36K
2d
Apache-2.0

Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥

6.38K
1.43K
34d
n/a

Deep learning library featuring a higher-level API for TensorFlow.

9.47K
2.43K
2d
n/a

TensorFlow-based neural network library

8.62K
1.25K
55d
Apache-2.0

A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility

5.81K
1.74K
31d
Apache-2.0

TensorFlow Reinforcement Learning

3.04K
371
7m
Apache-2.0

Machine Learning Platform for Kubernetes

2.64K
254
5d
Apache-2.0

NeuPy is a Tensorflow based python library for prototyping and building neural networks

652
145
1y 92d
MIT

Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy.

347
39
3y 8m
MIT

TensorFlow ROCm port

518
64
8d
Apache-2.0

Deep learning with dynamic computation graphs in TensorFlow

1.79K
278
3y 33d
Apache-2.0

📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

62
30
2y 7m
MIT

TensorLight - A high-level framework for TensorFlow

9
2
3y 7m
MIT

Mesh TensorFlow: Model Parallelism Made Easier

587
110
3d
Apache-2.0

Ludwig is a toolbox that allows you to train and evaluate deep learning models without the need to write code.

7.3K
882
3d
Apache-2.0

TF-Agents is a library for Reinforcement Learning in TensorFlow

1.7K
438
2d
Apache-2.0

Tensorforce: a TensorFlow library for applied reinforcement learning

2.81K
483
3d
Apache-2.0

keras

Keras community contributions

1.46K
600
11m
MIT

Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization

2.05K
297
61d
MIT

Distributed Deep learning with Keras & Spark

1.43K
283
61d
MIT

Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.

497
50
3y 6m
MIT

Graph Neural Networks with Keras and Tensorflow 2.

1.51K
167
5d
MIT

QKeras: a quantization deep learning library for Tensorflow Keras

228
39
7d
Apache-2.0

Deep Reinforcement Learning for Keras.

4.87K
1.29K
1y 22d
MIT

Hyperparameter Optimization for TensorFlow, Keras and PyTorch

1.34K
224
11d
MIT

A high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. Keras compatible

Visualization Tools - Environments

Debugging, monitoring and visualization for Python Machine Learning and Data Science

2.98K
316
77d
MIT

Three libraries for traditional charts, stock, and maps. Features a hand-drawn style theme option.

Set of products for charting different types of data. Has a special Oracle Apex integration option.

Allows the user to manipulate documents based on data to render charts in SVG.

A data visualization package based on the grammar of graphics.

A series of charting libraries for a variety of uses. Can be compatible back to IE6.

A Python 2D plotting library.

list of open source data visualization tools

A python visualization library based on matplotlib.

A high-productivity software for complex networks.

C3

D3-based reusable chart library

Journals, Publications and Magazines

Presentations

Newsletters

A weekly newsletter to keep up to date with AI, machine learning, and data science. Archive.

Podcasts

Books

Free e-book accompanied by an online course

Bloggers

Greg Reda Personal Blog

Kevin Davenport Personal Blog

Recurse Center alumna

Tech Blog on Master Data Management And Every Buzz Surrounding It

The Open Source Data Science Masters

Based in the UK and working globally, Cloud of Data's consultancy services help clients understand the implications of taking data and more to the Cloud.

Data Science London is a non-profit organization dedicated to the free, open, dissemination of data science.

by Peter Skomoroch. MACHINE LEARNING, DATA MINING, AND MORE

Data Science Questions and Answers from experts

a PhD student at Berkeley

MDS, Inc. Helps Build Careers in Data Science, Advanced Analytics, Big Data Architecture, and High Performance Software Engineering

a technology guy with a penchant for the web and for data, big and small

about helping professional programmers to confidently apply machine learning algorithms to address complex problems.

data-driven consulting and design

a data scientist at Twitch. I handle the whole data pipeline, from tracking to model-building to reporting.

Data Mining, Analytics, Big Data, Data Science; not a blog, a portal

is some of, all of, or much more than the above and this blog explores its impact on information technology, the business world, government agencies, and our lives.

How a Social Scientist Jumps into the World of Big Data

Thoughts on Statistical Computing and Visualization

Learning To Be A Data Scientist

Musings on data science, machine learning and stats.

The File Drawer (http://chris-said.io/) - Chris Said's science blog

Visualization and Statistics

A Machine Learning Craftsmanship Blog

Handbook and recipes for data-driven solutions of real-world problems

A blog on the new emerging data economy

A blog with resources for data science learners

A full-fledged website about data science and analytics study material.

Data science tutorials for beginners!

Blog for understanding Neural Networks!

Blog for NLP and transfer learning!

Dedicated to clear explanations of machine learning!

Data Science with Esoteric programming languages

Facebook Accounts

Twitter Accounts

Rapid-fire, live tryouts for data scientists seeking to monetize their models as trading strategies

Data Viz Wiz | Data Journalist | Growth Hacker | Author of Data Science for Dummies (2015)

Big Data, Data Science, Predictive Modeling, Business Analytics, Hadoop, Decision and Operations Research.

Director of Data Science at @ExploreAltamira

Data scientist at Twitter

Dev, Design, Data Science @mattermark #hackerei

datascientist @Ekimetrics. , #machinelearning #dataviz #DynamicCharts #Hadoop #R #Python #NLP #Bitcoin #dataenthousiast

Data Science Central is the industry's single resource for Big Data practitioners.

Data Science. Big Data. Data Hacks. Data Junkies. Data Startups. Open Data

Documenting my path from SQL Data Analyst pursuing an Engineering Master's Degree to Data Scientist

Mission is to help guide & advance careers in Data Science & Analytics

Tips and Tricks for Data Scientists around the world! #datascience #bigdata

White House Data Chief, VP @ RelateIQ.

Data nerd, hacker, student of conflict.

Networks, #MachineLearning and #DataScience. I work on #Social Media. Postdoc at @IndianaUniv

Running with #BigData--enjoying a love/hate relationship with its hype. @iSchoolSU #DataScience Program Mgr.

Working @ GrubHub about data and pandas

KDnuggets President, Analytics/Big Data/Data Mining/Data Science expert, KDD & SIGKDD co-founder, was Chief Scientist at 2 startups, part-time philosopher.

Data Scientist in Residence at @accel.

ReTweeting about data science

Scientist at Facebook and Julia developer. Author of Machine Learning for Hackers and Bandit Algorithms for Website Optimization. Tweets reflect my views only.

Principal Data Scientist @ Microsoft Data Science Team

Hacker - Pandas - Data Analyze

The Economist's Data Editor and co-author of Big Data (http://big-data-book.com ).

Organizer of https://meetup.com/San-Diego-R-Users-Group/

Data science instructor, and founder of Data School

Interactive data visualization and tools. Data flaneur.

DataScientist, PhD Astrophysicist, Top #BigData Influencer.

Data story teller, visualizations.

PhD Student. Programming, Mobile, Web. Artificial Intelligence, Intelligent Robotics Machine Learning, Data Mining, Natural Language Processing, Data Science.

Data Analytics Recruitment Specialist at Salt (@SaltJobs) | Analytics - Insight - Big Data - Datascience

Opinions of full-stack Python guy, author, instructor, currently playing Data Scientist. Occasional fathering, husbanding, ult|goalt-imate, organic gardening.

Data Scientist at BizQualify, Developer

Data @ Jawbone. Turned data into stories & products at LinkedIn. Text mining, applied machine learning, recommender systems. Ex-gamer, ex-machine coder; namer.

Visualization & interaction designer. Practical cyclist. Author of vis books: http://www.oreilly.com/pub/au/4419

Cloud Computing/ Big Data/ Open Data Analyst & Consultant. Writer, Speaker & Moderator. Gigaom Research Analyst.

Creating intelligent systems to automate tasks & improve decisions. Entrepreneur, ex Principal Data Scientist @LinkedIn. Machine Learning, ProductRei, Networks

Solution Architect @ IBM, Master Data Management, Data Quality & Data Governance Blogger. Data Science, Hadoop, Big Data & Cloud.

Tweet blog posts from the R blogosphere, data science conferences and (!) open jobs for data scientists.

Computer scientist researching artificial intelligence. Data tinkerer. Community leader for @DataIsBeautiful. #OpenScience advocate.

Data Science geek @ UALR

Data scientist, genetic origamist, hardware aficionado

Social Scientist. Hacker. Facebook Data Science Team. Keywords: Experiments, Causal Inference, Statistics, Machine Learning, Economics.

Data Scientist at BBVA Compass

Enjoys ABM, SNA, DM, ML, NLP, HI, Python, Java. Top percentile kaggler/data scientist

Complex Event Processing, Big Data, Artificial Intelligence and Machine Learning. Passionate about programming and open-source.

InfoGov; Bigdata; Data as a Service; Data Science; Open, Social & Business Data Convergence

IT analyst with Ovum covering Big Data & data management with some systems engineering thrown in.

Data Scientist | Author | Entrepreneur. Co-founder @DataCommunityDC. Founder @DistrictDataLab. #DataScience #BigData #DataDC

Data Science @ PayPal. #NLP, #machinelearning; PhD, Carnegie Mellon alumni (Blog: https://allthingsds.wordpress.com )

Pandas (Python Data Analysis library).

Senior Manager - @Seagate Big Data Analytics | @McKinsey Alum | #BigData + #Analytics Evangelist | #Hadoop, #Cloud, #Digital, & #R Enthusiast

The data news crew at @WNYC. Practicing data-driven journalism, making it visual and showing our work.

Youtube Videos & Channels

Telegram Channels

First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former.

Beautiful posts on DS/ML themes with video or graphic visualization.

Competitions

Infographic

Data Sets

Open Data Sources

410
180