

Awesome Data Science

An awesome Data Science repository to learn and apply to real-world problems.



Thank you academic & contributors
View topic on GitHub: academic/awesome-datascience



What is Data Science?

COLLEGES

MOOCs

Course materials for the Data Science Specialization: https://www.coursera.org/specialization/jhudatascience/1

Stars: 3.71K | Forks: 30.47K | Last commit: 5y 61d | License: n/a

Data Science, Harvard University: assignments, lecture notes, and readings

Tutorials

Official repo for the #tidytuesday project

Stars: 2.85K | Forks: 1.07K | Last commit: 6m | License: CC0-1.0

Ways of doing Data Science Engineering and Machine Learning in R and Python

Stars: 526 | Forks: 248 | Last commit: 3y 20d | License: n/a

🐍 Quick reference guide to common patterns & functions in PySpark.

Stars: 105 | Forks: 37 | Last commit: 108d | License: MIT

Source code from the book Genetic Algorithms with Python by Clinton Sheppard

Stars: 797 | Forks: 351 | Last commit: 10m | License: Apache-2.0
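To make the book's topic concrete, here is a minimal genetic algorithm on the classic OneMax toy problem (evolve a bit string toward all ones). It is an illustrative sketch, not code from the book: the `one_max` and `evolve` names, the tournament selection, one-point crossover, bit-flip mutation, elitism, and all parameter values are assumptions chosen for brevity.

```python
import random


def one_max(bits):
    """Fitness: number of 1 bits (the classic OneMax toy problem)."""
    return sum(bits)


def evolve(length=20, pop_size=30, generations=200, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)  # Fixed seed so runs are reproducible.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]

    def tournament(k=3):
        # Pick k random individuals and return the fittest one.
        return max(rng.sample(population, k), key=one_max)

    best = max(population, key=one_max)
    for _ in range(generations):
        if one_max(best) == length:
            break  # Optimal solution found.
        next_gen = [best[:]]  # Elitism: carry the best individual over unchanged.
        while len(next_gen) < pop_size:
            a, b = tournament(), tournament()
            cut = rng.randrange(1, length)  # One-point crossover position.
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=one_max)
    return best


best = evolve()
print(one_max(best), best)
```

Elitism guarantees the best fitness never decreases between generations, which keeps this small example stable.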

splearn: package for signal processing and machine learning with Python. Contains tutorials on understanding and applying signal processing.

Stars: 0 | Forks: 0 | Last commit: 5m | License: BSD-3-Clause

Free Courses

Toolboxes - Environment

The Data Science Lifecycle Process is a process for taking data science teams from Idea to Value repeatedly and sustainably. The process is documented in this repo.

Stars: 212 | Forks: 35 | Last commit: 8m | License: MIT

Template repository for a data science lifecycle project

Stars: 59 | Forks: 19 | Last commit: 10m | License: n/a

A Temporal Extension Library for PyTorch Geometric

Stars: 349 | Forks: 40 | Last commit: 87d | License: MIT

Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)

Stars: 495 | Forks: 32 | Last commit: 93d | License: GPL-3.0

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Stars: 1.16K | Forks: 140 | Last commit: 110d | License: GPL-3.0

🛠 All-in-one web-based IDE specialized for machine learning and data science.

Stars: 1.74K | Forks: 232 | Last commit: 4m | License: Apache-2.0

Lightweight Python library for fast and reproducible experimentation

Stars: 126 | Forks: 31 | Last commit: 2y 5m | License: MIT

Curated set of transformers that make your work with steppy faster and more effective

Stars: 21 | Forks: 8 | Last commit: 2y 5m | License: MIT

A GUI for Pandas DataFrames

Stars: 1.75K | Forks: 92 | Last commit: 6m | License: MIT

Serverless proxy for Spark cluster

Stars: 305 | Forks: 69 | Last commit: 1y 7m | License: Apache-2.0

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware

Stars: 3.85K | Forks: 840 | Last commit: 1y 11m | License: Apache-2.0

High performance distributed data processing engine

Stars: 390 | Forks: 53 | Last commit: 2y 4m | License: Apache-2.0

Intel® Deep Learning Framework

Stars: 315 | Forks: 90 | Last commit: 4y 11m | License: n/a

Julia kernel for Jupyter

Stars: 2.19K | Forks: 356 | Last commit: 91d | License: MIT

An open-source Python library for automated feature engineering

Stars: 5.4K | Forks: 708 | Last commit: 86d | License: BSD-3-Clause

Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: 978 | Forks: 194 | Last commit: 81d | License: Apache-2.0

Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125

Stars: 7.35K | Forks: 960 | Last commit: 88d | License: MIT

🦉Data Version Control | Git for Data & Models

Stars: 7.31K | Forks: 695 | Last commit: 86d | License: Apache-2.0
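The core idea behind data versioning tools like this one, storing large files by content hash and committing only small pointers to Git, can be sketched with the standard library. This is not DVC's implementation: the `cache_file` helper, the SHA-256 hash, and the two-level cache layout are assumptions for illustration, and DVC's actual hashing and cache format may differ.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def cache_file(path, cache_dir):
    """Copy `path` into a content-addressed cache and return its hex digest.

    Hypothetical helper: the object is stored at cache_dir/<first 2 hex chars>/<rest>,
    so identical content is stored only once, however many files contain it.
    """
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    target = Path(cache_dir) / digest[:2] / digest[2:]
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(path, target)
    return digest


with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)
    (tmp / "v1.csv").write_text("x,y\n1,2\n")
    (tmp / "copy.csv").write_text("x,y\n1,2\n")  # Same bytes, different file name.
    h1 = cache_file(tmp / "v1.csv", tmp / "cache")
    h2 = cache_file(tmp / "copy.csv", tmp / "cache")
    n_objects = sum(1 for p in (tmp / "cache").rglob("*") if p.is_file())

print(h1 == h2, n_objects)  # True 1
```

Only the short digest would need to go into version control; the bytes themselves live in the cache (or a remote), which is what keeps repositories small.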

Feature engineering and machine learning: together at last!

Stars: 0 | Forks: 0 | Last commit: 2y 4m | License: MIT

Feature Store for Machine Learning

Stars: 1.45K | Forks: 252 | Last commit: 87d | License: Apache-2.0

Machine Learning Platform for Kubernetes

Stars: 2.73K | Forks: 265 | Last commit: 82d | License: Apache-2.0

ClearML - Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, ML-Ops and Data-Management

Stars: 2.18K | Forks: 328 | Last commit: 90d | License: Apache-2.0

Hopsworks - Data-Intensive AI platform with a Feature Store

Stars: 392 | Forks: 61 | Last commit: 93d | License: n/a

Predictive AI layer for existing databases.

Stars: 3.23K | Forks: 413 | Last commit: 86d | License: GPL-3.0

Lightwood is Legos for Machine Learning.

Stars: 104 | Forks: 24 | Last commit: 89d | License: GPL-3.0

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Stars: 1.5K | Forks: 236 | Last commit: 79d | License: Apache-2.0

♾️ CML - Continuous Machine Learning | CI/CD for ML

Stars: 2.17K | Forks: 134 | Last commit: 86d | License: Apache-2.0

Grid studio is a web-based application for data science with full integration of open source data science frameworks and languages.

Stars: 8.07K | Forks: 1.4K | Last commit: 7m | License: AGPL-3.0

Python Data Science Handbook: full text in Jupyter Notebooks

Stars: 28.15K | Forks: 12.51K | Last commit: 2y 5m | License: n/a

A data-driven approach to quantify the value of classifiers in a machine learning ensemble.

Stars: 15 | Forks: 2 | Last commit: 4m | License: MIT

A lightweight ML experiment tracking, results visualization and management tool.

Easily explore, visualize, analyze, and transform data interactively, using familiar languages such as Python and SQL.

A personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials.

The R Project for Statistical Computing.

An IDE with a powerful user interface for R. It's free and open source, and works on Windows, Mac, and Linux.

Machine learning in Python (sklearn).

A fundamental package for scientific computing with Python.

A Python-based ecosystem of open-source software for mathematics, science, and engineering.

Take numerical, textual, image, GIS or other data and give it the Wolfram treatment, carrying out a full spectrum of data science analysis and visualization and automatically generating rich interactive reports—all powered by the revolutionary knowledge-based Wolfram Language.

💲 Datadog is a full-stack monitoring service for large-scale cloud environments that aggregates metrics/events from servers, databases, and applications. It includes support for Docker, Kubernetes, and Mesos.

Build powerful data visualizations for the web without writing JavaScript

The Kite Software Development Kit (Apache License, Version 2.0), or Kite for short, is a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.

Run, scale, share, and deploy your models — without any infrastructure or setup.

A platform for efficient, distributed, general-purpose data processing.

Apache Hama is an Apache Top-Level open source project, allowing you to do advanced analytics beyond MapReduce.

Weka is a collection of machine learning algorithms for data mining tasks.

GNU Octave is a high-level interpreted language, primarily intended for numerical computations (a free alternative to MATLAB).

Lightning-fast cluster computing

Deep Learning Framework

Scientific computing framework with wide support for machine learning algorithms, used by Facebook, Google, and more.

A machine learning package built for humans.

An open-source data visualization platform that helps everyone create simple, correct, and embeddable charts. Also available on GitHub.

TensorFlow is an Open Source Software Library for Machine Intelligence

A leading platform for building Python programs to work with human language data.

A high-level, high-performance dynamic programming language for technical computing

Web-based notebook that enables data-driven,

Text Annotation Tool for teams

A Pandas-like interface, but for larger-than-memory data and "under the hood" parallelism. Very interesting, but only needed when you're getting advanced.
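Working with larger-than-memory data generally comes down to splitting it into chunks, computing a partial result per chunk, and combining the partials. The idea can be illustrated without Dask: this stdlib-only sketch computes a mean from a generator in fixed-size chunks, so only one chunk is in memory at a time. `chunked` and `streaming_mean` are hypothetical names, not Dask APIs, and unlike Dask this sketch runs sequentially rather than in parallel.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def streaming_mean(values, chunk_size=1000):
    """Combine per-chunk (sum, count) partial results into a global mean."""
    total, count = 0.0, 0
    for chunk in chunked(values, chunk_size):
        total += sum(chunk)  # Partial result for this chunk only.
        count += len(chunk)
    return total / count if count else float("nan")


# A generator stands in for a dataset too large to load at once.
result = streaming_mean((x for x in range(1_000_001)), chunk_size=10_000)
print(result)  # 500000.0
```

The mean works here because (sum, count) pairs combine associatively; statistics such as the median do not decompose this simply.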

Topic Modelling for Humans.

A library for industrial-strength natural language processing in Python and Cython.

Machine Learning in General Purpose

A scikit-learn based module for multi-label et al. classification

Stars: 632 | Forks: 122 | Last commit: 1y 12m | License: BSD-2-Clause

Highly interpretable classifiers for scikit-learn, producing easily understood decision rules instead of black-box models

Stars: 452 | Forks: 67 | Last commit: 3y 9m | License: n/a

Open-source feature selection repository in Python

Stars: 1.05K | Forks: 350 | Last commit: 1y 6m | License: GPL-2.0

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

Stars: 309 | Forks: 58 | Last commit: 89d | License: MIT
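Relief-based algorithms score a feature by how much it differs between an instance and its nearest neighbor of the other class (the "nearest miss"), compared with its nearest neighbor of the same class (the "nearest hit"). The sketch below is a simplified variant of the original binary-class Relief for illustration only, not ReBATE's API: it visits every instance instead of sampling, uses range-normalized Manhattan distance, and the `relief` function name and toy data are hypothetical.

```python
def relief(X, y):
    """Basic Relief feature weights for a binary-class dataset.

    X: list of rows (lists of floats); y: list of class labels.
    Assumes every class has at least two instances so a hit always exists.
    """
    n, m = len(X), len(X[0])
    lows = [min(row[j] for row in X) for j in range(m)]
    highs = [max(row[j] for row in X) for j in range(m)]
    # Scale differences by each feature's range; constant features get span 1.
    span = [(hi_j - lo_j) or 1.0 for lo_j, hi_j in zip(lows, highs)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(m))

    weights = [0.0] * m
    for i in range(n):
        xi, yi = X[i], y[i]
        hits = [X[k] for k in range(n) if k != i and y[k] == yi]
        misses = [X[k] for k in range(n) if y[k] != yi]
        hit = min(hits, key=lambda r: dist(xi, r))
        miss = min(misses, key=lambda r: dist(xi, r))
        for j in range(m):
            # Reward separation from the miss, penalize separation from the hit.
            weights[j] += (diff(j, xi, miss) - diff(j, xi, hit)) / n
    return weights


# Toy data: feature 0 determines the class, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.2]]
y = [0, 0, 0, 1, 1, 1]
weights = relief(X, y)
print(weights)
```

On this toy data the informative feature gets a clearly positive weight and the noise feature a negative one, which is the ranking a Relief-style selector would use.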

Sequence learning toolkit for Python

Stars: 586 | Forks: 98 | Last commit: 5y 90d | License: MIT

Python package for Bayesian Machine Learning with scikit-learn API

Stars: 420 | Forks: 108 | Last commit: 1y 4m | License: MIT

scikit-learn inspired API for CRFsuite

Stars: 360 | Forks: 154 | Last commit: 1y 5m | License: n/a

Use evolutionary algorithms instead of gridsearch in scikit-learn

Stars: 628 | Forks: 112 | Last commit: 1y 5m | License: MIT

SigOpt wrappers for scikit-learn methods

Stars: 69 | Forks: 12 | Last commit: 1y 44d | License: MIT

Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.

Stars: 286 | Forks: 26 | Last commit: 4m | License: MIT

Image processing in Python

Stars: 4.2K | Forks: 1.76K | Last commit: 82d | License: n/a

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP (Traveling Salesman Problem)

Stars: 1.98K | Forks: 463 | Last commit: 102d | License: MIT
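Of the algorithms listed, simulated annealing is the easiest to show in a few lines: propose a random neighbor, always accept improvements, and accept worse moves with probability exp(-Δ/T) while the temperature T decays. This is a generic sketch, not scikit-opt's API; the geometric cooling schedule, step size, and quadratic test function are assumptions.

```python
import math
import random


def simulated_annealing(f, x0, step=1.0, t0=10.0, cooling=0.95, iters=2000, seed=0):
    """Minimize a 1-D function f starting from x0 with geometric cooling."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(iters):
        candidate = x + rng.uniform(-step, step)  # Random neighbor of x.
        fc = f(candidate)
        # Always accept improvements; accept worse moves with prob exp(-delta / T).
        if fc < fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t = max(t * cooling, 1e-9)  # Cool down, but never let T reach zero.
    return best_x, best_f


best_x, best_f = simulated_annealing(lambda x: (x - 3) ** 2, x0=-10.0)
print(best_x, best_f)
```

Early on, the high temperature lets the search accept uphill moves and escape local minima; as T shrinks, it behaves more and more like greedy local search.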

Multiple Pairwise Comparisons (Post Hoc) Tests in Python

Stars: 178 | Forks: 19 | Last commit: 86d | License: MIT
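Post hoc testing runs many pairwise comparisons, so the raw p-values are usually adjusted for multiple testing. As an illustration of one such adjustment (not scikit-posthocs' API), this sketch applies the Holm step-down correction; the `holm_adjust` name and the three raw p-values are hypothetical.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # Smallest p first.
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank is 0-based
        value = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, value)  # Adjusted p-values must not decrease.
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons.
raw = {"A-B": 0.01, "A-C": 0.04, "B-C": 0.03}
adj = dict(zip(raw, holm_adjust(list(raw.values()))))
print(adj)
```

Holm controls the family-wise error rate like Bonferroni but is uniformly less conservative, since only the smallest p-value is multiplied by the full number of comparisons.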

Simple structured learning framework for Python

Stars: 630 | Forks: 169 | Last commit: 2y 7m | License: BSD-2-Clause

High-performance, easy-to-use, and scalable machine learning (ML) package, including linear models (LR), factorization machines (FM), and field-aware factorization machines (FFM), with Python and CLI interfaces.

Stars: 2.84K | Forks: 510 | Last commit: 1y 73d | License: Apache-2.0
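Factorization machines model every pairwise feature interaction through the dot product of low-rank factor vectors: score = w0 + Σ w_i·x_i + Σ_{i<j} <v_i, v_j>·x_i·x_j. The pairwise term can be rewritten as 0.5·Σ_f [(Σ_i v_if·x_i)² − Σ_i v_if²·x_i²], which costs O(k·n) instead of O(k·n²). The sketch below checks that identity on arbitrary toy weights; it illustrates the model equation only and is not xLearn's API (the function names and numbers are hypothetical).

```python
def fm_predict_naive(x, w0, w, V):
    """FM score using the direct O(k * n^2) sum over feature pairs i < j."""
    n = len(x)
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))  # <v_i, v_j>
            pairwise += dot * x[i] * x[j]
    return linear + pairwise


def fm_predict_fast(x, w0, w, V):
    """Same score in O(k * n) via the sum-of-squares rewrite of the pairwise term."""
    k = len(V[0])
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


x = [1.0, 0.0, 2.0]
w0, w = 0.1, [0.5, -0.2, 0.3]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]  # One k=2 factor vector per feature.
naive, fast = fm_predict_naive(x, w0, w, V), fm_predict_fast(x, w0, w, V)
print(naive, fast)
```

The fast form is what makes FMs practical on sparse, high-dimensional data, since only nonzero x_i contribute to the per-factor sums.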

cuML - RAPIDS Machine Learning Library

Stars: 1.95K | Forks: 313 | Last commit: 82d | License: Apache-2.0

Uplift modeling and causal inference with machine learning algorithms

Stars: 1.68K | Forks: 263 | Last commit: 85d | License: n/a
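Uplift modeling estimates how much a treatment changes an outcome for each individual or segment, rather than predicting the outcome itself. A simple two-model ("T-learner") approach fits one outcome model on treated units and one on control units, then subtracts their predictions. The sketch below uses per-segment means as those two "models"; it is a toy illustration under that assumption, not CausalML's API, and the records are made up.

```python
from collections import defaultdict


def segment_uplift(records):
    """records: iterable of (segment, treated: bool, outcome: float).

    Returns {segment: mean(outcome | treated) - mean(outcome | control)},
    skipping segments that lack either group.
    """
    # Per segment: [treated_sum, treated_n, control_sum, control_n]
    stats = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for segment, treated, outcome in records:
        s = stats[segment]
        if treated:
            s[0] += outcome
            s[1] += 1
        else:
            s[2] += outcome
            s[3] += 1
    return {
        seg: s[0] / s[1] - s[2] / s[3]
        for seg, s in stats.items()
        if s[1] and s[3]
    }


# Hypothetical records: (segment, received treatment?, converted?)
data = [
    ("young", True, 1), ("young", True, 1), ("young", False, 0), ("young", False, 1),
    ("older", True, 0), ("older", True, 1), ("older", False, 1), ("older", False, 0),
]
uplift = segment_uplift(data)
print(uplift)
```

Here the treatment appears to help the "young" segment and have no effect on "older", which is the kind of targeting signal uplift models are used for; real data would need randomized assignment or careful confounding control for this estimate to be causal.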

mlpack: a scalable C++ machine learning library

Stars: 3.56K | Forks: 1.32K | Last commit: 83d | License: n/a

A library of extension and helper modules for Python's data analysis and machine learning libraries.

Stars: 3.35K | Forks: 695 | Last commit: 82d | License: n/a

A modular active learning framework for Python

Stars: 1.1K | Forks: 183 | Last commit: 4m | License: MIT
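A common query strategy in pool-based active learning is uncertainty sampling: ask an annotator to label the unlabeled examples the current model is least sure about. This framework-agnostic sketch ranks a pool by predictive entropy; the probability vectors stand in for a trained classifier's output and, like the function names, are hypothetical (this is not modAL's API).

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def query_most_uncertain(pool_probs, n=1):
    """Return indices of the n pool samples with the highest predictive entropy."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:n]


# Hypothetical model outputs for four unlabeled samples (binary classification).
pool = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20], [0.50, 0.50]]
picked = query_most_uncertain(pool, n=2)
print(picked)  # [3, 1]
```

In a full loop, the picked samples are labeled, added to the training set, the model is refit, and the pool probabilities are recomputed before the next query.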

PySpark + Scikit-learn = Sparkit-learn

Stars: 1.06K | Forks: 239 | Last commit: 3y 6m | License: Apache-2.0

50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster

Stars: 1.2K | Forks: 108 | Last commit: 7m | License: BSD-3-Clause

A toolkit for making real world machine learning and data analysis applications in C++

Stars: 9.91K | Forks: 2.88K | Last commit: 84d | License: BSL-1.0

Python implementation of the rulefit algorithm

Stars: 219 | Forks: 67 | Last commit: 6m | License: MIT

[HELP REQUESTED] Generalized Additive Models in Python

Stars: 561 | Forks: 105 | Last commit: 10m | License: Apache-2.0

The most popular Python library for Machine Learning.

Machine learning toolbox.

PyTorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Stars: 46.37K | Forks: 12.33K | Last commit: 81d | License: n/a

Datasets, Transforms and Models specific to Computer Vision

Stars: 8.42K | Forks: 4.35K | Last commit: 81d | License: BSD-3-Clause

Data loaders and abstractions for text and NLP

Stars: 2.65K | Forks: 614 | Last commit: 81d | License: BSD-3-Clause

Data manipulation and transformation for audio signal processing, powered by PyTorch

Stars: 1.23K | Forks: 280 | Last commit: 82d | License: BSD-2-Clause

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

Stars: 3.27K | Forks: 438 | Last commit: 81d | License: BSD-3-Clause

Simple tools for logging and visualizing, loading and training

Stars: 1.28K | Forks: 190 | Last commit: 4m | License: BSD-3-Clause

A simplified framework and utilities for PyTorch

Stars: 452 | Forks: 49 | Last commit: 93d | License: LGPL-3.0

A scikit-learn compatible neural network library that wraps PyTorch

Stars: 3.79K | Forks: 281 | Last commit: 97d | License: BSD-3-Clause

Python package facilitating the use of Bayesian Deep Learning methods with Variational Inference for PyTorch

Stars: 315 | Forks: 46 | Last commit: 2y 5m | License: MIT

Geometric Deep Learning Extension Library for PyTorch

Stars: 10.27K | Forks: 1.76K | Last commit: 81d | License: MIT

A highly efficient and modular implementation of Gaussian Processes in PyTorch

Stars: 2.3K | Forks: 328 | Last commit: 82d | License: MIT

Deep universal probabilistic programming with Python and PyTorch

Stars: 6.74K | Forks: 826 | Last commit: 82d | License: Apache-2.0

Accelerated deep learning R&D

Stars: 2.44K | Forks: 280 | Last commit: 114d | License: Apache-2.0

TensorFlow

An Open Source Machine Learning Framework for Everyone

Stars: 153.46K | Forks: 84.06K | Last commit: 81d | License: Apache-2.0

Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥

Stars: 6.5K | Forks: 1.46K | Last commit: 82d | License: n/a

Deep learning library featuring a higher-level API for TensorFlow.

Stars: 9.52K | Forks: 2.43K | Last commit: 5m | License: n/a

TensorFlow-based neural network library

Stars: 8.77K | Forks: 1.26K | Last commit: 94d | License: Apache-2.0

A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility

Stars: 5.93K | Forks: 1.77K | Last commit: 87d | License: Apache-2.0

TensorFlow Reinforcement Learning

Stars: 3.06K | Forks: 371 | Last commit: 1y 25d | License: Apache-2.0

Machine Learning Platform for Kubernetes

Stars: 2.73K | Forks: 265 | Last commit: 82d | License: Apache-2.0

NeuPy is a Tensorflow based python library for prototyping and building neural networks

Stars: 665 | Forks: 148 | Last commit: 1y 8m | License: MIT

Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy.

Stars: 346 | Forks: 39 | Last commit: 4m | License: BSD-3-Clause

TensorFlow ROCm port

Stars: 543 | Forks: 65 | Last commit: 82d | License: Apache-2.0

Deep learning with dynamic computation graphs in TensorFlow

Stars: 1.8K | Forks: 279 | Last commit: 3y 6m | License: Apache-2.0

📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

Stars: 62 | Forks: 30 | Last commit: 3y 23d | License: MIT

TensorLight - A high-level framework for TensorFlow

Stars: 9 | Forks: 2 | Last commit: 4y 11d | License: MIT

Mesh TensorFlow: Model Parallelism Made Easier

Stars: 873 | Forks: 156 | Last commit: 99d | License: Apache-2.0

Ludwig is a toolbox that allows users to train and evaluate deep learning models without the need to write code.

Stars: 7.57K | Forks: 889 | Last commit: 83d | License: Apache-2.0

TF-Agents is a library for Reinforcement Learning in TensorFlow

Stars: 1.8K | Forks: 471 | Last commit: 81d | License: Apache-2.0

Tensorforce: a TensorFlow library for applied reinforcement learning

Stars: 2.88K | Forks: 489 | Last commit: 96d | License: Apache-2.0

Keras

Keras community contributions

Stars: 1.49K | Forks: 615 | Last commit: 1y 4m | License: MIT

Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization

Stars: 2.08K | Forks: 301 | Last commit: 4m | License: MIT

Distributed Deep learning with Keras & Spark

Stars: 1.45K | Forks: 288 | Last commit: 91d | License: MIT

Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.

Stars: 496 | Forks: 51 | Last commit: 3y 11m | License: MIT

Graph Neural Networks with Keras and Tensorflow 2.

Stars: 1.63K | Forks: 198 | Last commit: 88d | License: MIT

QKeras: a quantization deep learning library for Tensorflow Keras

Stars: 246 | Forks: 50 | Last commit: 84d | License: Apache-2.0
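Quantization libraries restrict weights and activations to a small set of representable levels. The arithmetic of one simple scheme, symmetric uniform quantization to a signed `bits`-bit integer grid with a single per-tensor scale, is sketched below. It is not QKeras's quantizer implementation; the `quantize_symmetric` name, the scaling rule, and the example weights are assumptions.

```python
def quantize_symmetric(values, bits=4):
    """Quantize floats to signed integers in [-qmax, qmax] and map them back.

    qmax = 2**(bits - 1) - 1. Returns (integer codes, scale, dequantized floats).
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values) or 1.0  # Avoid a zero scale for all-zero input.
    scale = max_abs / qmax
    # Round to the nearest level, then clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    dequantized = [c * scale for c in codes]
    return codes, scale, dequantized


weights = [0.7, -0.33, 0.12, -0.7]
codes, scale, approx = quantize_symmetric(weights, bits=4)
print(codes, scale, approx)  # codes: [7, -3, 1, -7]
```

Each value is off by at most half a quantization step (scale / 2), which is the error a quantization-aware training procedure learns to tolerate.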

Deep Reinforcement Learning for Keras.

Stars: 4.96K | Forks: 1.3K | Last commit: 1y 6m | License: MIT

Hyperparameter Optimization for TensorFlow, Keras and PyTorch

Stars: 1.38K | Forks: 229 | Last commit: 5m | License: MIT
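Hyperparameter optimization tools automate the loop of sampling a configuration, training, and keeping the best score. Random search, one of the simplest strategies, is sketched below in plain Python. It is not the Talos API; `random_search` is a hypothetical name and `toy_loss` is a hypothetical stand-in for training a model and returning its validation loss.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample configurations uniformly from `space` and keep the lowest-loss one.

    space: dict mapping parameter name -> list of candidate values.
    """
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_loss(p):
    # Hypothetical objective with its minimum at lr=0.01, units=64.
    return (p["lr"] - 0.01) ** 2 * 1e4 + (p["units"] - 64) ** 2 / 1e3


space = {"lr": [0.1, 0.01, 0.001], "units": [16, 32, 64, 128]}
params, loss = random_search(toy_loss, space, n_trials=200)
print(params, loss)
```

With only 12 combinations an exhaustive grid would be cheaper; random search pays off when the space is large or continuous and only a few parameters really matter.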

A high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. Keras compatible

Visualization Tools - Environments

Debugging, monitoring and visualization for Python Machine Learning and Data Science

Stars: 3.03K | Forks: 326 | Last commit: 4m | License: MIT

Three libraries for traditional charts, stock, and maps. Features a hand-drawn style theme option.

Set of products for charting different types of data. Has a special Oracle Apex integration option.

Allows the user to manipulate documents based on data to render charts in SVG.

A data visualization package based on the grammar of graphics.

A series of charting libraries for a variety of uses, compatible back to IE6.

A Python 2D plotting library.

List of open-source data visualization tools

A Python visualization library based on matplotlib.

A high-productivity software for complex networks.

C3: D3-based reusable chart library

Journals, Publications and Magazines

Presentations

Podcasts

Books

Free e-book complemented by an online course

Bloggers

Greg Reda Personal Blog

Kevin Davenport Personal Blog

Recurse Center alumna

Tech Blog on Master Data Management And Every Buzz Surrounding It

The Open Source Data Science Masters

Based in the UK and working globally, Cloud of Data's consultancy services help clients understand the implications of taking data and more to the Cloud.

Data Science London is a non-profit organization dedicated to the free, open dissemination of data science.

By Peter Skomoroch: machine learning, data mining, and more

Data Science Questions and Answers from experts

a PhD student at Berkeley

MDS, Inc. Helps Build Careers in Data Science, Advanced Analytics, Big Data Architecture, and High Performance Software Engineering

a technology guy with a penchant for the web and for data, big and small

about helping professional programmers to confidently apply machine learning algorithms to address complex problems.

data-driven consulting and design

a data scientist at Twitch. I handle the whole data pipeline, from tracking to model-building to reporting.

Data Mining, Analytics, Big Data, Data Science: not a blog, a portal

is some of, all of, or much more than the above and this blog explores its impact on information technology, the business world, government agencies, and our lives.

How a Social Scientist Jumps into the World of Big Data

Thoughts on Statistical Computing and Visualization

Learning To Be A Data Scientist

Musings on data science, machine learning and stats.

The File Drawer (http://chris-said.io/): Chris Said's science blog

Visualization and Statistics

A Machine Learning Craftsmanship Blog

Handbook and recipes for data-driven solutions of real-world problems

A blog on the new emerging data economy

A blog with resources for data science learners

A full-fledged website about data science and analytics study material.

Data science tutorials for beginners!

Blog for understanding Neural Networks!

Blog for NLP and transfer learning!

Dedicated to clear explanations of machine learning!

Data Science with Esoteric programming languages

Facebook Accounts

Twitter Accounts

Rapid-fire, live tryouts for data scientists seeking to monetize their models as trading strategies

Data Viz Wiz | Data Journalist | Growth Hacker | Author of Data Science for Dummies (2015)

Big Data, Data Science, Predictive Modeling, Business Analytics, Hadoop, Decision and Operations Research.

Director of Data Science at @ExploreAltamira

Data scientist at Twitter

Dev, Design, Data Science @mattermark #hackerei

datascientist @Ekimetrics. , #machinelearning #dataviz #DynamicCharts #Hadoop #R #Python #NLP #Bitcoin #dataenthousiast

Data Science Central is the industry's single resource for Big Data practitioners.

Data Science. Big Data. Data Hacks. Data Junkies. Data Startups. Open Data

Documenting my path from SQL Data Analyst pursuing an Engineering Master's Degree to Data Scientist

Mission is to help guide & advance careers in Data Science & Analytics

Tips and Tricks for Data Scientists around the world! #datascience #bigdata

White House Data Chief, VP @ RelateIQ.

Data nerd, hacker, student of conflict.

Networks, #MachineLearning and #DataScience. I work on #Social Media. Postdoc at @IndianaUniv

Running with #BigData--enjoying a love/hate relationship with its hype. @iSchoolSU #DataScience Program Mgr.

Working @ GrubHub about data and pandas

KDnuggets President, Analytics/Big Data/Data Mining/Data Science expert, KDD & SIGKDD co-founder, was Chief Scientist at 2 startups, part-time philosopher.

Data Scientist in Residence at @accel.

ReTweeting about data science

Scientist at Facebook and Julia developer. Author of Machine Learning for Hackers and Bandit Algorithms for Website Optimization. Tweets reflect my views only.

Principal Data Scientist @ Microsoft Data Science Team

Hacker - Pandas - Data Analyze

The Economist's Data Editor and co-author of Big Data (http://big-data-book.com ).

Organizer of https://meetup.com/San-Diego-R-Users-Group/

Data science instructor, and founder of Data School

Interactive data visualization and tools. Data flaneur.

DataScientist, PhD Astrophysicist, Top #BigData Influencer.

Data story teller, visualizations.

PhD Student. Programming, Mobile, Web. Artificial Intelligence, Intelligent Robotics Machine Learning, Data Mining, Natural Language Processing, Data Science.

Data Analytics Recruitment Specialist at Salt (@SaltJobs) | Analytics - Insight - Big Data - Datascience

Opinions of full-stack Python guy, author, instructor, currently playing Data Scientist. Occasional fathering, husbanding, ult|goalt-imate, organic gardening.

Data Scientist at BizQualify, Developer

Data @ Jawbone. Turned data into stories & products at LinkedIn. Text mining, applied machine learning, recommender systems. Ex-gamer, ex-machine coder; namer.

Visualization & interaction designer. Practical cyclist. Author of vis books: http://www.oreilly.com/pub/au/4419

Cloud Computing/ Big Data/ Open Data Analyst & Consultant. Writer, Speaker & Moderator. Gigaom Research Analyst.

Creating intelligent systems to automate tasks & improve decisions. Entrepreneur, ex Principal Data Scientist @LinkedIn. Machine Learning, ProductRei, Networks

Solution Architect @ IBM, Master Data Management, Data Quality & Data Governance Blogger. Data Science, Hadoop, Big Data & Cloud.

Tweet blog posts from the R blogosphere, data science conferences and (!) open jobs for data scientists.

Computer scientist researching artificial intelligence. Data tinkerer. Community leader for @DataIsBeautiful. #OpenScience advocate.

Data Science geek @ UALR

Data scientist, genetic origamist, hardware aficionado

Social Scientist. Hacker. Facebook Data Science Team. Keywords: Experiments, Causal Inference, Statistics, Machine Learning, Economics.

Data Scientist at BBVA Compass

Enjoys ABM, SNA, DM, ML, NLP, HI, Python, Java. Top percentile kaggler/data scientist

Complex Event Processing, Big Data, Artificial Intelligence and Machine Learning. Passionate about programming and open-source.

InfoGov; Bigdata; Data as a Service; Data Science; Open, Social & Business Data Convergence

IT analyst with Ovum covering Big Data & data management with some systems engineering thrown in.

Data Scientist | Author | Entrepreneur. Co-founder @DataCommunityDC. Founder @DistrictDataLab. #DataScience #BigData #DataDC

Data Science @ PayPal. #NLP, #machinelearning; PhD, Carnegie Mellon alumni (Blog: https://allthingsds.wordpress.com )

Pandas (Python Data Analysis library).

Senior Manager - @Seagate Big Data Analytics | @McKinsey Alum | #BigData + #Analytics Evangelist | #Hadoop, #Cloud, #Digital, & #R Enthusiast

The data news crew at @WNYC. Practicing data-driven journalism, making it visual and showing our work.

Newsletters

A weekly newsletter to keep up to date with AI, machine learning, and data science. Archive.

Youtube Videos & Channels

Telegram Channels

First Telegram Data Science channel. Covers technical and popular content about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math, and their applications.

Beautiful posts on DS/ML themes with video or graphic visualizations.

Slack Communities

Competitions

Infographic

Data Sets

Open Data Sources

Stars: 416 | Forks: 180 | Last commit: 5y 5m | License: MIT

source{d} datasets ("big code") for source code analysis and machine learning on source code

Stars: 204 | Forks: 50 | Last commit: 1y 5m | License: n/a

NAYN.CO news archive in Turkish

Stars: 3 | Forks: 0 | Last commit: 1y 7m | License: Apache-2.0