Awesome Data Science
An awesome Data Science repository to learn and apply for real world problems.
Thank you academic & contributors
What is Data Science?
A list of colleges and universities offering degrees in data science.
Course materials for the Data Science Specialization: https://www.coursera.org/specialization/jhudatascience/1
Official repo for the #tidytuesday project
Ways of doing Data Science Engineering and Machine Learning in R and Python
🐍 Quick reference guide to common patterns & functions in PySpark.
source code from the book Genetic Algorithms with Python by Clinton Sheppard
splearn: package for signal processing and machine learning with Python. Contains tutorials on understanding and applying signal processing.
Roadmap to becoming an Artificial Intelligence Expert in 2020
Toolboxes - Environment
The Data Science Lifecycle Process is a process for taking data science teams from Idea to Value repeatedly and sustainably. The process is documented in this repo.
Template repository for data science lifecycle project
A Temporal Extension Library for PyTorch Geometric
Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
🛠 All-in-one web-based IDE specialized for machine learning and data science.
Lightweight Python library for fast and reproducible experimentation
Curated set of transformers that make your work with steppy faster and more effective
A GUI for Pandas DataFrames
Serverless proxy for Spark cluster
Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
High performance distributed data processing engine
Intel® Deep Learning Framework
Julia kernel for Jupyter
An open source python library for automated feature engineering
Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125
🦉Data Version Control | Git for Data & Models
Feature engineering and machine learning: together at last!
Feature Store for Machine Learning
Machine Learning Platform for Kubernetes
ClearML - Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, ML-Ops and Data-Management
Hopsworks - Data-Intensive AI platform with a Feature Store
Predictive AI layer for existing databases.
Lightwood is Legos for Machine Learning.
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
♾️ CML - Continuous Machine Learning | CI/CD for ML
Grid studio is a web-based application for data science with full integration of open source data science frameworks and languages.
Python Data Science Handbook: full text in Jupyter Notebooks
A data-driven approach to quantify the value of classifiers in a machine learning ensemble.
easily explore, visualize, analyze, and transform data using familiar languages, such as Python and SQL, interactively.
is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials.
IDE – powerful user interface for R. It’s free and open source, and works on Windows, Mac, and Linux.
A Python-based ecosystem of open-source software for mathematics, science, and engineering.
Take numerical, textual, image, GIS or other data and give it the Wolfram treatment, carrying out a full spectrum of data science analysis and visualization and automatically generating rich interactive reports—all powered by the revolutionary knowledge-based Wolfram Language.
💲 Datadog is a full-stack monitoring service for large-scale cloud environments that aggregates metrics/events from servers, databases, and applications. It includes support for Docker, Kubernetes, and Mesos.
The Kite Software Development Kit (Apache License, Version 2.0), or Kite for short, is a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.
Run, scale, share, and deploy your models — without any infrastructure or setup.
A platform for efficient, distributed, general-purpose data processing.
Apache Hama is an Apache Top-Level open source project, allowing you to do advanced analytics beyond MapReduce.
Weka is a collection of machine learning algorithms for data mining tasks.
GNU Octave is a high-level interpreted language, primarily intended for numerical computations (a free alternative to MATLAB).
Scientific computing framework with wide support for machine learning algorithms, used by Facebook, Google, and more.
An open source data visualization platform helping everyone to create simple, correct and embeddable charts. Also at github.com
TensorFlow is an Open Source Software Library for Machine Intelligence
A leading platform for building Python programs to work with human language data.
high-level, high-performance dynamic programming language for technical computing
A Pandas-like interface, but for larger-than-memory data and "under the hood" parallelism. Very interesting, but only needed when you're getting advanced.
A library for industrial-strength natural language processing in Python and Cython.
Machine Learning in General Purpose
A scikit-learn based module for multi-label et. al. classification
Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models
open-source feature selection repository in python
A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.
Sequence learning toolkit for Python
Python package for Bayesian Machine Learning with scikit-learn API
scikit-learn inspired API for CRFsuite
Use evolutionary algorithms instead of gridsearch in scikit-learn
SigOpt wrappers for scikit-learn methods
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.
Image processing in Python
Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP (Traveling Salesman)
Multiple Pairwise Comparisons (Post Hoc) Tests in Python
Simple structured learning framework for python
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
cuML - RAPIDS Machine Learning Library
Uplift modeling and causal inference with machine learning algorithms
mlpack: a scalable C++ machine learning library
A library of extension and helper modules for Python's data analysis and machine learning libraries.
A modular active learning framework for Python
PySpark + Scikit-learn = Sparkit-learn
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
A toolkit for making real world machine learning and data analysis applications in C++
Python implementation of the rulefit algorithm
[HELP REQUESTED] Generalized Additive Models in Python
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Datasets, Transforms and Models specific to Computer Vision
Data loaders and abstractions for text and NLP
Data manipulation and transformation for audio signal processing, powered by PyTorch
High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.
Simple tools for logging and visualizing, loading and training
A simplified framework and utilities for PyTorch
A scikit-learn compatible neural network library that wraps PyTorch
Python package facilitating the use of Bayesian Deep Learning methods with Variational Inference for PyTorch
Geometric Deep Learning Extension Library for PyTorch
A highly efficient and modular implementation of Gaussian Processes in PyTorch
Deep universal probabilistic programming with Python and PyTorch
Accelerated deep learning R&D
A standard framework for modelling Deep Learning Models for tabular data
An Open Source Machine Learning Framework for Everyone
Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥
Deep learning library featuring a higher-level API for TensorFlow.
TensorFlow-based neural network library
A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility
TensorFlow Reinforcement Learning
Machine Learning Platform for Kubernetes
NeuPy is a Tensorflow based python library for prototyping and building neural networks
Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy.
TensorFlow ROCm port
Deep learning with dynamic computation graphs in TensorFlow
📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow
TensorLight - A high-level framework for TensorFlow
Mesh TensorFlow: Model Parallelism Made Easier
Ludwig is a toolbox that lets you train and evaluate deep learning models without writing code.
TF-Agents is a library for Reinforcement Learning in TensorFlow
Tensorforce: a TensorFlow library for applied reinforcement learning
Keras community contributions
Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization
Distributed Deep learning with Keras & Spark
Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.
Graph Neural Networks with Keras and Tensorflow 2.
QKeras: a quantization deep learning library for Tensorflow Keras
Deep Reinforcement Learning for Keras.
Hyperparameter Optimization for TensorFlow, Keras and PyTorch
Visualization Tools - Environments
Library for animated data visualizations and data stories.
Debugging, monitoring and visualization for Python Machine Learning and Data Science
Declarative statistical visualization library for Python. Supports many data transformations directly in the code that builds a chart.
Three libraries for traditional charts, stock, and maps. Features a hand-drawn style theme option.
Set of products for charting different types of data. Has a special Oracle Apex integration option.
Allows the user to manipulate documents based on data to render charts in SVG.
A data visualization package based on the grammar of graphics.
A series of charting libraries for a variety of uses. Can be compatible back to IE6.
list of open source data visualization tools
Journals, Publications and Magazines
The Genetic and Evolutionary Computation Conference (GECCO)
an international journal devoted to applications of statistical methods at large
Data Science related publications on medium
The Leek group guide to data sharing
The show about modern data infrastructure.
Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you the core concepts behind neural networks and deep learning
Tech Blog on Master Data Management And Every Buzz Surrounding It
Based in the UK and working globally, Cloud of Data's consultancy services help clients understand the implications of taking data and more to the Cloud.
Data Science London is a non-profit organization dedicated to the free, open, dissemination of data science.
Data Science Questions and Answers from experts
MDS, Inc. Helps Build Careers in Data Science, Advanced Analytics, Big Data Architecture, and High Performance Software Engineering
a technology guy with a penchant for the web and for data, big and small
about helping professional programmers to confidently apply machine learning algorithms to address complex problems.
a data scientist at Twitch. I handle the whole data pipeline, from tracking to model-building to reporting.
Data Mining, Analytics, Big Data, Data Science; not a blog, a portal
is some of, all of, or much more than the above and this blog explores its impact on information technology, the business world, government agencies, and our lives.
How a Social Scientist Jumps into the World of Big Data
Thoughts on Statistical Computing and Visualization
The File Drawer (http://chris-said.io/) - Chris Said's science blog
Handbook and recipes for data-driven solutions of real-world problems
A full-fledged website about data science and analytics study material.
Data Science with Esoteric programming languages
Rapid-fire, live tryouts for data scientists seeking to monetize their models as trading strategies
Big Data, Data Science, Predictive Modeling, Business Analytics, Hadoop, Decision and Operations Research.
datascientist @Ekimetrics, #machinelearning #dataviz #DynamicCharts #Hadoop #R #Python #NLP #Bitcoin #dataenthousiast
Data Science Central is the industry's single resource for Big Data practitioners.
Data Science. Big Data. Data Hacks. Data Junkies. Data Startups. Open Data
Documenting my path from SQL Data Analyst pursuing an Engineering Master's Degree to Data Scientist
Mission is to help guide & advance careers in Data Science & Analytics
Tips and Tricks for Data Scientists around the world! #datascience #bigdata
Running with #BigData--enjoying a love/hate relationship with its hype. @iSchoolSU #DataScience Program Mgr.
KDnuggets President, Analytics/Big Data/Data Mining/Data Science expert, KDD & SIGKDD co-founder, was Chief Scientist at 2 startups, part-time philosopher.
Scientist at Facebook and Julia developer. Author of Machine Learning for Hackers and Bandit Algorithms for Website Optimization. Tweets reflect my views only.
Principal Data Scientist @ Microsoft Data Science Team
The Economist's Data Editor and co-author of Big Data (http://big-data-book.com ).
DataScientist, PhD Astrophysicist, Top #BigData Influencer.
PhD Student. Programming, Mobile, Web. Artificial Intelligence, Intelligent Robotics Machine Learning, Data Mining, Natural Language Processing, Data Science.
Opinions of full-stack Python guy, author, instructor, currently playing Data Scientist. Occasional fathering, husbanding, ult|goalt-imate, organic gardening.
Data @ Jawbone. Turned data into stories & products at LinkedIn. Text mining, applied machine learning, recommender systems. Ex-gamer, ex-machine coder; namer.
Visualization & interaction designer. Practical cyclist. Author of vis books: http://www.oreilly.com/pub/au/4419
Cloud Computing/ Big Data/ Open Data Analyst & Consultant. Writer, Speaker & Moderator. Gigaom Research Analyst.
Creating intelligent systems to automate tasks & improve decisions. Entrepreneur, ex Principal Data Scientist @LinkedIn. Machine Learning, ProductRei, Networks
Solution Architect @ IBM, Master Data Management, Data Quality & Data Governance Blogger. Data Science, Hadoop, Big Data & Cloud.
Tweet blog posts from the R blogosphere, data science conferences and (!) open jobs for data scientists.
Computer scientist researching artificial intelligence. Data tinkerer. Community leader for @DataIsBeautiful. #OpenScience advocate.
Social Scientist. Hacker. Facebook Data Science Team. Keywords: Experiments, Causal Inference, Statistics, Machine Learning, Economics.
Enjoys ABM, SNA, DM, ML, NLP, HI, Python, Java. Top percentile kaggler/data scientist
Complex Event Processing, Big Data, Artificial Intelligence and Machine Learning. Passionate about programming and open-source.
InfoGov; Bigdata; Data as a Service; Data Science; Open, Social & Business Data Convergence
IT analyst with Ovum covering Big Data & data management with some systems engineering thrown in.
Data Scientist | Author | Entrepreneur. Co-founder @DataCommunityDC. Founder @DistrictDataLab. #DataScience #BigData #DataDC
Data Science @ PayPal. #NLP, #machinelearning; PhD, Carnegie Mellon alumni (Blog: https://allthingsds.wordpress.com )
Senior Manager - @Seagate Big Data Analytics | @McKinsey Alum | #BigData + #Analytics Evangelist | #Hadoop, #Cloud, #Digital, & #R Enthusiast
The data news crew at @WNYC. Practicing data-driven journalism, making it visual and showing our work.
A weekly newsletter to keep up to date with AI, machine learning, and data science.