Awesome Education

Rough list of my favorite deep learning resources, useful for revisiting topics or for reference. I have gone through all of the content listed here carefully. - Guillaume Chevalier

Last Update: Nov. 26, 2020, 3:14 p.m.

Thank you guillaume-chevalier & contributors

Online Classes

Interesting class for acquiring basic knowledge of machine learning applied to trading and some AI and finance concepts. I especially liked the section on Q-Learning.

This is a class given by Philippe Giguère, Professor at Université Laval. I especially liked its rare visualization of the multi-head attention mechanism, which can be found on slide 28 of week 13's class.

The most richly dense, accelerated course on the topic of Deep Learning & Recurrent Neural Networks (scroll to the end).

Books


Get back to the basics, you fool! Learn how to write clean code for your career. This is by far the best book I've read, even if this list is about deep learning.

Learn how to be professional as a coder and how to interact with your manager. This is important for any coding career.

The audio version is nice to listen to while commuting. This book is motivating about reverse-engineering the mind and thinking on how to code AI.

This book covers many of the core concepts behind neural networks and deep learning.

Yet halfway through the book, it contains satisfying math content on how to think about actual deep learning.

Some books listed here are less related to deep learning but are still somehow relevant to this list.

Posts and Articles

List of mid to long term futuristic predictions made by Ray Kurzweil.

Interesting for visual animations, it is a nice intro to attention mechanisms as an example.

Awesome for doing clustering on audio - post by an intern at Spotify.

Very interesting CNN architecture (e.g., the inception-style convolutional layers are promising and efficient in terms of reducing the number of parameters).

Author of Keras - has interesting Twitter posts and innovative ideas.

Thought provoking article about the future of the brain and brain-computer interfaces.

François Chollet's thoughts on the future of deep learning.

Grow and plot a decision tree to automatically figure out hidden rules in your data

Clever trick to estimate an optimal learning rate prior to any full training run.

Good for understanding the "Attention Is All You Need" (AIAYN) paper.

Also good for understanding the "Attention Is All You Need" (AIAYN) paper.

SOTA across many NLP tasks from unsupervised pretraining on a huge corpus.

The SOLID principles are not the only ones needed for clean code: the lesser-known REP, CCP, CRP, ADP, SDP and SAP principles are very important for developing huge software that must be bundled into separate packages.

Data is not to be overlooked, and communication between teams and data scientists is important to integrate solutions properly.

Focus on clear business objectives, avoid pivots of algorithms unless you have really clean code, and be able to know when what you coded is "good enough".

Libraries and Implementations

A Sklearn-like Framework for Hyperparameter Tuning and AutoML in Deep Learning projects. Finally have the right abstractions and design patterns to properly do AutoML. Let your pipeline steps have hyperparameter spaces. Enable checkpoints to cut duplicate calculations. Go from research to production environment easily.


An Open Source Machine Learning Framework for Everyone


Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning

"Neural Turing Machine" in Tensorflow

Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN. Classifying the type of movement amongst six activity categories - Guillaume Chevalier


Using deep stacked residual bidirectional LSTM cells (RNN) with TensorFlow, we do Human Activity Recognition (HAR). Classifying the type of movement amongst 6 categories or 18 categories on 2 different datasets.


Signal forecasting with a Sequence-to-Sequence (seq2seq) Recurrent Neural Network (RNN) model in TensorFlow - Guillaume Chevalier


Auto-optimizing a neural net (and its architecture) on the CIFAR-100 dataset. Could be easily transferred to another dataset or another classification task.

Using a U-Net for image segmentation, blending predicted patches smoothly is a must to please the human eye.

Attempt at reproducing a SGNN's projection layer, but with word n-grams instead of skip-grams. Paper and more:

A coding exercise: let's convert dirty machine learning code into clean code using a Pipeline - which is the Pipe and Filter Design Pattern applied to Machine Learning.
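
To illustrate the Pipe and Filter idea mentioned above, here is a minimal sketch in plain Python. The `Pipeline`, `MeanCenter` and `Scale` classes are hypothetical names written for this example; they are not the API of the linked exercise or of any library.

```python
class Pipeline:
    """Pipe and Filter: each step transforms the data and hands it to the next one."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, data):
        # Fit each step on the output of the previous one.
        for step in self.steps:
            data = step.fit(data).transform(data)
        return self

    def transform(self, data):
        for step in self.steps:
            data = step.transform(data)
        return data


class MeanCenter:
    """Hypothetical step: subtract the mean seen during fit."""

    def fit(self, data):
        self.mean = sum(data) / len(data)
        return self

    def transform(self, data):
        return [x - self.mean for x in data]


class Scale:
    """Hypothetical step: multiply by a fixed factor (nothing to learn)."""

    def __init__(self, factor):
        self.factor = factor

    def fit(self, data):
        return self

    def transform(self, data):
        return [x * self.factor for x in data]


pipeline = Pipeline([MeanCenter(), Scale(2.0)]).fit([1.0, 2.0, 3.0])
print(pipeline.transform([4.0]))  # [4.0]
```

Because every step exposes the same `fit`/`transform` interface, steps can be swapped or reordered without touching the rest of the code.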


Keras is another interesting deep learning framework like TensorFlow; it is mostly high-level.

Transfer learning tutorial in TensorFlow for vision from high-level embeddings of a pretrained CNN, AlexNet 2012.

Some Datasets

A topic-centric list of HQ open datasets.


Huge free English speech dataset with balanced genders and speakers, that seems to be of high quality.

A Python framework to benchmark your sentence representations on many datasets (NLP tasks).

Another Python framework to benchmark your sentence representations on many datasets (NLP tasks).

Gradient Descent Algorithms & Optimization Theory

Overview of how the backpropagation algorithm works.

A visual proof that neural nets can compute any function.

Exposing backprop's caveats and the importance of knowing that while training models.

Unfolding of RNN graphs is explained properly, and potential problems about gradient descent algorithms are exposed.

Visualize how different optimizers interact with a saddle point.

Visualize how different optimizers interact with an almost flat landscape.

Okay, I already listed Andrew Ng's Coursera class above, but this video is an especially pertinent introduction and defines the gradient descent algorithm.

What follows from the previous video: now add intuition.
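
As a reminder of what the gradient descent update looks like, here is a minimal sketch in plain Python. The quadratic loss, the starting point and the learning rate are assumptions chosen for illustration; they are not taken from the videos.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Toy loss (an assumption): L(w) = (w - 3)^2, so dL/dw = 2 * (w - 3).
w_star = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_star)  # approaches the minimum at w = 3
```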

A good explanation of overfitting and how to address that problem.

Understanding bias and variance in the predictions of a neural net and how to address those problems.

Appearance of the incredible SELU activation function.
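
For reference, the SELU activation can be sketched in a few lines of plain Python. The two constants are the values commonly used by deep learning libraries; treat this as an illustration, not as the paper's reference code.

```python
import math

# Commonly published SELU constants.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit: linear for x > 0, scaled exponential otherwise."""
    if x > 0:
        return SCALE * x
    return SCALE * ALPHA * (math.exp(x) - 1.0)


print(selu(1.0))   # equals SCALE
print(selu(-1e3))  # saturates near -SCALE * ALPHA
```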

RNN as an optimizer: introducing the L2L optimizer, a meta-neural network.

Complex Numbers & Digital Signal Processing

Simple demo of filtering signal with an LPF and plotting its Short-Time Fourier Transform (STFT) and Laplace transform, in Python.

Wikipedia page that lists some of the known window functions - note that the Hann-Poisson window is especially interesting for greedy hill-climbing algorithms (like gradient descent for example).
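
The Hann-Poisson window is the product of a Hann window and a Poisson (two-sided exponential) window. Below is a minimal sketch in plain Python, following the definition as I understand it from that page; the default `alpha=2.0` and the function name are assumptions for illustration.

```python
import math


def hann_poisson(num_points, alpha=2.0):
    """Hann window multiplied by a Poisson (two-sided exponential) window."""
    last = num_points - 1  # index of the last sample
    window = []
    for n in range(num_points):
        hann = 0.5 * (1.0 - math.cos(2.0 * math.pi * n / last))
        poisson = math.exp(-alpha * abs(last - 2 * n) / last)
        window.append(hann * poisson)
    return window


w = hann_poisson(65)
print(w[0], w[32], w[64])  # zero at both ends, peak of 1 at the center
```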

Animations dealing with complex numbers and wave equations.

Convergence methods in physics engines, applied to interaction design.

Nice animations for rotation and rotation interpolation with Quaternions, a mathematical object for handling 3D rotations.
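
As a small companion to those animations, here is a sketch of rotating a 3D vector with a unit quaternion in plain Python. The function names are mine, and the axis is assumed to be unit-length.

```python
import math


def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate(vector, axis, angle):
    """Rotate a 3D vector by `angle` radians around a unit-length `axis`."""
    half = angle / 2.0
    s = math.sin(half)
    q = (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)
    q_conj = (q[0], -q[1], -q[2], -q[3])
    # v' = q * v * q^-1, with v embedded as the pure quaternion (0, x, y, z).
    _, x, y, z = quat_multiply(quat_multiply(q, (0.0, *vector)), q_conj)
    return (x, y, z)


# A quarter turn around the z axis sends the x axis onto the y axis.
rotated = rotate((1.0, 0.0, 0.0), axis=(0.0, 0.0, 1.0), angle=math.pi / 2)
print(rotated)
```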

Recurrent Neural Networks

You_Again's summary/overview of deep learning, mostly about RNNs.

Better classifications with RNNs with bidirectional scanning on the time axis.

Two networks in one combined into a seq2seq (sequence to sequence) Encoder-Decoder architecture. RNN Encoder–Decoder with 1000 hidden units. Adadelta optimizer.

4 stacked LSTM cells of 1000 hidden size with reversed input sentences, and with beam search, on the WMT’14 English to French dataset.

Nice recursive models using word-level LSTMs on top of a character-level CNN using an overkill amount of GPU power.

Interesting overview of the subject of NMT, I mostly read part 8 about RNNs with attention as a refresher.

Basically, residual connections can be better than stacked RNNs in the presented case of sentiment analysis.

Nice for photoshop-like "content aware fill" to fill missing patches in images.

Let RNNs decide how long they compute. I would love to see how well it would combine with Neural Turing Machines. Interesting interactive visualizations on the subject can be found here.

Convolutional Neural Networks

Interesting idea of stacking multiple 3x3 conv+ReLU before pooling for a bigger filter size with just a few parameters. There is also a nice table for "ConvNet Configuration".
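
The parameter saving from stacking 3x3 layers can be checked with a little arithmetic. In this sketch the channel count of 64 is an illustrative assumption, and biases are ignored.

```python
def conv_weights(kernel_size, channels, num_layers=1):
    """Weights in a stack of square conv layers with `channels` in and out (no biases)."""
    return num_layers * kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stride-1 conv layers stacked without pooling."""
    return 1 + num_layers * (kernel_size - 1)


channels = 64  # illustrative assumption
print(stacked_receptive_field(3, 3))  # 7, same as a single 7x7 layer
print(conv_weights(3, channels, 3))   # 110592
print(conv_weights(7, channels))      # 200704
```

Three stacked 3x3 layers cover the same 7x7 area with 27 weights per channel pair instead of 49, and get two extra ReLU non-linearities along the way.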

GoogLeNet: Appearance of "Inception" layers/modules. The idea is to parallelize conv layers into several small convolutions of different sizes with "same" padding, concatenated along the depth axis.

Highway networks: residual connections.

Batch normalization (BN): to normalize a layer's output by also summing over the entire batch, and then performing a linear rescaling and shifting of a certain trainable amount.
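
The normalize-then-rescale step described above can be sketched for a single scalar feature in plain Python. This covers the training-time computation only; the running statistics used at inference are left out, and `gamma`, `beta` and `eps` are named here by assumption.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over the batch, then rescale by gamma and shift by beta."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(variance + eps) + beta for x in batch]


# gamma and beta would be trainable parameters in a real network.
out = batch_norm([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=0.5)
print(out)  # mean close to beta, standard deviation close to gamma
```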

The U-Net is an encoder-decoder CNN that also has skip-connections, good for image segmentation at a per-pixel level.

Very deep residual layers with batch normalization layers - a.k.a. "how to overfit any vision dataset with too many layers and make any vision model work properly at recognition given enough data".

Epic raw voice/music generation with new architectures based on dilated causal convolutions to capture more audio length.
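
To show why dilation captures more audio length, here is a minimal plain-Python sketch of a dilated causal convolution and of the receptive field of a stack of such layers. The kernel size of 2 and the doubling dilations are assumptions for illustration, not a reproduction of the paper's architecture.

```python
def causal_dilated_conv(signal, kernel, dilation):
    """1D convolution where output[t] only sees signal[t], signal[t - d], signal[t - 2d], ..."""
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            index = t - k * dilation
            if index >= 0:  # implicit zero padding on the left keeps the layer causal
                total += weight * signal[index]
        output.append(total)
    return output


def receptive_field(kernel_size, dilations):
    """Number of input samples (including the current one) seen by a stack of layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


dilations = [2 ** i for i in range(10)]  # 1, 2, 4, ..., 512
print(receptive_field(2, dilations))     # 1024
```

Ten layers with doubling dilation already see 1024 past samples, whereas ten undilated layers of the same kernel size would see only 11.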

3D-GANs for 3D model generation and fun 3D furniture arithmetics from embeddings (think like word2vec word arithmetics with 3D furniture representations).

Best Paper Award at CVPR 2017, yielding improvements on state-of-the-art performances on CIFAR-10, CIFAR-100 and SVHN datasets, this new neural network architecture is named DenseNet.

Merging the ideas of the U-Net and the DenseNet, this new neural network is especially good for huge datasets in image segmentation.

Use a distance metric in the loss to determine which class an object belongs to from a few examples.

Attention Mechanisms

Attention mechanism for LSTMs! Mostly, the figures, formulas and their explanations proved useful to me. I gave a talk on that paper here.

Outstanding for letting a neural network learn an algorithm with seemingly good generalization over long time dependencies. Sequences recall problem.

A very interesting and creative work about textual question answering, what a breakthrough, there is something to do with that.

Interesting way of doing one-shot learning with low-data by using an attention mechanism and a query to compare an image to other images for classification.

In 2016: stacked residual LSTMs with attention mechanisms on encoder/decoder are the best for NMT (Neural Machine Translation).

Improvements on differentiable memory based on NTMs: now it is the Differentiable Neural Computer (DNC).

That yields intuition about the boundaries of what works for doing NMT within a framed seq2seq problem formulation.

"Attention Is All You Need" (AIAYN) - Introducing multi-head self-attention neural networks with positional encoding to do sentence-level NLP without any RNN or CNN - this paper is a must-read (also see this explanation and this visualization of the paper).
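
The positional encoding mentioned above can be sketched in plain Python as the usual sinusoidal table, with sine on even dimensions and cosine on odd ones. The function name and the sizes used below are assumptions for illustration.

```python
import math


def positional_encoding(num_positions, d_model):
    """Sinusoidal encodings: sin on even dimensions, cos on odd dimensions."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k + 1 share the same wavelength.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = positional_encoding(num_positions=50, d_model=16)
print(pe[0][:4])  # position 0 gives sin(0) = 0 and cos(0) = 1, alternating
```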


Replace word embeddings by word projections in your deep neural networks, which doesn't require a pre-extracted dictionary nor storing embedding matrices.

This paper is the sequel to the ProjectionNet just above. The SGNN is elaborated on the ProjectionNet, and the optimizations are detailed more in-depth (also see my attempt to reproduce the paper in code and watch the talks' recording).

Classify a new example from a list of other examples (without definitive categories) and with low-data per classification task, but lots of data for lots of similar classification tasks - it seems better than siamese networks. To sum up: with Matching Networks, you can optimize directly for a cosine similarity between examples (like a self-attention product would match) which is passed to the softmax directly. I guess that Matching Networks could probably be used with negative-sampling softmax training in word2vec's CBOW or Skip-gram without having to do any context embedding lookups.
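
The "cosine similarity passed to the softmax" idea can be sketched in plain Python as follows. The 2D embeddings and labels are hypothetical; in the paper the embeddings come from trained networks, which are not shown here.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(query, support_embeddings, support_labels):
    """Attention over the support set: softmax of cosine similarities, summed per label."""
    weights = softmax([cosine_similarity(query, s) for s in support_embeddings])
    probabilities = {}
    for weight, label in zip(weights, support_labels):
        probabilities[label] = probabilities.get(label, 0.0) + weight
    return probabilities


# Hypothetical support set of three labeled examples.
support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
labels = ["cat", "cat", "dog"]
probs = classify([0.8, 0.2], support, labels)
print(probs)
```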

YouTube and Videos

A talk for a reading group on attention mechanisms (Paper: Neural Machine Translation by Jointly Learning to Align and Translate).

Generalize properly how Tensors work, yet just watching a few videos already helps a lot to grasp the concepts.

A list of videos about deep learning that I found interesting or useful, this is a mix of a bit of everything.

A YouTube playlist I composed about DFT/FFT, STFT and the Laplace transform - I was mad about my software engineering bachelor not including signal processing classes (except a bit in the quantum physics class).

Yet another YouTube playlist I composed, this time about various CS topics.

Siraj has entertaining, fast-paced video tutorials about deep learning.

Interesting and shallow overview of some research papers, for example about WaveNet or Neural Style Transfer.

Andrew Ng interviews Geoffrey Hinton, who talks about his research and breakthroughs, and gives advice for students.

A primer on how to structure your Machine Learning projects when using Jupyter Notebooks.

Misc. Hubs & Links

Maybe how I discovered ML - Interesting trends appear on that site way before they get to be a big deal.

This is a hub similar to Hacker News, but specific to data science.

This is a Korean search engine - best used with Google Translate, ironically. Surprisingly, sometimes deep learning search results and comprehensible advanced math content show up more easily there than on Google search.

arXiv browser with TF/IDF features.
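
For readers unfamiliar with TF-IDF, here is a minimal plain-Python sketch of one common variant (term frequency times the log of inverse document frequency). It is not the code used by that site, and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    num_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(num_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "attention is all you need".split(),
    "neural turing machines need memory".split(),
    "attention for neural machine translation".split(),
]
weights = tf_idf(docs)
print(weights[0])  # rarer terms such as "is" outweigh shared ones such as "attention"
```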