10 steps on the road to Deep Learning (part 2)

Julien Simon
Jan 15, 2018 · 10 min read

In the first part, I told you about the first five steps you could take to get started with Deep Learning (DL):

1 — You can do it

2 — Ignore Math (for now)

3 — Ride the snake

4 — Walk before you run

5 — Pick a library (and don’t look back)

Let’s look at the five next ones.

The way to Deep Learning is NOT shut. Draw your sword!

6 — It’s code, so code!

DL is code. Most of us learned to code by reading someone else’s code and by figuring out what it does. This applies to DL too.

Image classification is a fun problem to start with. Reasonably-sized data sets such as Dogs vs. Cats, MNIST, CIFAR or German Traffic Signs are a good starting point.

As you probably know, there are many different neural network architectures. Initially, you should focus on fully connected networks (aka multi-layer perceptrons, aka MLPs). Yes, they’re basic, but they’re easy to understand. Early on, don’t worry about building the best model possible: focus on understanding the concepts (training set, validation set, loss function, optimization, hyperparameters) and the steps (data preparation, training loop, etc.) required to train and use a model.

So, grab some examples based on your favorite library, read them, figure out what each bit does, run them and by all means tweak them: add layers, resize layers, change hyperparameters (batch size, learning rate, number of epochs, etc.), predict with your own data, and so on. Building intuition about how a given change may affect the performance of a model is extremely important.
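If you want a concrete starting point, here’s a minimal sketch of a fully connected network in Keras, trained on MNIST. The layer sizes and hyperparameters are illustrative, which is exactly the point: change them and watch what happens.

```python
# A minimal fully connected network in Keras. Layer sizes and
# hyperparameters are illustrative: tweak them and observe the effect.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=128, epochs=5, validation_split=0.1)
print(model.evaluate(x_test, y_test))  # [test loss, test accuracy]
```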

Once you’re comfortable with fully connected networks, I’d suggest looking at Convolutional Neural Networks (aka CNNs). Don’t let operations like convolution and pooling scare you away. Use the black box approach we discussed earlier.
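If you do want to peek inside the black box, a CNN is just a different stack of layers. Here’s a sketch in Keras for the same MNIST-style input; filter counts and kernel sizes are illustrative.

```python
# A small CNN in Keras (filter counts and kernel sizes are illustrative).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),   # downsample feature maps
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),              # back to a flat vector for classification
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```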

Later on, you’ll be ready to look at other architectures like LSTM, GAN, etc. They are pretty weird beasts, so please don’t start with these :)

7 — Welcome to the jungle

After a while, you’ll be ready to expand your horizon and start exploring additional techniques used by DL practitioners. Here are a few that I found very valuable to study in detail.

Reusing pre-trained models

Most libraries come with a model zoo, i.e. a set of models pre-trained on large data sets (such as ImageNet for image classification models). You can download the models and use them right away. No training needed!
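For example, with Keras you can pull a ResNet50 pre-trained on ImageNet in a couple of lines. A minimal sketch, assuming you have a local image file (‘elephant.jpg’ here is a hypothetical name):

```python
# Load a pre-trained ResNet50 and classify an image, no training needed.
import numpy as np
from tensorflow.keras.applications.resnet50 import (
    ResNet50, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

model = ResNet50(weights='imagenet')  # downloads pre-trained weights

img = image.load_img('elephant.jpg', target_size=(224, 224))  # hypothetical file
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])  # top 3 ImageNet classes
```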

Fine-tuning pre-trained models

These models can be further trained on your own data. This is a very powerful technique that lets you reuse the initial training while specializing the model for your own purpose. All the cool kids do it, e.g. Expedia or Condé Nast.
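A common recipe, sketched here in Keras: keep the pre-trained convolutional base, freeze it, and train a new classification head on your own data. The class count and layer sizes below are illustrative.

```python
# Fine-tuning sketch: freeze a pre-trained base, train a new head.
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

base = ResNet50(weights='imagenet', include_top=False,
                input_shape=(224, 224, 3))
base.trainable = False  # reuse the ImageNet features as-is, at least at first

x = GlobalAveragePooling2D()(base.output)
x = Dense(256, activation='relu')(x)
outputs = Dense(5, activation='softmax')(x)  # 5 is an illustrative class count
model = Model(base.input, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(your_images, your_labels, ...)  # train the head on your own data
# Optionally unfreeze the top layers of 'base' afterwards and re-train
# with a low learning rate.
```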

Increasing data set diversity with data augmentation

Data augmentation is another powerful technique, which creates additional samples from existing ones, e.g. by cropping, resizing or distorting images. This is especially useful if your initial data set is too small to train from scratch, or even for successful fine-tuning. DL libraries provide data augmentation APIs, and you can also use the standalone imgaug library.
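For instance, Keras’ ImageDataGenerator produces randomly transformed variants of your training images on the fly. The transformation ranges below are illustrative:

```python
# On-the-fly data augmentation with Keras (parameter values are illustrative).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=15,       # random rotations up to 15 degrees
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    zoom_range=0.1,          # random zooms
    horizontal_flip=True,    # random left-right flips
)
# Feed the augmented stream straight into training:
# model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=10)
```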

Studying additional building blocks for neural networks

For instance, you could look at special layers like Dropout and Batch Normalization, both of which help networks learn and generalize better.
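Here’s how they slot into a network, sketched in Keras; the placement and rates are illustrative.

```python
# Adding Dropout and Batch Normalization to a fully connected network.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization

model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    BatchNormalization(),  # normalize activations to stabilize training
    Dropout(0.5),          # randomly drop 50% of units to fight overfitting
    Dense(64, activation='relu'),
    Dropout(0.25),
    Dense(10, activation='softmax'),
])
```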

You could also dive deeper into activation functions such as sigmoid, ReLU, LeakyReLU and softmax: try to understand how they differ and when you’d use one or the other. The same goes for loss functions and regularization.
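A quick way to build intuition is to implement them yourself and run a few values through them. A minimal numpy sketch:

```python
# Common activation functions, implemented from scratch with numpy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # squashes values into (0, 1)

def relu(x):
    return np.maximum(0.0, x)              # zero for negative inputs

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope for negatives

def softmax(x):
    e = np.exp(x - np.max(x))              # shift for numerical stability
    return e / e.sum()                     # outputs sum to 1 (probabilities)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), leaky_relu(z), softmax(z))
```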

Last but not least, you should spend some time learning about optimization algorithms. Chances are you’ve used SGD, but there are many more: Adagrad, Adadelta, Adam, etc.
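In Keras, for example, swapping one optimizer for another is a one-line change, which makes them easy to compare. The settings below are illustrative:

```python
# Comparing optimizers in Keras: only the 'optimizer' argument changes.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD, Adagrad, Adadelta, Adam

for opt in [SGD(learning_rate=0.01, momentum=0.9),
            Adagrad(), Adadelta(), Adam()]:
    # Rebuild the model each time so every optimizer starts from scratch.
    model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])
    model.compile(optimizer=opt, loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # model.fit(...) with identical data, then compare the learning curves.
```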

And don’t forget: code or it didn’t happen. Try adding one of these Lego blocks to your toy networks, see what happens and try to figure it out.

At this point, Math will inevitably show up in pretty much every blog post or article you’re reading.

Some of you will decide not to go any further. Maybe you simply won’t be interested in spending the time and effort required to climb that wall.

Guess what: THAT’S OK! DON’T FEEL BAD! Even if you stop here, you already know A LOT — more than 99% of developers out there — and you’re now able to use Deep Learning in your applications. You can skip the next two steps and go directly to the Resources section.

Some of you will soldier on. Congratulations and good luck with your studies. Please remember to show patience and respect when less experienced developers ask for your help. There are more than enough snarky ML/DL specialists out there, we definitely don’t need more ;)

8 — Stop worrying and learn to love Math

Caveat emptor

I took a lot of Math classes during my studies (French engineers will know what ‘a lot’ means). I thought that I had forgotten everything over the years, but surprisingly it came back pretty fast. Depending on your own studies, you’ll either fly through this stuff, or sweat and curse all the way. Stay focused, move at your own pace and remember, YOU CAN DO IT!

Having said that, here’s a list of topics you’ll have to study if you want to start understanding how DL building blocks really work.

Statistics and probabilities

Since DL is able to extract features automatically from data, I felt that mastering statistics and probabilities was less important than for classical ML, where they play a major role in feature engineering.

Still, you can’t really ignore this stuff. The material below is pretty basic but it will certainly come in handy at some point.

Linear algebra

Having a good grasp of linear algebra (vectors, matrices, etc.) is mandatory for understanding how forward propagation, backpropagation and so on are implemented.
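To make this concrete: forward propagation through a fully connected layer is just a matrix multiplication plus a bias, followed by an activation. A minimal numpy sketch, with illustrative shapes:

```python
# Forward propagation through one fully connected layer, in pure numpy.
import numpy as np

batch_size, n_in, n_out = 4, 784, 128
X = np.random.randn(batch_size, n_in)    # a mini-batch of inputs
W = np.random.randn(n_in, n_out) * 0.01  # weight matrix
b = np.zeros(n_out)                      # bias vector

Z = X @ W + b            # linear transform: (4, 784) @ (784, 128) -> (4, 128)
A = np.maximum(0.0, Z)   # ReLU activation
print(A.shape)           # (4, 128)
```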

Derivatives

Derivatives are the foundation on which optimization algorithms such as SGD are built. Again, this is mandatory study if you want to understand them in detail.

If you’re really rusty (or learning this for the first time), start with this one.

Then, you should study this as well (the first 3 chapters should be enough). You need to know partial derivatives in order to grasp the cosmic beauty of backpropagation :)
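One concrete exercise: compute a derivative analytically with the chain rule, then check it against a numerical approximation. A minimal numpy sketch for the toy loss f(w) = (w*x - y)^2, with illustrative values:

```python
# Analytical vs numerical derivative for a tiny loss f(w) = (w*x - y)^2.
import numpy as np

x, y, w = 2.0, 1.0, 0.8

def loss(w):
    return (w * x - y) ** 2

analytical = 2 * (w * x - y) * x   # chain rule: df/dw
eps = 1e-6                         # central finite difference
numerical = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(analytical, numerical)       # the two values should match closely
```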

9 — Stare into the abyss

Once you’ve (re)sharpened your Math skills, you will be ready to figure out how these mysterious DL Lego blocks work. Black boxes no more!

Here are some selected topics you should now be able to crack:

  • Forward propagation and backpropagation,
  • Gradients,
  • SGD and other optimization algorithms,
  • Convolution and transposed convolution.

A great way to put your knowledge to the test is to stop using high-level APIs such as model.fit() or model.train(). Try switching to examples based on a custom training loop, where every step of the training process is explicitly defined. Once again, read, understand and hack away!
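As an illustration, here’s a fully explicit training loop for a linear model in pure numpy: forward pass, hand-computed gradient, SGD update. The data, learning rate and epoch count are all illustrative:

```python
# A fully explicit training loop: forward pass, gradients, SGD update.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)    # noisy synthetic targets

w = np.zeros(3)
lr = 0.1                                       # learning rate
for epoch in range(50):
    y_hat = X @ w                              # forward propagation
    error = y_hat - y
    loss = (error ** 2).mean()                 # mean squared error
    grad = 2 * X.T @ error / len(y)            # gradient, computed by hand
    w -= lr * grad                             # SGD update
print(w)                                       # should be close to true_w
```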

Every now and then, you’ll stumble on a Math concept that you don’t know well. Just go and learn it. You’ve gone too far to let it stop you, right?

10 — Keep learning

There are a million resources out there, but here are a few that I found very valuable. When embarking on such a learning adventure, it definitely helps to rely on structured content. Blog posts and Stack Overflow will only take you so far :)

Books

Data Science from Scratch — Joel Grus

Exactly what the title says, and a great combination of the tools and techniques you’ll need along the way: Python essentials (probably not enough if you’ve never worked with it before), linear algebra, statistics, probability, plotting data, machine learning algorithms, etc.

The book doesn’t rely on any existing ML/DL framework. Pure Python all the way. Buy it with your eyes closed.

Machine Learning Mastery — Jason Brownlee

Jason runs machinelearningmastery.com, possibly my favorite ML/DL blog. Tons of well-explained, pragmatic examples on how to solve problems! He also authored multiple books based on his posts.

This one is a really good introduction to ML algorithms, but I’m sure they’re all equally good.

Gluon: the straight dope — Zachary Lipton

This is possibly the best online DL book you’ll come across.

Zachary is a Research Scientist at AWS, but I wouldn’t mention his work if it sucked ;)

This guide is a no-nonsense introduction to DL: algorithms, network architectures, etc. Zach starts by explaining general concepts and then shows you how to implement them with MXNet and Gluon. Even if you’ve decided to work with a different library, the conceptual material alone is extremely valuable.

Deep Learning — Ian Goodfellow, Yoshua Bengio and Aaron Courville

Don’t let the simple title fool you. This is as BRUTAL as it gets (and I mean that in a good way). The authors are world experts on Deep Learning and the book is math-heavy (quite an understatement). Being able to read it is a goal in itself. Awesome, but definitely NOT for beginners.

The book is available online, as well as in printed form (Amazon.com).

MOOC

Fast.ai — Jeremy Howard & Rachel Thomas

Jeremy and Rachel have built two free 7-week courses, which are equally brilliant. You’ll be coding from day 1 (mostly in Keras), which is definitely in line with what we’ve been discussing so far: less focus on Math, more focus on code. Production quality is a bit rough at times, but I cannot recommend these courses enough.

Machine Learning and Deep Learning on Coursera — Andrew Ng

Andrew is one of the leading experts in Deep Learning. He authored two reference courses on Coursera, one on Machine Learning and one on Deep Learning.

The Machine Learning course will take you through all major Machine Learning algorithms. Unlike the fast.ai course, the focus here is on algorithms and there’s definitely more Math involved than you’ll probably like :) There are some programming assignments based on GNU Octave. I understand why you’d pick Octave over Matlab, but IMHO this course is screaming for a new revision based on Python. Having said that, it’s extremely interesting!

The Deep Learning course was launched a few months ago. It will take you through neural networks, all major network architectures, etc. Here too, the focus is on understanding algorithms, although the programming assignments are based on Python, Keras and TensorFlow. A great quality course, led by a brilliant instructor.

Blogs

There’s a zillion of them, but here are a few that are definitely worth your time.

Conclusion

It’s a long road for sure, and it will probably take you many months of hard work to get through all of this. If you have focus and dedication, you will make it and acquire these skills. Good luck on your journey to Deep Learning!

As always, thank you for reading. Please reach out on Twitter if you have questions.

If you liked this post, why not share it on Facebook, Twitter or LinkedIn for more people to enjoy? Thank you :)

The soundtrack to this post was live Judas Priest. Hail the Metal Gods.
