# How to Do Mathematics Easily – Intro to Deep Learning #4

Hey, are you okay? Siraj, show them the math behind deep learning. Totally. Hello world, it's Siraj, and let's learn about the math needed to do deep learning. Math is in everything, not just every field of engineering and science. It's between every note in a piece of music and hidden in the textures of a painting. Deep learning is no different. Math helps us define rules for our neural network so we can learn from our data. If you wanted to, you could use deep learning without ever knowing anything about math. There are a bunch of readily available APIs for tasks like computer vision and language translation. But if you want to use a library like TensorFlow to make a custom model to solve a problem, knowing what math terms mean when you see them pop up is helpful. And if you want to advance the field through research, don't even trip! You definitely need to know the math.

Deep learning mainly pulls from three branches of math: linear algebra, statistics, and calculus. If you don't know any of these topics, I recommend a cheat sheet of the important concepts, and I've linked to one for each in the description. So let's go over the four-step process of building a deep learning pipeline and talk about how math is used at each step. Once we've got a dataset that we want to use, we want to process it. We can clean the data of any empty values and remove features that are not necessary, but these steps don't require math. A step that does, though, is called normalization. This is an optional step that can help our model reach convergence (that point when our prediction gives us the lowest error possible) faster, since all the values operate on the same scale. This idea comes from statistics. You have a 17.4 percent chance of making a straight. There are several strategies to normalize data, although a popular one is called min-max scaling. If we have some given data, we can use the following equation to normalize it.
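As a quick sketch, min-max scaling looks like this in NumPy (the function name `min_max_scale` and the example values here are hypothetical, just for illustration):

```python
import numpy as np

def min_max_scale(x):
    # Subtract the minimum, then divide by (max - min),
    # mapping every value into the range [0, 1].
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical feature values
data = np.array([2.0, 4.0, 6.0, 10.0])
print(min_max_scale(data))  # values become 0., 0.25, 0.5, 1.
```

You'd apply this per feature (per column of your data), so every feature ends up on the same 0-to-1 scale.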
We take each value in the list and subtract the minimum value from it, then divide that result by the maximum value minus the minimum value. We then have a new list of data within the range of 0 to 1, and we do this for every feature we have so they're all on the same scale.

After normalizing our data, we have to ensure that it's in a format our neural network will accept. This is where linear algebra comes in. There are four terms in linear algebra that show up consistently: scalars, vectors, matrices, and tensors. A scalar is just a single number. A vector is a one-dimensional array of numbers. A matrix is a two-dimensional array of numbers. And a tensor is an n-dimensional array of numbers. So a scalar, vector, matrix, and spectre (wait, not spectre) can all be represented as a tensor. We want to convert our data, whatever form it's in, be that images, words, or videos, into tensors, where n, the number of features our data has, defines the dimensionality of our tensor.

Let's use a three-layer feed-forward neural network, capable of predicting a binary output given an input, as our base example to illustrate some more math concepts going forward. When do we use math in deep learning?
When we normalize during processing,
Learn a model's parameters by searching,
And random weights we'll be initializing.
Tensors flow from input to out,
Then measure the error to measure the doubt.
It gives us what's real and what's expected;
Backpropagate to get cost corrected.

We'll import our only dependency, NumPy, then initialize our input data and output data as matrices. Once our data is in the right format, we want to build our deep neural network. Deep nets have what are called hyperparameters. These are the high-level tuning knobs of the network that we define, and they help decide things like how fast our model runs, how many neurons per layer, and how many hidden layers. Basically, the more complex your neural network gets, the more hyperparameters you'll have. You can tune these manually, using knowledge you have about the problem you're solving to guess probable values and observe the result. Based on the result, you can tweak them accordingly and repeat that process iteratively. Another strategy you could use is random search. You identify ranges for each hyperparameter, then create a search algorithm that picks values from those ranges at random from a uniform distribution of possibilities, which means all possible values have the same probability of being chosen. This process repeats until it finds the optimal hyperparameters. Yay for statistics! We only have the number of epochs as our hyperparameter, since we have a very simple neural network.

We'll use probability to decide our weight values, too. One common method is randomly initializing samples of each weight from a normal distribution with a low standard deviation, meaning values are pretty close together. We'll use it to create a weight matrix with a dimension of three by four, since three is the size of our input, so every node in the input layer is connected to every node in the next layer. The weight values will be in the range from -1 to 1. Since we have three layers, we'll initialize two weight matrices. The next set of weights has a dimension of four by one, which is the size of our output. As data propagates forward in a neural network, each layer applies its own respective operation to it, transforming it in some way,
until it eventually outputs a prediction. This is all linear algebra; it's all tensor math.

We'll initialize a for loop to train our network for 60,000 iterations. Then we'll want to initialize our layers. The first layer, our input, gets the input data. The next layer computes the dot product of the first layer and the first weight matrix. When we multiply two matrices together, as in the case of applying weight values to input data, we call that the dot product. Then it applies a nonlinearity to the result, which we've decided is going to be a sigmoid. It takes a real-valued number and squashes it into a range between 0 and 1. So that's the operation that occurs in layer 1, and the same occurs in the next layer. We'll take that value from layer 1 and propagate it forward to layer 2, computing the dot product of it and the next weight matrix, then squashing it into output probabilities with our nonlinearity. Since we only have three layers, this output value is our prediction.

The way we improve this prediction, the way our network learns, is by optimizing our network over time. So how do we optimize it? Enter calculus. The first prediction our model makes will be inaccurate. To improve it, we first need to quantify
exactly how wrong our prediction is. We'll do this by measuring the error, or cost. The error specifies how far off the predicted output is from the expected output. Once we have the error value, we want to minimize it, because the smaller the error, the better our prediction. Training a neural network means minimizing the error over time. We don't want to change our input data, but we can change our weights to help minimize this error. If we just brute-forced all the possible weights to see what gave us the most accurate prediction, it would take a very long time to compute. Instead, we want some sense of direction for how we can update our weights so that in the next round of training our output is more accurate. To get this direction, we'll calculate the gradient of our error with respect to our weight values, using what's called the derivative in calculus. When we set deriv to True for our nonlin function, it'll calculate the derivative of the sigmoid, meaning the slope of the sigmoid at a given point, which is the prediction value we give it from l2. We want to minimize our error as much as possible, and we can intuitively think of this process as dropping a ball into a bowl, where the smallest error value is at the bottom of the bowl. Once we drop the ball in, we'll calculate the gradient at each of its positions: if the gradient is negative, we'll move the ball to the right; if it's positive, we'll move the ball to the left. We're using the gradient to update our weights accordingly each time. We'll keep repeating the process until eventually the gradient is zero, which will give us the smallest error value. This process is called gradient descent, because we are descending our gradient
to approach zero and using it to update our weight values iteratively. I understand everything now. Still understand everything.

So to do this programmatically, we'll multiply the derivative we calculated for our prediction by the error. This gives us our error-weighted derivative, which we'll call l2_delta. This is a matrix of values, one for each predicted output, and it gives us a direction. We'll later use this direction to update this layer's associated weight values. This process of calculating the error at a given layer, and using it to help calculate the error-weighted gradient so we can update our weights in the right direction, will be done recursively for every layer, starting from the last back to the first. We are propagating our error backwards after we've computed our prediction by propagating forward. This is called backpropagation. So we'll multiply the l2_delta values
by the transpose of its associated weight matrix to get the previous layer's error, then use that error to do the same operation as before, to get direction values to update the associated layer's weights so the error is minimized. Lastly, we'll update the weight matrices
for each associated layer by multiplying them by their respective deltas. When we run our code, we can see that the error values decreased over time, and our prediction eventually became very accurate.

So, to break it down: deep learning borrows from three branches of math, linear algebra, statistics, and calculus. A neural net performs a series of operations on an input tensor to compute a prediction, and we can optimize a prediction by using gradient descent to backpropagate our errors recursively, updating our weight values for every layer during training.

The coding challenge winner from the last video is Jovian Lin. Jovian tried out a bunch of different models to predict sentiment from a dataset of video game reviews.
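Putting the whole walkthrough together, here is a minimal end-to-end sketch of the three-layer network described above. The `syn0`/`syn1`/`l2_delta` names follow the code shown in the video; the toy input and output matrices are stand-ins I've made up, since the exact dataset isn't reproduced in this transcript:

```python
import numpy as np

def nonlin(x, deriv=False):
    # Sigmoid squashes any real value into (0, 1).
    # With deriv=True, x is assumed to already be a sigmoid output,
    # and we return the slope of the sigmoid at that point.
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

# Toy input data: 4 examples, 3 features each (assumed for illustration)
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
# Expected binary outputs, one per example
y = np.array([[0], [1], [1], [0]])

np.random.seed(1)
# Randomly initialize weights in [-1, 1): 3x4 for input->hidden, 4x1 for hidden->output
syn0 = 2 * np.random.random((3, 4)) - 1
syn1 = 2 * np.random.random((4, 1)) - 1

for j in range(60000):
    # Forward propagation: dot product, then sigmoid, at each layer
    l0 = X
    l1 = nonlin(np.dot(l0, syn0))
    l2 = nonlin(np.dot(l1, syn1))

    # Measure the error, then compute error-weighted deltas,
    # propagating the error backwards from the last layer to the first
    l2_error = y - l2
    l2_delta = l2_error * nonlin(l2, deriv=True)
    l1_error = l2_delta.dot(syn1.T)
    l1_delta = l1_error * nonlin(l1, deriv=True)

    # Update each weight matrix using its layer's delta
    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)

print(np.mean(np.abs(y - l2)))  # mean error after training, close to zero
```

Running it, the mean absolute error shrinks toward zero over the 60,000 iterations, which is the gradient descent behavior described above.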
Wizard of the week! And the runner-up is Vishal Batchu. He tested out several different recurrent nets and eloquently recorded his experiments in his README. The coding challenge for this video is to train a deep neural net to predict the magnitude of an earthquake, and use a strategy to learn the optimal hyperparameters. Details are in the README. Post your GitHub link in the comments, and I'll announce the winner next video. Please subscribe if you want to see more videos like this, check out this related video, and for now I've got to get my math turned up to a million. So thanks for watching!

### 100 thoughts on “How to Do Mathematics Easily – Intro to Deep Learning #4”

• October 6, 2017 at 2:29 pm

Great video!

• October 8, 2017 at 6:29 pm

I have to meet you, brother…

• October 9, 2017 at 8:15 am

Tell me, in math problems, which of 6+10=66 or 6#10=66 is correct?

• October 12, 2017 at 6:09 pm

Damn, am I the only one who understood all of it up until 8:10?

• October 12, 2017 at 6:19 pm

By the way, if you wanna test this neural net, use this function

```python
def think(input, syn0, syn1):
    return nonlin(np.dot(nonlin(np.dot(input, syn0)), syn1))

print(int(think(np.array([<UR VALUES>]), syn0, syn1)))
```

• October 12, 2017 at 7:57 pm

Back propagate to get cost corrected!!!!

• October 16, 2017 at 6:59 am

u r boss siraj…

• October 17, 2017 at 6:21 am

gradient descent = PID loop? Then err is the feedback, as well as minimizing noise?

• October 20, 2017 at 9:42 am

Mozart no.40 <3 Dope music choice!

• October 20, 2017 at 2:37 pm

Bill Nye of the new age

• October 22, 2017 at 8:20 am

great links in description for learning deep learning math from basic level! thank you

• October 26, 2017 at 6:45 pm

Siraj – the work you do is very meaningful to me – you help me as a math teacher make it clear to my students why math is awesome and abolish questions like "why should we learn math?".

Keep it up!

• October 30, 2017 at 3:06 pm

Linear algebra, statistics, and calculus? My three favorites lol – although I am only just learning lin now, and only took lower division calculus – I suppose I should look up the proofs that said "the proof for this is outside the scope of this course", because advanced calculus basically just does all those proofs.

• November 2, 2017 at 11:36 am

I dont have GitHub

• November 8, 2017 at 5:00 pm

Why not:

```python
def nonlin(x, deriv=False):
    if deriv:
        return nonlin(x) * (1 - nonlin(x))
    return 1 / (1 + np.exp(-x))
```

???

• November 9, 2017 at 11:10 pm

I get all the maths! But how do you actually program backpropagation into a CNN?! I want to program my own CNN in Processing (Java) and I cannot get the convolution layer to work… I figured out how to backpropagate to that layer, but it is only ever giving me the same 2 outputs, both near 1. Sometimes one drops to -.999, but I never get any right results… What could I have done wrong? I might be rushing through all this, because I only started with neural networks 5 days ago, but I really have the motivation to at least make a working CNN…

• November 12, 2017 at 8:50 am

Man. You're good.

• November 14, 2017 at 8:14 pm

Won't you need to know geometric and algebraic topology if you are doing visual machine learning?

• November 17, 2017 at 4:26 am

Hey Siraj! When I tried your demo code in Python 3 it's always showing me an error, can you tell me why?

• November 30, 2017 at 5:13 pm

Sir, you should provide links to your memes 😀

• December 7, 2017 at 11:45 am

Do we need to find l2_delta and l1_delta first and then update syn1 and syn0, or can we update syn1 right after finding l2_delta @ 8:50?

• December 9, 2017 at 7:49 am

I have a question. After normalizing the data, wouldn't the output be in normalized form as well? How do we turn it back?

• December 10, 2017 at 7:31 pm

I finally understand math behind neural networks…Thanks siraj

• December 11, 2017 at 7:33 am

where do you get all those memes, Siraj?

• December 11, 2017 at 9:27 pm

How to acquire fluent communicating skills like you do?

• December 12, 2017 at 6:15 am

I like how fast and to the point you are, not wasting time, but I wish I understood what all of this is.

• December 26, 2017 at 5:19 am

Brain Explosion ………….literally 😉

• January 2, 2018 at 5:57 pm

Thank you for this nice introduction! It makes things much clearer.
One note though: an integer is not a proper Python variable name.
So
11 = 'something'
will produce a syntax error!
You can use _11 if you like.
As I see, you corrected this in the GitHub link ;)

• January 5, 2018 at 11:25 am

Like Immortal Technique

• January 7, 2018 at 6:51 am

Super good tutorials and very entertaining and clear.

• January 8, 2018 at 3:50 pm

Amazing videos! You're a great teacher! 😀

• January 9, 2018 at 3:56 pm

Those memes make me feel connected to your videos 🙂

• January 15, 2018 at 12:20 pm

Not for dummies. I was lost in first 5 seconds. I feel dumb.

• January 15, 2018 at 12:22 pm

Please send me back to my planet. Feel like I'm not from here:(.

• January 19, 2018 at 3:33 am

Siraj, you bloody rock.

• January 19, 2018 at 2:48 pm

Do I need to know this when I'm just importing libraries?

• January 24, 2018 at 5:09 am

Ohhh

• January 27, 2018 at 5:29 am

absolutely amazing video , thanks and keep making more

• January 28, 2018 at 3:42 pm

Why is my error ~0.25? I used Python 3 (range instead of xrange). Thanks for the very clear tutorial.

• January 28, 2018 at 3:44 pm

Do you have a link to the source code of the tutorial?

• January 30, 2018 at 10:34 am

I still don't know why we use the math the way we use it, but now I understand how to use it 😀 So my next step will be to understand why we multiply specific things, like using a transposed matrix. Thanks for that video!

• February 9, 2018 at 4:24 am

Maths is the godfather of all sciences

• February 10, 2018 at 10:01 pm

WolframAlpha.

• February 15, 2018 at 6:45 am

This is better than expected; you should rename it "What Math You Need to Know for Neural Networks".

• February 15, 2018 at 11:34 pm

Great

• February 20, 2018 at 2:49 am

1:46 database normalization != data normalization

• March 4, 2018 at 6:16 pm

damn that linear algebra cheat sheet was complicated :/// Not giving up on the three month challenge though !!

• March 7, 2018 at 1:33 am

Hey Siraj Raval, I am interested in deep learning, but I am only in the 8th (going to 9th) grade, so I do not currently have the mathematics background to pursue this passion. I was wondering how I can learn the necessary math at an accelerated pace (e.g. 3 to 6 months). Are there any good and easy-to-understand resources to look at? If so, can you please reply. I have also found that I can learn new concepts more quickly than most, as over the past 7 months I have been learning about computer science and have taught myself multiple languages such as C, C++, Java, JavaScript, Python, and more. I have also learned a lot about algorithms and algorithm design, as well as computer architecture and low-level manipulation, such as dynamic memory allocation.

P.S. I am currently only doing honors algebra and geometry, so I need to learn A LOT of math in a short period of time

• March 21, 2018 at 5:55 pm

You have no idea how I'm enjoying your videos. Thanks man, you're amazing

• March 21, 2018 at 7:24 pm

Dude… The humor and memes in your videos are on point and SO educational. Please keep making these!

• March 29, 2018 at 8:38 am

I’m phenomenally bad at maths, no content has ever explained how this works, as well as broken down the areas of study I need. Thank you!

• April 21, 2018 at 5:52 pm

@4:32, how is that a normal distribution?

• May 7, 2018 at 4:15 am

Watching it for the 3rd time. You rock, Siraj!!

• May 8, 2018 at 1:34 pm

U r awesome …

• May 9, 2018 at 1:20 am

Can you make a video on Kriging (KG) Method for parameter estimation?

• May 10, 2018 at 1:32 pm

Are you Indian, by any chance?

• May 17, 2018 at 8:05 pm

Yo Siraj, I get an error "ValueError: shapes (4,1) and (4,3) not aligned: 1 (dim 1) != 4 (dim 0)". I don't get what's really the problem. I did it the same way you did, and yet! And btw, I use Python 3.

• May 25, 2018 at 4:22 pm

" backpropogate to get cost corrected".. loved it

• June 6, 2018 at 5:40 am

Don't get it. But one day I will.

• June 14, 2018 at 3:12 am

Do you use FL Studio for the beats?

• June 19, 2018 at 9:38 am

lol song love it..

• June 20, 2018 at 2:36 am

OMG! You got a rap song for backpropagation, amazing…

• June 25, 2018 at 4:32 am

Let the video play in the background while surfing the description links… as always.

• June 25, 2018 at 6:37 pm

This video is so funny

• June 26, 2018 at 4:42 am

You rap better than Eminem

• June 28, 2018 at 8:27 am

Man, those memes are so fkn distracting! XD

• July 2, 2018 at 6:37 am

That thumbnail wasn't clickbait.

• July 2, 2018 at 7:22 am

THE FIRST EVER VIDEO where I needed to play it in half the speed to get the smallest grip. Damn MATHS!

• July 25, 2018 at 4:53 pm

Of course you're right that math is important in AI, but how much time will it consume? I think only a mathematician can do this, and other talents are wasted. You just discouraged me.

• July 27, 2018 at 12:38 pm

Good intro to neural networks, keep it up brother.

• August 4, 2018 at 11:12 am

The memes are distracting.

• August 12, 2018 at 7:57 pm

Your videos are a great summary of the course Machine Learning from Prof. Ng, which I'm currently taking. They give me back the quick overview, when I need it.

• August 30, 2018 at 3:37 pm

2. Slow down in your videos

• August 31, 2018 at 3:53 pm

THIS IS SO FUN! 😀

• September 5, 2018 at 5:04 pm

"Hello world! It's siraj", best part of your vids xD

• September 8, 2018 at 1:46 am

I feel as though I need a Ph.D in mathematics in order to understand the concepts that you presented in this video!

• September 18, 2018 at 9:58 am

Thanks for all the resources you are providing; I find them useful. Can you please add me to your GitHub network, because I am really interested to know how these guys predicted earthquakes using neural networks.

Learner

• September 29, 2018 at 1:33 am

Do you ever hold Machine Learning Meetups in SF? Also, I haven't checked yet but do you have any recommendations on where to look for the latest in content/CF recommendation engines? The best paper I've found is the "Collaborative filtering for implicit feedback data sets" paper written by Koren. I'm very interested in a paper which factors in negative implicit interactions. Great videos!

• October 17, 2018 at 4:28 pm

Siraj you are brilliant! You have truly found your calling. You explain things clearly in an amusing and dynamic way – and your singing is 😂 I've not met anyone make maths so engaging. Thank you!

• October 26, 2018 at 2:10 am

When you sung the mechanism of a neural network in a few verses I lost my shit. Seriously funny and informative

• October 29, 2018 at 1:22 am

😵 what…😲😬

• November 7, 2018 at 12:50 pm

Whats the intro music?

• November 14, 2018 at 7:26 pm

Is anyone else thinking: "Man…this is heavy"

• December 17, 2018 at 6:23 am

This really helped me understand neural networks. Thanks!

• January 1, 2019 at 10:23 pm

The cheat sheet redirects are not working in Firefox, but they work in Chrome, so it must be browser-specific and not my OS.

• January 17, 2019 at 4:55 am

Play at .75, thank me later 😉 But you'll lose the fun 🙁

• January 20, 2019 at 3:18 pm

I UNDERSTAND EVERYTHING NOW

• January 22, 2019 at 8:21 am

Sorry, I think the title is really off from the content.
I don't know what you were thinking with that.

• January 28, 2019 at 4:46 pm

I have completed a course in machine learning and am looking forward to deep learning. Can you provide me a good source for it that will give me the entire knowledge?

• February 5, 2019 at 2:23 pm

This is the one video on you tube that changes everything!

• March 7, 2019 at 8:23 am

I once wrote and performed a song poking fun at a number of Australian politicians, using Richard Clayderman's "Les Premiers Sourires De Vanessa" as Karaoke-style backing, but Mozart's Symphony in G minor K550 is taking it to the next level 😀 Siraj literally radiates awesomeness and makes me look like a beginner

• May 13, 2019 at 5:28 pm

My CAT Decided What I ATE for 24 HOURS (And This Is What Happened…)

• June 5, 2019 at 7:31 am

I just love this guy's videos.

• August 15, 2019 at 10:24 am

If you were smart you would take these short songs and post them somewhere like Spotify, or somewhere people can listen to them, 'cause I would; recall is important.

• August 18, 2019 at 5:07 am

Why are you using a Jupyter notebook? In the first video you said Sublime text editor.