In the earlier videos from this week, as well as in the videos from the past several weeks, you've already seen the basic building blocks of forward propagation and backpropagation, the key components you need to implement a deep neural network. Let's see how you can put these components together to build your deep net.

Here's a network with a few layers. Let's pick one layer and look at the computations, focusing on just that layer for now. For layer l, you have some parameters W[l] and b[l], and for the forward prop, you input the activations a[l-1] from the previous layer and output a[l]. The way we did this previously was to compute z[l] = W[l] a[l-1] + b[l], and then a[l] = g[l](z[l]). So that's how you go from the input a[l-1] to the output a[l]. It turns out that for later use, it will be helpful to also cache the value z[l], so let me include this cache as well, because storing z[l] will be useful for the backpropagation step later.

Then for the backward step, again focusing on the computation for this layer l, you're going to implement a function that inputs da[l] and outputs da[l-1]. To flesh out the details, the input is actually da[l] as well as the cache, so you have available to you the value of z[l] that you computed. And in addition to outputting da[l-1], this function outputs the gradients you want in order to implement gradient descent for learning.

So this is the basic structure of how you implement the forward step, which we call the forward function, as well as the backward step, which we call the backward function. Just to summarize: in layer l, the forward function inputs a[l-1] and outputs a[l]. To make this computation, it uses W[l] and b[l], and it also outputs a cache, which contains z[l]. The backward function, used in the backprop step, inputs da[l] and outputs da[l-1]. In other words, given the derivatives with respect to these activations, da[l], it tells you how much you wish a[l-1] would change; it computes the derivatives with respect to the activations of the previous layer. Within this box you need to use W[l] and b[l], and it turns out that along the way you end up computing dz[l], and this backward function can also output dW[l] and db[l]. I was sometimes using red arrows to denote the backward iteration, so if you prefer, we could draw these arrows in red.

If you can implement these two functions, then the basic computation of the neural network will be as follows. You take the input features a[0], feed them in, and compute the activations of the first layer, a[1]. To do that, you need W[1] and b[1], and you also cache away z[1]. Having done that, you feed a[1] to the second layer, and using W[2] and b[2], you compute the activations of the next layer, a[2], and so on, until eventually you output a[L], which is equal to ŷ. Along the way, you cache all of these values z[l]. So that's the forward propagation step.

Now for the backpropagation step, what you're going to do is a backward sequence of iterations, going backwards and computing gradients. You feed in da[L], and this box gives you da[L-1], and so on, until you get down to da[2] and da[1].
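To make the forward half of this picture concrete, here is a minimal sketch of what a per-layer forward function and the full forward pass might look like in NumPy. The function names (layer_forward, forward_propagation), the parameter-dictionary layout, and the choice of ReLU hidden layers with a sigmoid output are illustrative assumptions for this sketch, not the exact API of the programming exercise.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def relu(Z):
    return np.maximum(0, Z)

def layer_forward(A_prev, W, b, activation):
    """Forward step for one layer: Z[l] = W[l] A[l-1] + b[l], A[l] = g(Z[l]).
    The cache stores what the backward pass will need later."""
    Z = W @ A_prev + b
    A = sigmoid(Z) if activation == "sigmoid" else relu(Z)
    cache = (A_prev, W, b, Z)
    return A, cache

def forward_propagation(X, parameters):
    """Full forward pass: A[0] = X, then layer_forward for l = 1..L."""
    caches = []
    A = X
    L = len(parameters) // 2              # parameters holds W1, b1, ..., WL, bL
    for l in range(1, L):                 # hidden layers use ReLU
        A, cache = layer_forward(A, parameters["W" + str(l)],
                                 parameters["b" + str(l)], "relu")
        caches.append(cache)
    # output layer uses a sigmoid to produce Y_hat
    AL, cache = layer_forward(A, parameters["W" + str(L)],
                              parameters["b" + str(L)], "sigmoid")
    caches.append(cache)
    return AL, caches
```

The list of caches, one per layer, is exactly the "cache away z[l]" idea from the lecture: it carries the forward-pass values to the backward pass.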
You could actually get one more output and compute da[0], but that's the derivative with respect to the input features, which is not useful, at least for training the weights of these supervised neural networks, so you can just stop there. Along the way, backprop also ends up outputting dW[l] and db[l], since it uses the parameters W[l] and b[l]; for example, one of these boxes outputs dW[3] and db[3], and so on. So you end up computing all the derivatives you need.

Just to fill in the structure a little bit more: these boxes use those parameters as well, W[l] and b[l], and it turns out, as we'll see later, that inside these boxes you end up computing dz[l] as well.

So one iteration of training for a neural network involves starting with a[0], which is x, going through forward prop to compute ŷ, using that to compute da[L], and then doing backprop. Now you have all these derivative terms, and so each W[l] gets updated as W[l] := W[l] minus the learning rate times dW[l], and similarly for b[l], now that you've run backprop and have all these derivatives. That's one iteration of gradient descent for your neural network.

Before moving on, just one more implementational detail. Conceptually, it's useful to think of the cache here as storing the value of z for the backward functions. But when you implement this, and you'll see this in the programming exercise, you find that the cache is also a convenient way to get the values of the parameters, such as W[1] and b[1], into the backward function. So in the programming exercise, you actually store in the cache z[l] as well as W[l] and b[l]; for example, it stores z[2], W[2], and b[2]. From an implementational standpoint, I just find this a convenient way to get the parameters copied to where you need them later when computing backpropagation. So that's just an implementational detail that you'll see when you do the programming exercise.

You've now seen the basic building blocks for implementing a deep neural network. In each layer, there's a forward propagation step, a corresponding backward propagation step, and a cache to pass information from one to the other. In the next video, we'll talk about how you can actually implement these building blocks. Let's go on to the next video.
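For the backward half and the update step, here is a matching sketch under the same assumptions as the forward sketch above (its cache layout, ReLU hidden layers, sigmoid output, and a binary cross-entropy loss for da[L]); the helper names layer_backward, backward_propagation, and update_parameters are illustrative, not the exact programming-exercise API.

```python
import numpy as np

def layer_backward(dA, cache, activation):
    """Backward step for one layer: given dA[l] and the cache (A_prev, W, b, Z),
    compute dZ[l], then dW[l], db[l], and dA[l-1]."""
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    if activation == "sigmoid":
        s = 1 / (1 + np.exp(-Z))
        dZ = dA * s * (1 - s)
    else:                                   # relu
        dZ = dA * (Z > 0)
    dW = (dZ @ A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

def backward_propagation(AL, Y, caches):
    """Full backward pass: start from dA[L] and walk the layers in reverse."""
    grads = {}
    L = len(caches)
    # derivative of the binary cross-entropy loss with respect to A[L] (assumed loss)
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    dA_prev, dW, db = layer_backward(dAL, caches[L - 1], "sigmoid")
    grads["dW" + str(L)], grads["db" + str(L)] = dW, db
    for l in reversed(range(1, L)):
        dA_prev, dW, db = layer_backward(dA_prev, caches[l - 1], "relu")
        grads["dW" + str(l)], grads["db" + str(l)] = dW, db
    return grads

def update_parameters(parameters, grads, learning_rate):
    """Gradient descent step: W[l] := W[l] - alpha * dW[l], and likewise for b[l]."""
    L = len(parameters) // 2
    for l in range(1, L + 1):
        parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return parameters
```

One full training iteration is then the sequence described in the lecture: forward_propagation to get ŷ and the caches, backward_propagation to get the gradients, and update_parameters to apply the gradient descent step.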