In the last video, you saw what a single hidden layer neural network looks like. In this video, let's go through the details of exactly how this neural network computes its output. What you'll see is that it's like logistic regression, but repeated many times. Let's take a look.

So this is what a two-layer neural network looks like. Let's go more deeply into exactly what this neural network computes. Now, we've said before that the circle in logistic regression really represents two steps of computation: first you compute z = w^T x + b, and second you compute the activation a = sigmoid(z). A neural network just does this many more times. Let's start by focusing on just one of the nodes in the hidden layer, the first node; I've grayed out the other nodes for now. Similar to logistic regression on the left, this node in the hidden layer does two steps of computation. The first step, which you can think of as the left half of the node, computes z^[1]_1 = w^[1]_1^T x + b^[1]_1. The superscript square bracket [1] indicates quantities associated with the first layer, the hidden layer, and the subscript 1 indicates the first node in that layer. The second step computes a^[1]_1 = sigmoid(z^[1]_1). So for both z and a, the notational convention is that in a^[l]_i, the superscript [l] refers to the layer number and the subscript i refers to the node within that layer. The node we've been looking at is node 1 of layer 1, the hidden layer, which is why both the superscript and the subscript are 1. So that little circle, the first node in the hidden layer, represents carrying out these two steps of computation.

Now let's look at the second node in the hidden layer. Similar to the logistic regression unit on the left, this little circle also represents two steps of computation. The first step computes z^[1]_2 = w^[1]_2^T x + b^[1]_2 (still layer 1, but now the second node), and the second step computes a^[1]_2 = sigmoid(z^[1]_2). Feel free to pause the video and double check that the superscript and subscript notation is consistent with what we wrote above in purple.

So we've talked through the first two hidden units; hidden units 3 and 4 represent similar computations. Now let me take these two pairs of equations and copy them to the next slide. So here's our neural network, and here are the equations we worked out previously for the first and second hidden units. If you then go through and write out the corresponding equations for the third and fourth hidden units, you get the following. Just to make sure the notation is clear: w^[1]_1 is a vector, and the superscript T denotes its transpose, so w^[1]_1^T x is a row vector times a column vector.

Now, as you might have guessed, if you're actually implementing a neural network, computing these four equations with a for loop is really inefficient. So what we're going to do is take these four equations and vectorize them. We're going to start by showing how to compute z as a vector. It turns out you can do it as follows: take these w vectors and stack them into a matrix. Then you have w^[1]_1^T.
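To make the two-step, node-by-node computation concrete, here is a minimal NumPy sketch of the unvectorized version, looping over the four hidden units. The names (sigmoid, w1, b1, x) and the randomly generated parameters are illustrative assumptions, not code from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative parameters for a hidden layer with 4 units and 3 input features.
# w1[i] plays the role of w^[1]_(i+1), a parameter vector for one hidden unit.
rng = np.random.default_rng(0)
w1 = [rng.standard_normal(3) for _ in range(4)]   # one weight vector per hidden unit
b1 = rng.standard_normal(4)                       # one bias per hidden unit
x = rng.standard_normal(3)                        # a single input example

# Unvectorized: loop over the four hidden units, doing the two steps per node.
z1 = np.zeros(4)
a1 = np.zeros(4)
for i in range(4):
    z1[i] = w1[i] @ x + b1[i]    # z^[1]_i = w^[1]_i^T x + b^[1]_i
    a1[i] = sigmoid(z1[i])       # a^[1]_i = sigmoid(z^[1]_i)
```

As the lecture notes next, this loop works but is inefficient; stacking the weight vectors into a matrix lets one matrix product replace the loop.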
So that's a row vector; transposing a column vector gives you a row vector. Then w^[1]_2^T, w^[1]_3^T, w^[1]_4^T. So by stacking those four w vectors together, you end up with a matrix. Another way to think of this is that we have four logistic regression units, and each of the logistic regression units has a corresponding parameter vector w. By stacking those four vectors together, you end up with this 4 by 3 matrix. So if you then take this matrix and multiply it by your input features x1, x2, x3, then by how matrix multiplication works, you end up with w^[1]_1^T x, w^[1]_2^T x, w^[1]_3^T x, w^[1]_4^T x. And let's not forget the b's. If we now add to this the vector b^[1]_1, b^[1]_2, b^[1]_3, b^[1]_4, then each of the four rows of the result corresponds exactly to each of the four quantities we had above. In other words, we've just shown that this expression is equal to z^[1]_1, z^[1]_2, z^[1]_3, z^[1]_4, as defined here. And, maybe not surprisingly, we're going to call this whole thing the vector z^[1], which is obtained by stacking those individual z's into a column vector.

When we're vectorizing, one rule of thumb that may help you navigate this is that different nodes in a layer get stacked vertically. That's why z^[1]_1 through z^[1]_4, which correspond to four different nodes in the hidden layer, are stacked vertically to form the vector z^[1]. To use one more piece of notation, this 4 by 3 matrix, which we obtained by stacking the lowercase vectors w^[1]_1, w^[1]_2, and so on, we're going to call the matrix W^[1]. Similarly, this vector of biases we're going to call b^[1], and it is a 4 by 1 vector.

So now we've computed z^[1] using this vector-matrix notation. The last thing we need to do is also compute the values of a. It probably won't surprise you that we define a^[1] by stacking together the activation values a^[1]_1 through a^[1]_4 into a column vector, and that a^[1] = sigmoid(z^[1]), where the sigmoid function is applied element-wise to the four elements of z^[1].

So just to recap, we've figured out that z^[1] = W^[1] x + b^[1] and a^[1] = sigmoid(z^[1]). Let's copy this to the next slide. What we see is that for the first layer of the neural network, given an input x, we have z^[1] = W^[1] x + b^[1] and a^[1] = sigmoid(z^[1]). The dimensions are: z^[1] is 4 by 1, W^[1] is a 4 by 3 matrix, x is a 3 by 1 vector, b^[1] is a 4 by 1 vector, and a^[1] is 4 by 1, the same dimension as z^[1]. And remember that we said x = a^[0], just as y hat is also equal to a^[2]. So if you want, you can take this x and replace it with a^[0], since a^[0] is just an alias for the vector of input features x.

Now, through a similar derivation, you can figure out that the computation for the next layer can be written similarly, where the output layer has associated with it the parameters W^[2] and b^[2]. W^[2] in this case is going to be a 1 by 4 matrix, and b^[2] is just a real number, a 1 by 1 quantity. So z^[2] is going to be a real number as well; I'm just going to write it as a 1 by 1 matrix.
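Here is a small, self-contained NumPy sketch of the vectorized layer-1 computation just described: the four per-unit weight vectors are stacked as the rows of a 4 by 3 matrix W1, and the result is checked against the unit-by-unit loop. All names and the random parameters are illustrative assumptions, not from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Per-unit parameters as before: four weight vectors of shape (3,) and four biases.
rng = np.random.default_rng(0)
w1 = [rng.standard_normal(3) for _ in range(4)]
b1 = rng.standard_normal(4)
x = rng.standard_normal(3)

# Stack the four row vectors w^[1]_i^T into W^[1] (4 by 3) and the biases into b^[1] (4 by 1).
W1 = np.stack(w1)                 # shape (4, 3): row i is w^[1]_(i+1)^T
b1_vec = b1.reshape(4, 1)         # shape (4, 1)
x_col = x.reshape(3, 1)           # shape (3, 1): a single input as a column vector

Z1 = W1 @ x_col + b1_vec          # z^[1] = W^[1] x + b^[1], shape (4, 1)
A1 = sigmoid(Z1)                  # a^[1] = sigmoid(z^[1]), applied element-wise

# The vectorized result matches the unvectorized, unit-by-unit loop.
z1_loop = np.array([w1[i] @ x + b1[i] for i in range(4)])
assert np.allclose(Z1.ravel(), z1_loop)
```

The dimensions line up exactly as stated above: a (4, 3) matrix times a (3, 1) vector plus a (4, 1) vector gives a (4, 1) result.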
It's a 1 by 4 matrix times a^[1], which is 4 by 1, plus b^[2], which is 1 by 1, and so this gives you just a real number. And if you think of this last output unit as being analogous to logistic regression, which had parameters w and b, then w plays a role analogous to W^[2] transpose (that is, W^[2] is really w transpose), and b is equal to b^[2]. So if we were to cover up the left part of this network and ignore all of that for now, then this last output unit is a lot like logistic regression, except that instead of writing the parameters as w and b, we're writing them as W^[2] and b^[2], with dimensions 1 by 4 and 1 by 1.

So just to recap: for logistic regression, to implement the prediction you compute z = w^T x + b, and then y hat = a = sigmoid(z). When you have a neural network with one hidden layer, what you need to implement to compute the output are just these four equations:

z^[1] = W^[1] x + b^[1]
a^[1] = sigmoid(z^[1])
z^[2] = W^[2] a^[1] + b^[2]
y hat = a^[2] = sigmoid(z^[2])

You can think of this as a vectorized implementation of computing the output of, first, the four logistic regression units in the hidden layer (that's what the first two equations do), and then of the logistic regression unit in the output layer (which is what the last two equations do). I hope this description made sense, but the takeaway is that to compute the output of this neural network, all you need are those four lines of code.

So now you've seen how, given a single input feature vector, you can compute the output of this neural network with four lines of code. Similar to what we did for logistic regression, we'll also want to vectorize across multiple training examples. We'll see that by stacking up training examples as different columns of a matrix, with just a slight modification to these equations, you'll be able to compute the output of this neural network, similar to what you saw with logistic regression, not just on one example at a time, but on, say, your entire training set at a time. Let's see the details of that in the next video.
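Finally, here is a minimal NumPy sketch of the full forward pass, the four lines of code the lecture refers to, applied to a single input example. The function name forward_pass and the randomly initialized parameters are illustrative assumptions, not from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_pass(x, W1, b1, W2, b2):
    """Compute the output of a 2-layer network for one example x of shape (3, 1)."""
    Z1 = W1 @ x + b1        # z^[1] = W^[1] x + b^[1], shape (4, 1)
    A1 = sigmoid(Z1)        # a^[1] = sigmoid(z^[1]), shape (4, 1)
    Z2 = W2 @ A1 + b2       # z^[2] = W^[2] a^[1] + b^[2], shape (1, 1)
    A2 = sigmoid(Z2)        # y hat = a^[2] = sigmoid(z^[2]), shape (1, 1)
    return A2

# Illustrative parameters matching the dimensions discussed above.
rng = np.random.default_rng(1)
W1 = rng.standard_normal((4, 3))   # hidden layer: 4 units, 3 input features
b1 = rng.standard_normal((4, 1))
W2 = rng.standard_normal((1, 4))   # output layer: 1 unit, 4 hidden activations
b2 = rng.standard_normal((1, 1))
x = rng.standard_normal((3, 1))    # one input example

y_hat = forward_pass(x, W1, b1, W2, b2)
print(y_hat.shape)  # (1, 1), a single predicted probability
```

Vectorizing across m training examples, as the next video describes, amounts to replacing the (3, 1) column x with a (3, m) matrix whose columns are the individual examples.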