We've talked about how vectorization lets you speed up your code significantly. In this video, we'll talk about how you can vectorize your implementation of logistic regression so that you can process an entire training set, that is, implement a single iteration of gradient descent with respect to an entire training set, without using even a single explicit for loop. I'm super excited about this technique, and we'll use it again when we talk about neural networks later. Let's get started.

Let's first examine the forward propagation step of logistic regression. If you have m training examples, then to make a prediction on the first example you need to compute z^(1) = w^T x^(1) + b, then compute the activation a^(1) = σ(z^(1)), which is y-hat for the first example. Then to make a prediction on the second training example you need to compute z^(2) and a^(2), then z^(3) and a^(3) for the third example, and so on. You might need to do this m times if you have m training examples. It turns out that in order to carry out the forward propagation step, that is, to compute these predictions on all m training examples, there is a way to do so without needing an explicit for loop. Let's see how you can do it.

First, remember that we defined the matrix capital X to be your training inputs stacked together in different columns. So X is an n_x by m matrix; I'm writing its shape the way Python NumPy would, (n_x, m), but this just means that X is an n_x by m dimensional matrix.

Now, the first thing I want to do is show how you can compute z^(1), z^(2), z^(3), and so on, all in one step, in fact, with one line of code. I'm going to construct a 1 by m matrix, really a row vector, that contains z^(1), z^(2), and so on, down to z^(m), all at the same time. It turns out this can be expressed as w transpose times the matrix X, plus the row vector [b, b, ..., b], which is a 1 by m matrix, that is, an m dimensional row vector. Depending on how familiar you are with matrix multiplication, you might see that w transpose is a row vector, so multiplying it by X, whose columns are x^(1), x^(2), through x^(m), gives the row vector [w^T x^(1), w^T x^(2), ..., w^T x^(m)]. Then when you add the second term [b, b, ..., b], you add b to each element, so you end up with another 1 by m row vector whose first element is w^T x^(1) + b, whose second element is w^T x^(2) + b, and so on, up to the mth element, w^T x^(m) + b. If you refer to the definitions above, the first element is exactly the definition of z^(1), the second element is exactly the definition of z^(2), and so on.

So just as capital X was what you obtained when you took your training examples and stacked them next to each other horizontally, I'm going to define capital Z to be what you get when you take the lowercase z's and stack them horizontally. When you stack the lowercase x's corresponding to different training examples horizontally you get the variable capital X, and in the same way, when you stack these lowercase z variables horizontally you get the variable we denote by capital Z. It turns out that in order to implement this, the NumPy command is Z = np.dot(w.T, X) + b, that is, w transpose times X, plus b.
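To make that one line concrete, here is a minimal NumPy sketch of the vectorized computation of Z. The sizes n_x = 4 and m = 10 and the random data are made up purely for illustration, and w is assumed to be stored as an (n_x, 1) column vector.

```python
import numpy as np

n_x, m = 4, 10                 # made-up sizes: n_x features, m training examples
X = np.random.randn(n_x, m)    # training inputs stacked as columns, shape (n_x, m)
w = np.random.randn(n_x, 1)    # weight vector, shape (n_x, 1)
b = 0.5                        # bias, a single real number

# One line, no explicit for loop: w.T has shape (1, n_x), so the product has
# shape (1, m), and adding the scalar b adds b to every element of that row.
Z = np.dot(w.T, X) + b         # Z[0, i] equals w^T x^(i) + b, i.e. z^(i+1)
```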
Now, there is a subtlety in Python, which is that here b is a real number, or if you want, a 1 by 1 matrix, just a normal real number. But when you add this row vector to this real number, Python automatically takes the real number b and expands it out into a 1 by m row vector. In case this operation seems a little bit mysterious, it's called broadcasting in Python, and you don't have to worry about it for now; we'll talk about it some more in the next video. The takeaway is that with just this one line of code you can calculate capital Z, and capital Z is going to be a 1 by m matrix that contains all of the lowercase z's, z^(1) through z^(m).

So that was Z. How about the a values? What we'd like to do next is find a way to compute a^(1), a^(2), and so on through a^(m), all at the same time. Just as stacking the lowercase x's resulted in capital X, and stacking the lowercase z's horizontally resulted in capital Z, stacking the lowercase a's is going to result in a new variable, which we'll define as capital A. In the programming assignment, you'll see how to implement a vector-valued sigmoid function, so that the sigmoid function takes this capital Z as input and very efficiently outputs capital A. You'll see the details of that in the programming assignment.

So just to recap, what we've seen on this slide is that instead of needing to loop over m training examples to compute lowercase z and lowercase a one at a time, you can implement this one line of code to compute all the z's at the same time, and then this one line of code, with an appropriate implementation of lowercase sigma, to compute all the lowercase a's at the same time. So this is how you implement a vectorized version of forward propagation for all m training examples at the same time.

To summarize, you've just seen how you can use vectorization to very efficiently compute all the activations, all the lowercase a's, at the same time. Next, it turns out you can also use vectorization to very efficiently compute the backward propagation, to compute the gradients. Let's see how you can do that in the next video.
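As a rough sketch of what that vector-valued sigmoid might look like (the helper you build in the programming assignment may be organized differently), the whole forward propagation step then collapses to two lines, reusing the X, w, and b from the sketch above:

```python
def sigmoid(Z):
    """Element-wise sigmoid; works on a scalar or a whole NumPy array at once."""
    return 1.0 / (1.0 + np.exp(-Z))

# Vectorized forward propagation for all m examples, with no explicit for loop.
Z = np.dot(w.T, X) + b    # shape (1, m); broadcasting expands b across the row
A = sigmoid(Z)            # shape (1, m); A[0, i] is the prediction y-hat for example i+1
```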