In the previous video, you saw how you can use vectorization to compute the predictions, the lowercase a's, for an entire training set all at the same time. In this video, you'll see how you can use vectorization to also perform the gradient computations for all m training examples, again all at the same time. Then at the end of this video, we'll pull it all together and show how you can derive a very efficient implementation of logistic regression.

You'll remember that for the gradient computation, what we did was compute dz1 for the first example, which is a1 minus y1, then dz2 equals a2 minus y2, and so on for all m training examples. So what we're going to do is define a new variable, dZ, which is going to be dz1, dz2, up to dzm, with all the lowercase dz variables stacked horizontally. So this is a 1 by m matrix, or equivalently an m-dimensional row vector. Now, recall from the previous slide that we'd already figured out how to compute capital A, which was a1 through am, and we had defined capital Y as y1 through ym, also stacked horizontally. Based on these definitions, maybe you can see for yourself that dZ can be computed as just capital A minus capital Y, because the first element is a1 minus y1, the second element is a2 minus y2, and so on. And that first element, a1 minus y1, is exactly the definition of dz1; the second element is exactly the definition of dz2, and so on. So with just one line of code, you can compute all of this at the same time.

Now, in the previous implementation, we'd gotten rid of one for loop already, but we still had this second for loop over the training examples. So we initialized dw to the vector of zeros, but then we still had to loop over the training examples: dw plus equals x1 times dz1 for the first training example, dw plus equals x2 times dz2, and so on. We do that m times, and then dw divide equals m. Similarly for b: db was initialized to 0, then db plus equals dz1, db plus equals dz2, down to dzm, and then db divide equals m. So that's what we had in the previous implementation. We'd already gotten rid of one for loop, so at least now dw is a vector and we weren't separately updating dw1, dw2, and so on. But we still had a for loop over the m examples in the training set.

So let's take these operations and vectorize them. Here's what we can do. For the vectorized implementation of db, what it's doing is basically summing up all of these dz's and then dividing by m. So db is 1 over m times the sum from i equals 1 through m of dzi. And all the dz's are in that row vector, so in Python what you implement is 1 over m times np.sum(dZ). Just take this variable and call the np.sum function on it, and that gives you db. How about dw? I'll just write out the correct equation, and you can verify it's the right thing to do. dw turns out to be 1 over m times the matrix X times dZ transpose. To see why that's the case, this is equal to 1 over m times the matrix X, which is x1 through xm stacked up in columns, times dZ transpose, which is dz1 down to dzm stacked as a column. If you work out what this matrix times this vector comes to, it turns out to be 1 over m times the quantity x1 dz1 plus dot dot dot plus xm dzm. So this is an n by 1 vector, and that is exactly what you end up with for dw, because dw was taking these xi times dzi and adding them up, and that's exactly what this matrix-vector multiplication is doing. So again, with one line of code, you can compute dw. The vectorized implementation of the derivative calculations is just this: you use one line to implement db and one line to implement dw. And notice that without a for loop over the training set, you can now compute the updates you want to your parameters.
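To make these two lines concrete, here is a minimal NumPy sketch of the vectorized gradient computation. The shapes follow the conventions in the video (X is n by m with one example per column, A and Y are 1 by m row vectors), but the sigmoid helper, the random data, and the specific sizes are just illustrative assumptions of mine:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Tiny made-up dataset: n = 2 features, m = 4 training examples.
np.random.seed(0)
n, m = 2, 4
X = np.random.randn(n, m)             # each column is one example x(i)
Y = np.random.randint(0, 2, (1, m))   # labels y(1) ... y(m), stacked as a 1-by-m row vector
w = np.zeros((n, 1))
b = 0.0

# Forward propagation from the previous video: A holds a(1) ... a(m).
A = sigmoid(np.dot(w.T, X) + b)       # shape (1, m)

# The vectorized gradient computation derived above.
dZ = A - Y                            # dz(i) = a(i) - y(i), shape (1, m)
dw = (1 / m) * np.dot(X, dZ.T)        # (1/m) * (x(1) dz(1) + ... + x(m) dz(m)), shape (n, 1)
db = (1 / m) * np.sum(dZ)             # (1/m) * sum over i of dz(i), a scalar
```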
So now let's put it all together into how you would actually implement logistic regression. This is our original, highly inefficient, non-vectorized implementation. The first thing we did in the previous video was get rid of the inner for loop: instead of looping over dw1, dw2, and so on, we replaced that with a vector-valued dw, and we just say dw plus equals xi, which is now a vector, times dzi. But now we'll see that we can get rid of not just that for loop, but also the for loop over the training examples. So here's how you do it.

Using what we had from the previous slides, you would say Z equals w transpose X plus b, and the code you write is Z equals np.dot(w.T, X) plus b. Then A equals sigmoid of Z. So you've now computed all of the z's and all of the a's for all the values of i. Next, as on the previous slide, you compute dZ equals capital A minus capital Y, so now you've computed dzi for all the values of i. Then finally, dw equals 1 over m times X times dZ transpose, and db equals 1 over m times np.sum(dZ). So you've just done forward propagation and back propagation, really computing the predictions and computing the derivatives on all m training examples, without using a for loop. The gradient descent update then would be: w gets updated as w minus the learning rate times dw, which you just computed above, and b gets updated as b minus the learning rate times db. Sometimes there's a colon before the equals sign there to denote that this is an assignment, though I guess I haven't been totally consistent with that notation. But with this, you've just implemented a single iteration of gradient descent for logistic regression.

Now, I know I said that we should get rid of explicit for loops whenever we can, but if you want to implement multiple iterations of gradient descent, then you still need a for loop over the number of iterations. So if you want a thousand iterations of gradient descent, you might still need a for loop over the iteration number. There's an outermost for loop like that, and I don't think there's any way to get rid of it. But I do think it's incredibly cool that you can implement at least one iteration of gradient descent without needing a for loop.

So that's it. You now have a highly vectorized and highly efficient implementation of gradient descent for logistic regression. There's just one more detail that I want to talk about in the next video: in this description I briefly alluded to a technique called broadcasting. Broadcasting turns out to be a technique that Python and NumPy allow you to use to make certain parts of your code much more efficient. So let's see some more details of broadcasting in the next video.
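Pulling the whole video together, here is one way the full vectorized implementation might look in NumPy. The function name, the learning rate alpha, the iteration count, and the synthetic data are illustrative assumptions of my own; the five lines inside the loop are the ones from the slide, wrapped in the one remaining outermost for loop over gradient descent iterations:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_logistic_regression(X, Y, alpha=0.1, num_iterations=1000):
    """Vectorized gradient descent for logistic regression.
    X has shape (n, m), one training example per column; Y has shape (1, m)."""
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    # The one for loop we keep: the outermost loop over gradient descent iterations.
    for _ in range(num_iterations):
        # Forward propagation on all m examples at once.
        Z = np.dot(w.T, X) + b            # shape (1, m)
        A = sigmoid(Z)                    # shape (1, m)
        # Backward propagation (the derivatives), also on all m examples at once.
        dZ = A - Y                        # shape (1, m)
        dw = (1 / m) * np.dot(X, dZ.T)    # shape (n, 1)
        db = (1 / m) * np.sum(dZ)         # scalar
        # Gradient descent update: w := w - alpha * dw, b := b - alpha * db.
        w = w - alpha * dw
        b = b - alpha * db
    return w, b

# Example usage on a small synthetic dataset.
np.random.seed(1)
X = np.random.randn(2, 100)
Y = (X[0:1, :] + X[1:2, :] > 0).astype(int)   # shape (1, 100)
w, b = train_logistic_regression(X, Y, alpha=0.1, num_iterations=1000)
```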