We've talked about how vectorization lets you speed up your code significantly. In this video, we'll talk about how you can vectorize your implementation of logistic regression so that you can process an entire training set, that is, implement a single iteration of gradient descent with respect to an entire training set, without using even a single explicit for loop. I'm super excited about this technique, and we'll use it again when we talk about neural networks later. Let's get started.

Let's first examine the forward propagation step of logistic regression. If you have m training examples, then to make a prediction on the first example you need to compute z^(1) = w^T x^(1) + b, then compute the activation a^(1) = σ(z^(1)), which is y-hat for the first example. Then to make a prediction on the second training example you need to compute z^(2) and a^(2), then z^(3) and a^(3) for the third example, and so on. You might need to do this m times if you have m training examples. It turns out that in order to carry out the forward propagation step, that is, to compute these predictions on all m training examples, there is a way to do so without needing an explicit for loop. Let's see how you can do it.

First, remember that we defined the matrix capital X to be your training inputs stacked together in different columns. So X is an n_x by m matrix; I'm writing its shape the way Python NumPy would, (n_x, m), but this just means that X is an n_x by m dimensional matrix.

Now, the first thing I want to do is show how you can compute z^(1), z^(2), z^(3), and so on, all in one step, in fact, with one line of code. I'm going to construct a 1 by m matrix, really a row vector, that contains z^(1), z^(2), and so on, down to z^(m), all at the same time. It turns out this can be expressed as w transpose times the matrix X, plus the row vector [b, b, ..., b], which is a 1 by m matrix, that is, an m dimensional row vector. Depending on how familiar you are with matrix multiplication, you might see that w transpose is a row vector, so multiplying it by X, whose columns are x^(1), x^(2), through x^(m), gives the row vector [w^T x^(1), w^T x^(2), ..., w^T x^(m)]. Then when you add the second term [b, b, ..., b], you add b to each element, so you end up with another 1 by m row vector whose first element is w^T x^(1) + b, whose second element is w^T x^(2) + b, and so on, up to the mth element, w^T x^(m) + b. If you refer to the definitions above, the first element is exactly the definition of z^(1), the second element is exactly the definition of z^(2), and so on.

So just as capital X was what you obtained when you took your training examples and stacked them next to each other horizontally, I'm going to define capital Z to be what you get when you take the lowercase z's and stack them horizontally. When you stack the lowercase x's corresponding to different training examples horizontally you get the variable capital X, and in the same way, when you stack these lowercase z variables horizontally you get the variable we denote by capital Z. It turns out that in order to implement this, the NumPy command is Z = np.dot(w.T, X) + b, that is, w transpose times X, plus b.
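To make that one line concrete, here is a minimal NumPy sketch of the vectorized computation of Z. The sizes n_x = 4 and m = 10 and the random data are made up purely for illustration, and w is assumed to be stored as an (n_x, 1) column vector.

```python
import numpy as np

n_x, m = 4, 10                 # made-up sizes: n_x features, m training examples
X = np.random.randn(n_x, m)    # training inputs stacked as columns, shape (n_x, m)
w = np.random.randn(n_x, 1)    # weight vector, shape (n_x, 1)
b = 0.5                        # bias, a single real number

# One line, no explicit for loop: w.T has shape (1, n_x), so the product has
# shape (1, m), and adding the scalar b adds b to every element of that row.
Z = np.dot(w.T, X) + b         # Z[0, i] equals w^T x^(i) + b, i.e. z^(i+1)
```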
Now, there is a subtlety in Python, which is that here b is a real number, or if you want, a 1 by 1 matrix, just a normal real number. But when you add this row vector to this real number, Python automatically takes the real number b and expands it out into a 1 by m row vector. In case this operation seems a little bit mysterious, it's called broadcasting in Python, and you don't have to worry about it for now; we'll talk about it some more in the next video. The takeaway is that with just this one line of code you can calculate capital Z, and capital Z is going to be a 1 by m matrix that contains all of the lowercase z's, z^(1) through z^(m).

So that was Z. How about the a values? What we'd like to do next is find a way to compute a^(1), a^(2), and so on through a^(m), all at the same time. Just as stacking the lowercase x's resulted in capital X, and stacking the lowercase z's horizontally resulted in capital Z, stacking the lowercase a's is going to result in a new variable, which we'll define as capital A. In the programming assignment, you'll see how to implement a vector-valued sigmoid function, so that the sigmoid function takes this capital Z as input and very efficiently outputs capital A. You'll see the details of that in the programming assignment.

So just to recap, what we've seen on this slide is that instead of needing to loop over m training examples to compute lowercase z and lowercase a one at a time, you can implement this one line of code to compute all the z's at the same time, and then this one line of code, with an appropriate implementation of lowercase sigma, to compute all the lowercase a's at the same time. So this is how you implement a vectorized version of forward propagation for all m training examples at the same time.

To summarize, you've just seen how you can use vectorization to very efficiently compute all the activations, all the lowercase a's, at the same time. Next, it turns out you can also use vectorization to very efficiently compute the backward propagation, to compute the gradients. Let's see how you can do that in the next video.
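As a rough sketch of what that vector-valued sigmoid might look like (the helper you build in the programming assignment may be organized differently), the whole forward propagation step then collapses to two lines, reusing the X, w, and b from the sketch above:

```python
def sigmoid(Z):
    """Element-wise sigmoid; works on a scalar or a whole NumPy array at once."""
    return 1.0 / (1.0 + np.exp(-Z))

# Vectorized forward propagation for all m examples, with no explicit for loop.
Z = np.dot(w.T, X) + b    # shape (1, m); broadcasting expands b across the row
A = sigmoid(Z)            # shape (1, m); A[0, i] is the prediction y-hat for example i+1
```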