Welcome back. This week, we'll learn to make linear regression much faster and much more powerful. And by the end of this week, you'll be two-thirds of the way to finishing this first course. Let's start by looking at a version of linear regression that can look at not just one feature, but a lot of different features. Let's take a look.
In the original version of linear regression, you had a single feature x, the size of the house, and you were able to predict y, the price of the house. So the model was f sub w,b of x equals w x plus b. But now, what if you had not only the size of the house as a feature with which to try to predict the price, but you also knew the number of bedrooms, the number of floors, and the age of the home in years? It seems like this would give you a lot more information with which to predict the price.
To introduce a little bit of new notation, we're going to use the variables x1, x2, x3, and x4 to denote the four features. And for simplicity, let's introduce a little bit more notation. We'll write x subscript j, or sometimes I'll just say for short x sub j, to represent the j-th feature. So here, j will go from 1 through 4, because we have 4 features. I'm going to use lowercase n to denote the total number of features. So in this example, n is equal to 4.
As before, we'll use x superscript i to denote the i-th training example. So here, x superscript i is actually going to be a list of four numbers, or sometimes we'll call this a vector, that includes all the features of the i-th training example. So as a concrete example, x superscript in parentheses 2 will be a vector of the features for the second training example. So it will be equal to this: 1416, 3, 2, and 40. And technically, I'm writing these numbers in a row, so sometimes this is called a row vector rather than a column vector. But if you don't know what the difference is, don't worry about it. It's not that important for this purpose.
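The notation so far can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the names `FEATURES`, `X_train`, and `example` are hypothetical, and only the second row of the table (1416, 3, 2, 40) comes from the lecture; the other rows are made-up placeholder values.

```python
# Feature order used in the lecture: x1, x2, x3, x4.
FEATURES = ["size_sqft", "bedrooms", "floors", "age_years"]  # hypothetical names

# Hypothetical training set, one row per training example.
X_train = [
    [2000, 4, 1, 30],  # made-up placeholder row
    [1416, 3, 2, 40],  # x^(2), the second training example from the lecture
    [900, 2, 1, 25],   # made-up placeholder row
]

n = len(FEATURES)  # lowercase n: the total number of features


def example(X, i):
    """Return x^(i), using the lecture's 1-based example index."""
    return X[i - 1]


x_2 = example(X_train, 2)
print(n)    # 4
print(x_2)  # [1416, 3, 2, 40]
```

Note that the lecture counts examples from 1, while Python lists count from 0, which is why the helper subtracts 1.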
And to refer to a specific feature in the i-th training example, I will write x superscript i subscript j. So for example, x superscript 2 subscript 3 will be the value of the third feature, that is the number of floors, in the second training example, and so that's going to be equal to 2. Sometimes, in order to emphasize that this x superscript 2 is not a number, but is actually a list of numbers, that is a vector, we'll draw an arrow on top of it, just to visually show that it's a vector, and over here as well. But you don't have to draw this arrow in your notation. You can think of the arrow as an optional signifier that's sometimes used just to emphasize that this is a vector and not a number.
Now that we have multiple features, let's take a look at what our model would look like. Previously, we defined the model as f of x equals w x plus b, where x was a single feature, so a single number. But now with multiple features, we're going to define it differently. Instead, the model will be f sub w,b of x equals w1 x1 plus w2 x2 plus w3 x3 plus w4 x4 plus b. Concretely, for housing price prediction, one possible model may be that we estimate the price of the house as 0.1 times x1, the size of the house, plus 4 times x2, the number of bedrooms, plus 10 times x3, the number of floors, minus 2 times x4, the age of the house in years, plus 80.
Let's think a bit about how you might interpret these parameters. If the model is trying to predict the price of the house in thousands of dollars, you can think of this b equals 80 as saying that the base price of a house starts off at maybe $80,000, assuming it has no size, no bedrooms, no floors, and no age. And you can think of this 0.1 as saying that maybe for every additional square foot, the price will increase by 0.1 thousand dollars, or by $100, because we're saying that for each square foot, the price increases by 0.1 times $1,000, which is $100.
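As a rough check of this interpretation, the example model can be evaluated in plain Python. This is a minimal sketch, not the course's implementation: the function name `predict` and the variable names are assumptions, the parameter values are the ones quoted above, and the house used is the second training example with one extra square foot added for comparison.

```python
# Parameters of the example model (price measured in thousands of dollars).
w = [0.1, 4, 10, -2]  # weights for size, bedrooms, floors, age
b = 80                # base price


def predict(x, w, b):
    """f_wb(x) = w1*x1 + w2*x2 + w3*x3 + w4*x4 + b, written out term by term."""
    return w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + w[3] * x[3] + b


x = [1416, 3, 2, 40]         # x^(2): size, bedrooms, floors, age
x_bigger = [1417, 3, 2, 40]  # the same house with one extra square foot

# x^(2)_3 is the third feature of the second example (index 2 in 0-based Python).
floors = x[3 - 1]

price = predict(x, w, b)
delta = predict(x_bigger, w, b) - price

print(floors)               # 2
print(round(price, 1))      # 173.6 (thousand dollars)
print(round(delta * 1000))  # 100 (dollars per extra square foot)
```

The results are rounded before printing because 0.1 has no exact binary floating-point representation.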
And maybe for each additional bedroom, the price increases by $4,000, and for each additional floor, the price may increase by $10,000. And for each additional year of the house's age, the price may decrease by $2,000, because the parameter is negative 2.
And in general, if you have n features, then the model will look like this: f sub w,b of x equals w1 x1 plus w2 x2 plus, and so on, all the way up to wn xn, plus b. Here again is the definition of the model with n features. What we're going to do next is introduce a little bit of notation to rewrite this expression in a simpler but equivalent way. Let's define w as a list of numbers that lists the parameters w1, w2, w3, all the way through wn. In mathematics, this is called a vector, and sometimes to designate that this is a vector, which just means a list of numbers, I'm going to draw a little arrow on top. You don't always have to draw this arrow, and you can do so or not in your own notation, so you can think of this little arrow as just an optional signifier to remind us that this is a vector. If you've taken a linear algebra class before, you might recognize that this is a row vector as opposed to a column vector, but if you don't know what those terms mean, you don't need to worry about it.
Next, same as before, b is a single number, not a vector, and so this vector w together with this number b are the parameters of the model. Let me also write x as a list, or a vector, again a row vector, that lists all of the features x1, x2, x3, up through xn. This is again a vector, so I'm going to add a little arrow up on top to signify that. So, in the notation up on top, we can also add little arrows here and here to signify that w and x are actually these lists of numbers. They're actually these vectors. So, with this notation, the model can now be rewritten more succinctly as f of x equals the vector w dot, and this dot refers to a dot product from linear algebra, the vector x, plus the number b. So, what is this dot product thing?
Well, the dot product of two vectors, that is, of two lists of numbers w and x, is computed by taking the corresponding pairs of numbers, w1 and x1, multiplying them, w2 and x2, multiplying them, w3 and x3, multiplying them, all the way up to wn and xn, multiplying them, and then summing up all of these products. Writing that out, this means that the dot product is equal to w1 x1 plus w2 x2 plus w3 x3 plus, all the way up to, wn xn, and then finally we add back in the b. And you notice that this gives us exactly the same expression as we had on top. So, the dot product notation lets you write the model in a more compact form with fewer characters.
The name for this type of linear regression model with multiple input features is multiple linear regression. This is in contrast to univariate regression, which had just one feature. And by the way, you might think this algorithm is called multivariate regression, but that term actually refers to something else that we won't be using here. So I'm going to refer to this model as multiple linear regression.
And so that's it for linear regression with multiple features, which is also called multiple linear regression. In order to implement this, there's a really neat trick called vectorization, which will make it much simpler to implement this and many other learning algorithms. Let's go on to the next video to take a look at what vectorization is.
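To make the equivalence concrete, here is a small sketch in plain Python that compares the dot-product form of the model with the written-out sum. The helper names `dot` and `predict` are assumptions for illustration, and the parameter and feature values are the housing example from earlier in the lecture. It uses an ordinary loop-style sum, not the vectorized approach covered in the next video.

```python
def dot(w, x):
    """Dot product: multiply corresponding entries, then sum the products."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same number of entries")
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def predict(x, w, b):
    """Multiple linear regression in compact form: f_wb(x) = w . x + b."""
    return dot(w, x) + b


w = [0.1, 4, 10, -2]  # example parameters from the lecture
b = 80
x = [1416, 3, 2, 40]  # features of the second training example

# The same model written out term by term for n = 4 features.
expanded = w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + w[3] * x[3] + b
compact = predict(x, w, b)

print(abs(compact - expanded) < 1e-9)  # True
```

Because `dot` works for lists of any matching length, the same `predict` function covers any number of features n, which is the point of the compact notation.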