Welcome to the third week of this course. By the end of this week, you will have completed the first course of this specialization, so let's jump in. Last week, you learned about linear regression, which predicts a number. This week, you'll learn about classification, where your output variable y can take on only one of a small handful of possible values, instead of any number in an infinite range. It turns out that linear regression is not a good algorithm for classification problems. Let's take a look at why, and this will lead us into a different algorithm called logistic regression, which is one of the most popular and most widely used learning algorithms today.

Here are some examples of classification problems. Recall the example of trying to figure out whether an email is spam, so the answer you want to output is going to be either a no or a yes. Another example would be figuring out if an online financial transaction is fraudulent. Fighting online financial fraud is something I once worked on, and it was strangely exhilarating because I knew there were forces out there trying to steal money, and my team's job was to stop them. So the problem is: given a financial transaction, can your learning algorithm figure out whether the transaction is fraudulent, for example, whether the credit card was stolen? Another example we've touched on before was trying to classify a tumor as malignant versus not. In each of these problems, the variable that you want to predict can only be one of two possible values: no or yes. This type of classification problem, where there are only two possible outputs, is called binary classification, where the word binary refers to there being only two possible classes or two possible categories. In these problems, I will use the terms class and category relatively interchangeably; they mean basically the same thing.

By convention, we can refer to these two classes or categories in a few common ways. We often designate them as no or yes, or sometimes equivalently false or true, or, very commonly, using the numbers 0 and 1, following the common convention in computer science of 0 denoting false and 1 denoting true. I'm usually going to use the numbers 0 and 1 to represent the answer y, because that fits most easily with the types of learning algorithms we want to implement. But when we talk about it, we'll often say no or yes, or false or true, as well. One commonly used piece of terminology is to call the false or 0 class the negative class, and the true or 1 class the positive class. For example, for spam classification, an email that is not spam may be referred to as a negative example, because the answer to the question "is it spam?" is no, or 0. In contrast, an email that is spam might be referred to as a positive training example, because the answer to "is it spam?" is yes, or true, or 1. To be clear, negative and positive do not necessarily mean bad versus good or evil versus good. They are just used to convey the absence (0 or false) versus the presence (1 or true) of something you might be looking for, such as the absence or presence of the spam property of an email, of fraudulent activity, or of malignancy in a tumor. Between non-spam and spam emails, which one you call false or 0 and which one you call true or 1 is a little bit arbitrary; often, either choice could work.
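To make the convention concrete, here is a minimal sketch of how binary labels are typically encoded as 0 and 1 in code. The emails and labels below are made up for illustration, not data from the course:

```python
# Minimal sketch of the 0/1 label convention for binary classification.
# The emails and labels here are made-up examples.
emails = ["win a free prize now", "meeting notes attached", "claim your reward"]

# y = 1 (positive class): the email IS spam; y = 0 (negative class): it is not.
y = [1, 0, 1]

for text, label in zip(emails, y):
    name = "positive (spam)" if label == 1 else "negative (not spam)"
    print(f"{text!r} -> y = {label}  [{name}]")
```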
So a different engineer might actually swap it around and have the positive class be the presence of a good email, or the presence of a real financial transaction, or a healthy patient.

So how do you build a classification algorithm? Here's an example of a training set for classifying whether a tumor is malignant (class 1, the positive or "yes" class) or benign (class 0, the negative class). I've plotted the tumor size on the horizontal axis and the label y on the vertical axis. By the way, in week 1, when we first talked about classification, this is how we previously visualized it on the number line, except that now we're calling the classes 0 and 1 and plotting them on the vertical axis.

Now, one thing you could try on this training set is to apply the algorithm you already know, linear regression, and try to fit a straight line to the data. If you do that, maybe the straight line looks like this, and that's your f of x. Linear regression predicts not just the values 0 and 1, but all numbers between 0 and 1, and even numbers less than 0 or greater than 1. But here, we want to predict categories. One thing you could try is to pick a threshold of, say, 0.5, so that if the model outputs a value below 0.5, you predict y equals 0, or not malignant, and if the model outputs a value equal to or greater than 0.5, you predict y equals 1, or malignant. Notice that this threshold value of 0.5 intersects the best-fit straight line at this point, so if you draw a vertical line here, everything to the left ends up with a prediction of y equals 0, and everything to the right ends up with a prediction of y equals 1.

Now, for this particular dataset, it looks like linear regression could do something reasonable. But let's see what happens if your dataset has one more training example, way over here on the right; let's also extend the horizontal axis. Notice that this new training example shouldn't really change how you classify the data points. The vertical dividing line we drew just now still makes sense as the cutoff: tumors smaller than this should be classified as 0, and tumors larger than this should be classified as 1. But once you've added this extra training example on the right, the best-fit line for linear regression shifts over like this, and if you continue using the threshold of 0.5, everything to the left of this new point is predicted as 0, non-malignant, and everything to the right is predicted as 1, malignant. This isn't what we want, because adding that example far to the right shouldn't change any of our conclusions about how to classify malignant versus benign tumors. But with linear regression, adding this one example, which feels like it shouldn't change anything, ends up giving us a much worse function for this classification problem. Clearly, when a tumor is large, we want the algorithm to classify it as malignant.

So what we just saw is that adding one more example to the right causes the linear regression best-fit line to shift over, and thus the dividing line, also called the decision boundary, to shift to the right. You'll learn more about the decision boundary in the next video. You'll also learn about an algorithm called logistic regression, whose output value will always be between 0 and 1, and which avoids the problems we're seeing on this slide.
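You can see this effect numerically with a minimal sketch. The tumor sizes below are made up for illustration (they are not the data on the slide): we fit a least-squares line with np.polyfit, solve f(x) = 0.5 for the threshold-crossing point, and then watch that point shift after adding one large tumor far to the right:

```python
import numpy as np

# Made-up tumor sizes (cm) and labels: 0 = benign, 1 = malignant.
x = np.array([1.0, 1.5, 2.0, 3.0, 3.5, 4.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def boundary(x, y):
    """Fit a least-squares line f(x) = w*x + b and return the x where f(x) = 0.5."""
    w, b = np.polyfit(x, y, deg=1)
    return (0.5 - b) / w

print(f"decision boundary: {boundary(x, y):.2f}")  # 2.50 -- splits the classes correctly

# Add one more malignant example far to the right.
x2 = np.append(x, 20.0)
y2 = np.append(y, 1.0)
print(f"after adding outlier: {boundary(x2, y2):.2f}")  # ~3.17 -- the boundary has shifted
# right past 3.0, so the malignant tumor at x = 3.0 is now misclassified as benign.
```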
By the way, one confusing thing about the name logistic regression is that even though it has the word regression in it, it's actually used for classification. Don't be confused by the name, which was given for historical reasons: it's used to solve binary classification problems, where the output label y is either 0 or 1. In the upcoming optional lab, you'll also get to take a look at what happens when you try to use linear regression for classification. Sometimes you get lucky and it may work, but often it will not work well, which is why I don't use linear regression for classification myself. In the optional lab, you'll see an interactive plot that attempts to classify between two categories, and you'll hopefully notice how this often doesn't work very well. That's okay, because it motivates the need for a different model to do classification tasks. So please check out this optional lab, and after that, we'll go on to the next video to look at logistic regression for classification.
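As a small preview of the next video: logistic regression keeps its output between 0 and 1 by passing the result of a linear function through the sigmoid (logistic) function. Here is a minimal sketch of that bounded-output property:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Even extreme inputs stay strictly between 0 and 1.
for z in [-100, -1, 0, 1, 100]:
    print(f"sigmoid({z}) = {sigmoid(z)}")
```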