In the last video, you saw an illustration of the k-means algorithm running. Now let's write out the k-means algorithm in detail so that you'd be able to implement it for yourself.

Here's the k-means algorithm. The first step is to randomly initialize K cluster centroids, mu_1, mu_2, through mu_K. In the example we had, this corresponded to randomly choosing a location for the red cross and for the blue cross, corresponding to the two cluster centroids. In our example, K was equal to 2, so if the red cross was cluster centroid 1 and the blue cross was cluster centroid 2 (these are just two indices to denote the first and the second cluster), then the red cross would be the location of mu_1, and the blue cross would be the location of mu_2. Just to be clear, mu_1 and mu_2 are vectors with the same dimension as your training examples x^(1) through, say, x^(30) in our example. All of these are lists of two numbers, that is, two-dimensional vectors, or whatever dimension the training data has. So if we had n = 2 features for each training example, then mu_1 and mu_2 would also be two-dimensional vectors, meaning vectors with two numbers in them.

Having randomly initialized the K cluster centroids, k-means will then repeatedly carry out the two steps that you saw in the last video. The first step is to assign points to cluster centroids, meaning color each of the points either red or blue, corresponding to assigning them to cluster centroid 1 or 2 when K is equal to 2. Writing this out in math: for i = 1 through m, over all m training examples, we set c^(i) equal to the index, which can be anything from 1 to K, of the cluster centroid closest to the training example x^(i). Mathematically, that means computing the distance between x^(i) and mu_k, written ||x^(i) - mu_k||, which is also called the L2 norm. What you want to find is the value of k that minimizes this, because that corresponds to the cluster centroid mu_k that is closest to the training example x^(i), and that minimizing value of k is what gets set to c^(i). When you implement this algorithm, you'll find that it's actually a little more convenient to minimize the squared distance, c^(i) = argmin_k ||x^(i) - mu_k||^2, because the cluster centroid with the smallest squared distance is the same as the cluster centroid with the smallest distance. In this week's optional labs and practice labs, you'll see how to implement this in code for yourself.

As a concrete example, this point up here is closer to the red cross, cluster centroid 1, so if this was training example x^(1), we would set c^(1) equal to 1. This point over here, if it was the twelfth training example, is closer to the second cluster centroid, the blue one, so we would set the corresponding cluster-assignment variable c^(12) to 2. So that's the first step of the k-means algorithm: assign points to cluster centroids.

The second step is to move the cluster centroids. What that means is, for lowercase k equals 1 to capital K, the number of clusters, we update the cluster centroid location mu_k to be the average, or mean, of the points assigned to cluster k.
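To make that first step concrete, here is a minimal sketch of the cluster-assignment step in Python with NumPy. This is not the course's lab code; the function name assign_clusters, the array shapes, and the 0-indexed cluster labels are assumptions made for this illustration (the lecture's notation indexes clusters from 1 to K).

```python
import numpy as np

def assign_clusters(X, centroids):
    """Step 1 of k-means: assign each training example to its
    closest cluster centroid.

    X is an (m, n) array of training examples and centroids is a
    (K, n) array of centroid locations. Returns c, an array of m
    labels in 0..K-1.
    """
    m = X.shape[0]
    c = np.zeros(m, dtype=int)
    for i in range(m):
        # Squared L2 distance from example i to every centroid.
        # Minimizing the squared distance picks the same centroid
        # as minimizing the distance itself, as noted above.
        sq_dists = np.sum((centroids - X[i]) ** 2, axis=1)
        c[i] = np.argmin(sq_dists)
    return c
```

The explicit loop over examples mirrors the "for i = 1 through m" in the description; in practice the same computation can also be fully vectorized.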
Concretely, what that means is we'll look at all of the red points, look at their position on the horizontal axis, that is, the value of their first feature x_1, and average that out, and compute the average value on the vertical axis as well. After computing those two averages, you find that the mean is here, which is why mu_1, the location of the red cluster centroid, gets updated as follows. Similarly, we'll look at all of the points that were colored blue, that is, those with c^(i) = 2, compute the average of their feature x_1 on the horizontal axis and the average of their feature x_2 on the vertical axis, and those two averages give you the new location of the blue cluster centroid, which therefore moves over here.

Just to write this out in math: if the first cluster had training examples 1, 5, 6, and 10 assigned to it, just as an example, then you would compute the average as mu_1 = (x^(1) + x^(5) + x^(6) + x^(10)) / 4. Notice that x^(1), x^(5), x^(6), and x^(10) are four training examples, so we divide by 4, and this gives you the new location of mu_1, the new cluster centroid for cluster 1. To be clear, each of these x values is a vector with two numbers in it, or n numbers if you have n features, and so mu_1 will also have two numbers in it, or n numbers if you have n features instead of 2.

Now, there is one corner case of this algorithm: what happens if a cluster has zero training examples assigned to it? In that case, the second step would have mu_k be the average of zero points, and that's not well defined. If that ever happens, the most common thing to do is to just eliminate that cluster, so you end up with K - 1 clusters. Or, if you really, really need K clusters, an alternative is to randomly reinitialize that cluster centroid and hope that it gets assigned at least some points the next time around. But it's actually more common, when running k-means, to just eliminate a cluster if no points are assigned to it.

Even though I've mainly been describing k-means for clusters that are well separated, so clusters that may look like this, where if you ask it to find three clusters it will hopefully find three distinct clusters, it turns out that k-means is also frequently applied to datasets where the clusters are not that well separated. For example, suppose you are a designer and manufacturer of cool t-shirts, and you want to decide how to size your small, medium, and large t-shirts: how small should a small be, how large should a large be, and what should a medium-sized t-shirt really be? One thing you might do is collect data on people likely to buy your t-shirts, based on their heights and weights, and you'd find that height and weight tend to vary continuously on a spectrum, without very clear clusters. Nonetheless, if you were to run k-means with, say, three cluster centroids, you might find that it groups these points into one cluster, these points into a second cluster, and these points into a third cluster. So if you're trying to decide exactly how to size your small, medium, and large t-shirts, you might then choose the dimensions of your small t-shirt to fit the first group of individuals well, the medium t-shirt to fit the second group well, and the large t-shirt to fit the third group well, with the cluster centroids potentially giving you a sense of the most representative height and weight that each of your three t-shirt sizes should fit.
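Here is a matching sketch of the centroid-update step, again in NumPy with the same assumed shapes and 0-indexed labels as above; the empty-cluster handling follows the lecture's recommendation of simply eliminating a cluster that receives no points.

```python
def move_centroids(X, c, K):
    """Step 2 of k-means: move each centroid to the mean of the
    points assigned to it.

    X is an (m, n) array, c holds the cluster label of each example,
    and K is the current number of clusters. A cluster with zero
    assigned points is eliminated, so the result may have fewer than
    K rows. (Randomly reinitializing the empty centroid would be the
    alternative mentioned above.)
    """
    new_centroids = []
    for k in range(K):
        points = X[c == k]  # all examples currently assigned to cluster k
        if len(points) > 0:
            new_centroids.append(points.mean(axis=0))
        # else: drop the empty cluster entirely
    return np.array(new_centroids)
```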
So this is an example of k-means working just fine and giving a useful result, even when the data does not lie in well-separated groups or clusters. So that was the k-means clustering algorithm: randomly initialize the cluster centroids, and then repeatedly assign points to cluster centroids and move the cluster centroids. But what is this algorithm really doing, and do we think it will converge, or might it just keep on running forever and never converge? To gain deeper intuition about the k-means algorithm, and also to see why we might hope it does converge, let's go on to the next video, where you'll see that k-means is actually trying to optimize a specific cost function.
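Putting the two steps together, here is a minimal sketch of the full loop described in this video, reusing the assign_clusters and move_centroids helpers sketched above. Initializing the centroids by picking K training examples at random, and running for a fixed number of iterations rather than testing for convergence, are assumptions made for this sketch; the convergence question is exactly what the next video takes up.

```python
def kmeans(X, K, num_iters=10, seed=0):
    """Run k-means: random initialization, then repeated
    assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize by picking K distinct training examples as centroids
    # (one common way to choose random initial locations).
    centroids = X[rng.choice(X.shape[0], size=K, replace=False)]
    for _ in range(num_iters):
        c = assign_clusters(X, centroids)
        centroids = move_centroids(X, c, len(centroids))
    # Final assignment so the labels match the final centroids.
    c = assign_clusters(X, centroids)
    return centroids, c
```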