After supervised learning, the most widely used form of machine learning is unsupervised learning. Let's take a look at what that means. We've talked about supervised learning, and this video is about unsupervised learning. But don't let the name unsupervised fool you. Unsupervised learning is, I think, just as super as supervised learning. When we were looking at supervised learning in the last video, recall that it looks something like this. In the case of a classification problem, each example was associated with an output label Y, such as benign or malignant, designated by the O's and crosses. In unsupervised learning, we're given data that isn't associated with any output labels Y. Say you're given data on patients and their tumor size and the patient's age, but not whether the tumor was benign or malignant. So the dataset looks like this on the right. We're not asked to diagnose whether the tumor is benign or malignant because we're not given any labels Y in the dataset. Instead, our job is to find some structure or some pattern or just find something interesting in the data. This is unsupervised learning. We call it unsupervised because we're not trying to supervise the algorithm to give some "right answer" for every input. Instead, we ask the algorithm to figure out all by itself what's interesting or what patterns or structures there might be in this data. With this particular dataset, an unsupervised learning algorithm might decide that the data can be assigned to two different groups or two different clusters. And so it might decide that there's one cluster or group over here and there's another cluster or group over here. This is a particular type of unsupervised learning called a clustering algorithm because it places the unlabeled data into different clusters. And this turns out to be used in many applications. For example, clustering is used in Google News. 
What Google News does is every day it goes and looks at hundreds of thousands of news articles on the Internet and groups related stories together. For example, here's a sample from Google News where the headline of the top article is, Giant Panda Gives Birth to Rare Twin Cubs at Japan's Oldest Zoo. This article had actually caught my eye because my daughter loves pandas. And so there are a lot of stuffed panda toys, and a lot of panda videos being watched, in my house. And looking at this, you might notice that below this are other related articles. Maybe from the headlines alone, you can start to guess what clustering might be doing. Notice that the word panda appears here, here, here, here, and here. And notice that the word twin also appears in all five articles. And the word zoo also appears in all of these articles. So the clustering algorithm is looking through all the hundreds of thousands of news articles on the Internet that day, finding the articles that mention similar words, and grouping them into clusters. Now what's cool is that this clustering algorithm figures out on its own which words suggest that certain articles are in the same group. What I mean is there isn't an employee at Google News who's telling the algorithm to find articles that have the words panda, twin, and zoo and put them into the same cluster. The news topics change every day and there are so many news stories, it just isn't feasible to have people doing this every single day for all the topics the news covers. Instead, the algorithm has to figure out on its own, without supervision, what the clusters of news articles are today. So that's why this clustering algorithm is a type of unsupervised learning algorithm. Let's look at a second example of unsupervised learning applied to clustering genetic or DNA data. This image shows a picture of DNA microarray data. These look like tiny grids of a spreadsheet and each tiny column represents the genetic or DNA activity of one person. 
So, for example, this entire column here is from one person's DNA and this other column is of another person. Each row represents a particular gene. So just as an example, perhaps this row here might represent a gene that affects eye color, or this row here is a gene that affects how tall someone is. Researchers have even found a genetic link to whether someone dislikes certain vegetables, such as broccoli or Brussels sprouts or asparagus. So next time someone asks you, why didn't you finish your salad, you can tell them, oh, maybe it's genetic. For DNA microarrays, the idea is to measure how much certain genes are expressed for each individual person. So these colors, red, green, gray, and so on, show the degree to which different individuals do or do not have a specific gene active. And what you can do is then run a clustering algorithm to group individuals into different categories or different types of people. Like maybe these individuals are grouped together, and let's just call this type 1. And these people are grouped into type 2. And these people are grouped as type 3. This is unsupervised learning because we're not telling the algorithm in advance that there is a type 1 person with certain characteristics or a type 2 person with certain characteristics. Instead, what we're saying is, here's a bunch of data. I don't know what the different types of people are, but can you automatically find structure in the data and automatically figure out what are the major types of individuals? Since we're not giving the algorithm the right answer for the examples in advance, this is unsupervised learning. Here's a third example. Many companies have huge databases of customer information. Given this data, can you automatically group your customers into different market segments so that you can more efficiently serve your customers? 
Concretely, the DeepLearning.AI team did some research to better understand the DeepLearning.AI community and why different individuals take these classes, subscribe to The Batch weekly newsletter, or attend our Pie & AI events. Let's visualize the DeepLearning.AI community as this collection of people. Running clustering, that is, market segmentation, found a few distinct groups of individuals. One group's primary motivation is seeking knowledge to grow their skills. Perhaps this is you. And so, that's great. A second group's primary motivation is looking for a way to develop their career. Maybe you want to get a promotion or a new job or make some career progression. If this describes you, that's great too. And yet another group wants to stay updated on how AI impacts their field of work. Perhaps this is you. That's great too. This is a clustering that our team used to try to better serve our community as we're trying to figure out what the major categories of learners are in the DeepLearning.AI community. So if any of these is your top motivation for learning, that's great. And I hope I'll be able to help you on your journey. Or in case this is you and you want something totally different from the other three categories, that's fine too. And I want you to know I love you all the same. So to summarize, a clustering algorithm, which is a type of unsupervised learning algorithm, takes data without labels and tries to automatically group them into clusters. And so maybe the next time you see or think of a panda, maybe you think of clustering as well. And besides clustering, there are other types of unsupervised learning as well. Let's go on to the next video to take a look at some other types of unsupervised learning algorithms.
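
To make the summary a little more concrete, here is a minimal sketch of clustering unlabeled data, loosely following the tumor size and patient age example. The video doesn't name a specific clustering algorithm, so k-means is used here only as one common choice. The patient numbers, the `kmeans` helper, and the way the starting centroids are picked are all made up for illustration. Notice that no labels Y appear anywhere; the algorithm only sees the inputs.

```python
import math

def kmeans(points, k, iterations=20):
    """Minimal k-means sketch: returns (centroids, labels) for a list of tuples."""
    # Deterministic start (an illustrative assumption): take the first point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid alone
        if new_centroids == centroids:
            break  # nothing moved, so the grouping has settled
        centroids = new_centroids
    return centroids, labels

# Hypothetical (age in years, tumor size in cm) pairs. There are no labels y.
patients = [(25, 1.0), (30, 1.5), (28, 1.2), (62, 4.8), (70, 5.5), (66, 5.1)]
centroids, labels = kmeans(patients, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The cluster numbers 0 and 1 carry no meaning such as benign or malignant; they only say which points were grouped together. In this toy data the age values are much larger than the tumor sizes, so age dominates the distance. With real data you would likely rescale the features first.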