Let's look at our second unsupervised learning algorithm. Anomaly detection algorithms look at an unlabeled dataset of normal events and thereby learn to detect, or to raise a red flag for, an unusual or anomalous event. Let's look at an example. Some of my friends were working on using anomaly detection to detect possible problems with aircraft engines that were being manufactured. When a company makes an aircraft engine, you really want that aircraft engine to be reliable and function well, because an aircraft engine failure has very negative consequences. So some of my friends were using anomaly detection to check if an aircraft engine, after it was manufactured, seemed anomalous or if there seemed to be anything wrong with it. Here's the idea. After an aircraft engine rolls off the assembly line, you can compute a number of different features of the aircraft engine. Say feature x1 measures the heat generated by the engine, feature x2 measures the vibration intensity, and so on for additional features as well. But to simplify this slide a bit, I'm going to use just two features, x1 and x2, corresponding to the heat and the vibrations of the engine. Now, it turns out that aircraft engine manufacturers don't make that many bad engines. So the easier type of data to collect would be, if you have manufactured m aircraft engines, the features x1 and x2 describing how these m engines behave. And probably most of them are just fine; they're normal engines rather than ones with a defect or a flaw in them. The anomaly detection problem is this: after the learning algorithm has seen these m examples of how aircraft engines typically behave in terms of how much heat they generate and how much they vibrate, if a brand new aircraft engine were to roll off the assembly line with a new feature vector x_test, we'd like to know, does this engine look similar to ones that have been manufactured before?
So is this probably okay? Or is there something really weird about this engine, which might cause its performance to be suspect, meaning that maybe we should inspect it even more carefully before we let it get shipped out and installed in an airplane, where hopefully nothing will go wrong with it. Here's how an anomaly detection algorithm works. Let me plot the examples x1 through xm over here via these crosses, where each cross, each data point in this plot, corresponds to a specific engine with a specific amount of heat and a specific amount of vibration. If a new aircraft engine x_test rolls off the assembly line, and if you were to plot its values of x1 and x2, and if it were here, you'd say, okay, that looks probably okay. It looks very similar to other aircraft engines. Maybe I don't need to worry about this one. But if this new aircraft engine has a heat and vibration signature that is, say, all the way down here, then this data point down here looks very different from the ones we saw up on top, and so we would probably say, boy, this looks like an anomaly. This doesn't look like the examples I've seen before. We'd better inspect this more carefully before we let this engine get installed on an airplane. How can you have an algorithm address this problem? The most common way to carry out anomaly detection is through a technique called density estimation. What that means is, when you're given your training set of these m examples, the first thing you do is build a model for the probability of x. In other words, the learning algorithm will try to figure out what values of the features x1 and x2 have high probability, and what values are less likely, or have a lower chance or lower probability, of being seen in the dataset. In the example that we have here, I think it is quite likely to see examples in that little ellipse in the middle, so that region in the middle would have high probability.
Maybe things in this next ellipse have a little bit lower probability. Things in this outer ellipse or oval have even lower probability, and things outside have lower probability still. The details of how you decide, from the training set, which regions have higher versus lower probability are something we'll see in the next few videos. Having learned a model for p of x, when you are given the new test example x_test, what you will do is compute the probability of x_test. More precisely, if it is less than some small number that I'm going to call epsilon (this is the Greek letter epsilon, which you should think of as a small number), that means p of x_test is very small. In other words, the specific value of x that you saw for this engine was very unlikely relative to the other engines you have seen. So if p of x_test is less than this small threshold epsilon, we would raise a flag to say that this could be an anomaly. For example, if x_test were all the way down here, the probability of an example landing all the way out here is actually quite low. Hopefully, p of x_test for this value of x_test would be less than epsilon, and so we would flag this as an anomaly. Whereas, in contrast, if p of x_test is not less than epsilon, that is, if p of x_test is greater than or equal to epsilon, then we will say that it looks okay; this doesn't look like an anomaly. That corresponds to an example in here, say, where our model p of x says that examples near the middle have quite high probability. There's a very high chance that a new airplane engine will have features close to these inner ellipses, so p of x_test would be large for those examples, and we'd say it's okay and not an anomaly. Anomaly detection is used today in many applications.
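The procedure just described can be sketched in code. This is a minimal illustration, not the full algorithm from the upcoming videos: it models each feature with an independent Gaussian (fit by the feature's mean and variance), takes p of x as the product of the per-feature densities, and compares p of x_test to a hand-picked epsilon. The numbers (heat around 300, vibration around 5, epsilon = 1e-5) are made up for illustration.

```python
import numpy as np

def fit_gaussian(X_train):
    # Estimate mean and variance of each feature from the normal examples.
    mu = X_train.mean(axis=0)
    var = X_train.var(axis=0)
    return mu, var

def p(x, mu, var):
    # p(x) as a product of independent per-feature Gaussian densities.
    coef = 1.0 / np.sqrt(2 * np.pi * var)
    exponent = -((x - mu) ** 2) / (2 * var)
    return np.prod(coef * np.exp(exponent))

# Simulated "normal" engines: heat near 300, vibration near 5
# (all values are invented for this sketch).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[300.0, 5.0], scale=[10.0, 0.5], size=(500, 2))

mu, var = fit_gaussian(X_train)
epsilon = 1e-5  # threshold; how to choose it is covered in later videos

x_test_ok = np.array([302.0, 5.1])   # similar to engines seen before
x_test_bad = np.array([250.0, 9.0])  # far from anything in the training set

print(p(x_test_ok, mu, var) >= epsilon)  # looks okay, not flagged
print(p(x_test_bad, mu, var) < epsilon)  # flagged as a possible anomaly
```

Note that only normal examples are needed to fit the model; the algorithm never sees a labeled "anomalous" engine during training.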
It is frequently used in fraud detection, where, for example, if you are running a website with many different features, you might compute x_i to be the features of user i's activities. Examples of features might include how often this user logs in, how many web pages they visit, how many transactions they make, how many posts on the discussion forum they make, or what their typing speed is, that is, how many characters per second they seem able to type. With data like this, you can then model p of x from data to capture the typical behavior of a given user. In a common fraud detection workflow, you wouldn't automatically turn off an account just because it seemed anomalous; instead, you might ask the security team to take a closer look, or put in some additional security checks, such as asking the user to verify their identity with a cell phone number, or asking them to pass a CAPTCHA to prove they're human, and so on. But algorithms like this are routinely used today to try to find unusual or slightly suspicious activity, so that those accounts can be screened more carefully to make sure there isn't something fraudulent going on. This type of fraud detection is used both to find fake accounts and, quite frequently, to identify financial fraud, such as a very unusual pattern of purchases that may be well worth a security team taking a more careful look at. Anomaly detection is also frequently used in manufacturing. You saw an example on the previous slide with aircraft engine manufacturing.
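The fraud-screening workflow above might look roughly like this sketch. Everything here is invented for illustration: the feature names, the activity logs, and the toy density model (hand-picked per-feature Gaussians standing in for a p of x learned from real user data).

```python
import math

def user_features(log):
    # A few hypothetical per-user activity features.
    return [
        log["logins_per_day"],
        log["transactions_per_day"],
        log["chars_per_second"],
    ]

# Stand-in for a learned model: (mean, variance) of each feature
# among typical users. A real system would estimate these from data.
TYPICAL = [(2.0, 1.0), (1.0, 1.0), (5.0, 4.0)]

def p_of_x(x):
    p = 1.0
    for xi, (mu, var) in zip(x, TYPICAL):
        p *= math.exp(-((xi - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return p

epsilon = 1e-6
activity_logs = {
    "alice":   {"logins_per_day": 2,  "transactions_per_day": 1,  "chars_per_second": 6},
    "mallory": {"logins_per_day": 40, "transactions_per_day": 25, "chars_per_second": 90},
}

# Accounts with unlikely behavior get queued for human review
# (identity check, CAPTCHA), not automatically disabled.
flagged = [user for user, log in activity_logs.items()
           if p_of_x(user_features(log)) < epsilon]
print(flagged)  # only the account whose behavior is far from typical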
But many manufacturers, in many factories across multiple continents, routinely use anomaly detection to check whatever they just manufactured, anything from an airplane engine, to a printed circuit board, to a smartphone, to a motor, to see if they've just made a unit that somehow behaves strangely, because that may indicate something is wrong with the airplane engine, or printed circuit board, or what have you, prompting a more careful look before shipping that object to the customer. Anomaly detection is also used to monitor computers in clusters and data centers, where x_i could be the features of a certain machine i, such as its memory usage, the number of disk accesses per second, or CPU load. Features can also be ratios, such as the ratio of CPU load to network traffic. Then if a specific computer ever behaves very differently from the other computers, it might be worth taking a look at it to see if something is wrong, such as a hard disk failure, a network card failure, or even whether it has been hacked into. Anomaly detection is one of those algorithms that is very widely used, even though you don't seem to hear people talk about it that much. I remember the first time I worked on a commercial application of anomaly detection: I was helping a telco company put in place anomaly detection to see when any one of their cell towers was behaving in an unusual way, because that probably meant there was something wrong with the cell tower, and they wanted to get a technician to take a look. So hopefully that helped more people get good cell phone coverage. I've also used anomaly detection to find fraudulent financial transactions, and these days I often use it to help manufacturing companies find anomalous parts that they may have manufactured and should inspect more carefully.
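Building the feature vector for one machine in a cluster might look like the sketch below. The metric names are hypothetical; the point it illustrates is from the lecture, that features can be raw measurements or ratios between them.

```python
def machine_features(m):
    # Raw measurements plus one ratio feature, as described above.
    return [
        m["memory_used_mb"],
        m["disk_accesses_per_sec"],
        m["cpu_load"],
        m["cpu_load"] / max(m["network_mb_per_sec"], 1e-9),  # ratio feature
    ]

metrics = {
    "memory_used_mb": 4096,
    "disk_accesses_per_sec": 120,
    "cpu_load": 0.95,
    "network_mb_per_sec": 0.01,
}
x = machine_features(metrics)
print(x)
```

A machine with high CPU load but almost no network traffic produces a very large ratio feature, so a learned p of x could assign it low probability and flag it, for instance, a stuck job spinning, or a machine that has been compromised.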
So it is a very useful tool to have in your tool chest, and in the next few videos we'll talk about how you can build these algorithms and get them to work for yourself. In order to get anomaly detection algorithms to work, we'll need to use a Gaussian distribution to model the data, p of x. So let's go on to the next video to talk about Gaussian distributions.