So it turns out that expected value, although it tells us a lot about a distribution, doesn't tell us the whole story. For example, two distributions may have the same expected value, but one of them can be very narrow and the other one can be very wide. This is captured by something called the variance. Let me show you what the variance is. We're going to use an example of a game. Let's look at two games. In the first one, you flip a coin. If it's heads, you win a dollar. If it's tails, you have to pay one dollar. What is the fair price to pay to play this game? The correct answer here is zero. This is the expected value of this game, and if you played it over and over again, this is what you would expect to win on average. Now look at the second game. If the coin lands heads, you win a hundred dollars, and if it's tails, you lose a hundred dollars. What is the fair price to pay to play this game? Again, if you said zero, you're correct. Both games have an expected value of zero. If you play either game over and over again, on average, you would expect to break even. While these games have the same expected value, there's clearly a big difference between them. The second one has a much larger spread of possible outcomes, where you're losing or winning a hundred dollars at a time. If you want a way to quantify this difference in the spread of possible outcomes, the expected value clearly isn't enough, so you'll need a new measure, and that measure is called variance. To work up towards the formal definition of variance, I'll first plot both games. The first game looks like this. You can win or lose one dollar with a probability of one half each. If you treat this game as a random variable, X1, then it's pretty clear that the expected value of this game will be right at zero. Now here's the second game, which I'll call X2. You can win or lose a hundred dollars this time, also with a probability of one half each.
Just like X1, the expected value here is zero. Now you can look at these two games together. Obviously the scaling here is a little off, but look at the high-level differences. They have the same expected value, or balance at the exact same point. The difference is that the spread of the possible outcomes in the first game is relatively small, and the spread of the outcomes in the second game is relatively large, so let's explore some ways to quantify this difference in spread. One way to think of the idea of spread is how far away the points are from the expected outcome. If the spread is small, you would expect most points to be close to the expected value. With a bigger spread, you would expect most of the points to be farther away. Let's look at a few different expressions that try to capture this idea. The first one is deviation, and it's simply the difference between each point and the expected value. If you find the deviations for the points in the first game, you would calculate 1 minus the expected value of zero, giving a deviation of 1, and minus 1 minus 0, which gives a deviation of minus 1. Now if you want to summarize the typical deviation in this data set, a natural thing to do would be to find the average of the deviations, and in this instance that is minus 1 plus 1 divided by 2, which is 0 divided by 2, which is just 0. That's weird. Does that mean there's no variation at all in this data? It turns out that the average deviation is actually always 0, and perhaps you can see why. The expected value will always sit at the point where the positive and negative deviations cancel each other out. For example, in the second game you would have deviations of 100 and minus 100, but you can probably already see that this will also give an average deviation of 0. So average deviation isn't a good approach, but perhaps it points us in the right direction.
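That cancellation can be checked directly. Here is a minimal Python sketch, not part of the lecture itself, where the helper name `average_deviation` is just for illustration. It computes the probability-weighted average deviation for both coin-flip games:

```python
# Sketch: the average deviation from the mean is always zero,
# illustrated with the two coin-flip games from the lecture.
def average_deviation(outcomes, probs):
    # Probability-weighted mean (the expected value).
    mean = sum(p * x for x, p in zip(outcomes, probs))
    # Probability-weighted average of the deviations from that mean.
    return sum(p * (x - mean) for x, p in zip(outcomes, probs))

game1 = ([-1, 1], [0.5, 0.5])      # win or lose $1
game2 = ([-100, 100], [0.5, 0.5])  # win or lose $100

print(average_deviation(*game1))  # 0.0
print(average_deviation(*game2))  # 0.0
```

The positive and negative deviations cancel exactly, no matter how wide the spread, which is why a different measure is needed.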
The issue was that some deviations are positive and some are negative, so a very intuitive approach would be to consider this expression, the absolute deviation. Now all the deviations become positive and it might actually make sense to take an average. Is that a good measure of spread? Without getting too much into the mathematics of why, I'll just say this approach isn't used that often. Even if it's very intuitive, the absolute value function introduces some messy mathematical properties, and for that reason the approach I'm about to show you is the preferred method of measuring spread. Perhaps you've seen this before, but the most common approach is to actually square the deviations. In other words, all of these deviations, minus 1, 1, minus 100, and positive 100, are squared. All of these values will now be positive, which is what the absolute deviation attempted to do, but without introducing some of the mathematical issues of the absolute value function. Finally, you can find the expected square deviation, or mean square deviation, using this expression. This will tell you the average size of these square deviations, and this is called the variance, which looks like this. Notice that this is just the average of those square deviations. Simplifying these expressions gives the actual values of variance for these two games. The first game has a variance of 1, and that makes sense since both negative 1 and positive 1 squared are equal to 1. The average square deviation is 1, and using the same approach you can calculate that the second game has a variance of 10,000. This may seem like a significant increase in variance, but remember, you're no longer measuring the average deviation, but rather the average square deviation. And as you'll learn in the following videos, this measure of the spread of a data set has many useful properties. One more time, here is the equation of variance. 
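The variance calculation just described, a probability-weighted average of the squared deviations, can be sketched in a few lines of Python (the function name `variance` is my own, not from the lecture):

```python
# Sketch: variance as the average squared deviation from the mean.
def variance(outcomes, probs):
    mean = sum(p * x for x, p in zip(outcomes, probs))
    # Square each deviation, then take the probability-weighted average.
    return sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))

print(variance([-1, 1], [0.5, 0.5]))      # 1.0
print(variance([-100, 100], [0.5, 0.5]))  # 10000.0
```

This reproduces the values from the lecture: a variance of 1 for the one-dollar game and 10,000 for the hundred-dollar game.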
If you're still getting used to the expectation notation, remember it essentially always means that you're finding an average value. So this expression contains four steps. First, find the mean of X. Second, find the deviation from that mean for every value of X. Third, square those deviations. Fourth, average those squared deviations. And once again, that just makes variance the average squared deviation. Notice, just like expected value, that this is a weighted average. In these examples, all the outcomes had equal probability, but if they didn't, you would use the probability of each outcome as weights. It is symbolically written like this, Var of X. In the last example, you saw two games that had the same mean, but different spread, or variance. Now compare these two games. Game one is similar to the one before. You flip a coin, and if it lands heads, you win two dollars, and if it lands tails, you lose two dollars. The second game is slightly different. If the coin lands heads, then you win three dollars, but if it lands tails, then you lose one dollar. Which of these games has a greater variance? To answer this question, you can think of the spread. As it turns out, they both have the same variance because the spread of their outcomes is the same. Let's see how this works. First, let's look at the expected value for each game, or the amount of money you expect to get after many plays. For the first game, E of X1 is one half times minus two plus one half times two, which is zero. However, as you can see, the second game is not centered, so in this case, E of X2 is one half times minus one plus one half times three, which is one. Now let's find the variance of each game. In each case, I'll just find the average square deviation between each outcome and the expected outcome of that game. For game one, you have one half times minus two minus zero squared plus one half times two minus zero squared, which gives a variance of four.
For the second game, you have one half times minus one minus one squared plus one half times three minus one squared, which is also four. So this means that these two games have different expected outcomes, but the same variance in those outcomes. The formula Var of X equals the expected value of X minus E of X, all squared, is a great definition, and it's quite intuitive. However, in many cases, it's easier to do the math using this alternative formulation, E of X squared minus the square of E of X. Let's see that both expressions are exactly the same thing. It might look a bit scary, but just remember that at the end of the day, it's simply some algebraic manipulation of the variance formula. Let's start with the definition of variance and expand the square terms inside the square brackets. You get the expectation of X squared minus two E of X times X plus E of X squared. Expectation is a linear operation, meaning E of X plus Y equals E of X plus E of Y. So next, I'll write the single expectation as a sum of three different expectations, E of X squared minus E of two E of X times X plus E of E of X squared. Here's the next step. Getting to this point depends on a few additional facts about the expectation, which I'll explain one piece at a time. First, E of a constant times X is just the constant times E of X. This is actually just an extension of the fact that the expectation is linear. What this means is that you can pull the constant outside of the expression. In this case, the 2 was pulled outside of the expectation. Next, E of X is a constant. As you already know, E of X is the expectation or mean of the random variable X. It is just a number. That means that this E of X could also be pulled out of the expectation. Finally, E of a constant is just that constant. In this case, it allowed the outer E on this third term to be removed. Remember, E of X is just a constant, so this is just the expectation of some constant squared, or yet another constant.
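Both formulations, the average squared deviation and E of X squared minus the square of E of X, can be checked against each other numerically. Here is a short Python sketch (function names are illustrative, not from the lecture), using the second game above, where heads wins three dollars and tails loses one:

```python
# Sketch: two equivalent ways to compute variance.
def variance_def(outcomes, probs):
    # Definition: average squared deviation from the mean.
    mean = sum(p * x for x, p in zip(outcomes, probs))
    return sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))

def variance_alt(outcomes, probs):
    # Alternative: E[X^2] minus the square of E[X].
    e_x = sum(p * x for x, p in zip(outcomes, probs))
    e_x2 = sum(p * x * x for x, p in zip(outcomes, probs))
    return e_x2 - e_x ** 2

# Game two: heads wins $3, tails loses $1, each with probability 1/2.
outcomes, probs = [-1, 3], [0.5, 0.5]
print(variance_def(outcomes, probs))  # 4.0
print(variance_alt(outcomes, probs))  # 4.0
```

Here E of X is 1 and E of X squared is one half times 1 plus one half times 9, which is 5, so the alternative formula gives 5 minus 1 squared, or 4, matching the definition.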
From here, you can simplify. First, by rewriting the second term, 2 E of X times E of X, as simply 2 E of X squared, and finally combining the last two terms to get minus E of X squared, and there you go. That's the identity you were looking for. Sometimes you'll find it very useful to know this other means of calculating variance. Let's explore an important property of variance by returning to the example of a dice game. You roll a fair die, so all six sides have a probability of one sixth of coming up. Then you win double whatever you rolled. However, the game costs $5 to play, so here is the net amount you take home for each possible outcome of the game. You can win as much as 7 if you roll a 6, and lose as much as 3 if you roll a 1. What is the variance of this game? Let's imagine that your dice roll is a random variable X. You can think of X as a randomly generated number between 1 and 6, with an equal one sixth probability of each outcome. Your net winnings is a new random variable Y. Whatever X outputs, in order to obtain Y, you multiply it by 2 and subtract 5. The key relationship I'm going to introduce here is this identity. The variance of AX plus B is equal to A squared times the variance of X. In this case, that would indicate that the variance of your new random variable Y is 4 times the variance of your original random variable X. Now let's think through why this makes sense based on what you've learned. First, I'll draw the expected value of X, which is 3.5, and the expected value of Y, which is 2. Now, recall that variance is the average square deviation from the mean, so let's look at what happened to the deviations. For example, this deviation in X started at 0.5, but that same outcome in Y now has a deviation of 1. Meanwhile, the largest deviation in X is 2.5, and in Y that deviation grows to be 5.
In other words, all the deviations double. Since variance is the average square deviation, it makes sense that if all your deviations are twice as big, the variance is now four times as big. In general, if you multiply a random variable by A, the variance increases by a factor of A squared. Notice that the minus 5 did not have any impact on the variance. Intuitively, adding a number to a random variable just changes the point the new distribution is centered around, but it doesn't impact the spread. Multiplying X by a value, however, will impact the spread of your data. This is a useful relationship to be aware of, and hopefully this explanation of why it holds reinforces what variance is measuring.
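The dice-game numbers above can be verified with a short Python sketch (names are illustrative; the standard-library `fractions` module keeps the arithmetic exact rather than floating-point):

```python
from fractions import Fraction

# Sketch: checking Var(aX + b) = a^2 * Var(X) for the dice game.
def variance(outcomes, probs):
    mean = sum(p * x for x, p in zip(outcomes, probs))
    return sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))

rolls = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

var_x = variance(rolls, probs)

# Net winnings: double the roll, minus the $5 cost, i.e. Y = 2X - 5.
net = [2 * x - 5 for x in rolls]
var_y = variance(net, probs)

print(var_x)               # 35/12
print(var_y)               # 35/3
print(var_y == 4 * var_x)  # True
```

So the variance of the die roll is 35/12, and the variance of the net winnings is exactly 4 times that, just as the identity predicts; the minus 5 plays no role.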