Consider the following scenario. Players X and Y play three games, and in each game a player either wins or loses a dollar. In game one, there are two possible outcomes: both players win one dollar each, or both players lose one dollar each. In the plot, the horizontal coordinate is how player X does and the vertical coordinate is how player Y does, and each outcome happens with probability one half.
In game two, two things can happen. Either X wins a dollar and Y loses a dollar, or X loses a dollar and Y wins a dollar, again each with probability one half. Recall that the X coordinate tells you what the first player wins or loses, and the Y coordinate what the second one wins or loses.
In game three, more can happen. Both players can win one dollar, both players can lose one dollar, X can win while Y loses, or Y can win while X loses. So there are four possibilities, A, B, C and D, all with probability one quarter.
So here is a question for you. How similar are these three games for player X and for player Y? Well, let's examine them, and first we're going to look at one player at a time. So X is how much money in dollars player X wins, and Y is how much money in dollars player Y wins. Let's first look at how much X makes in the first game. As you may imagine, that's going to be the expected value, and we're only looking at the horizontal coordinate because we're looking at player X.
Player X wins a dollar with probability one half or loses a dollar with probability one half, and that averages to zero. So the expected value for the first player is zero. The same goes for the second player, who wins a dollar or loses a dollar with probability one half each. The same thing happens in the second game: the first player can win a dollar or lose a dollar, so the expected value is zero, and the second player can win a dollar or lose a dollar, so again it's zero. So this game has the same expected value for the first and the second player.
Now the third game. Four things can happen, each with probability one quarter: in two of them you win a dollar, and in the other two you lose a dollar. That again averages to zero. The same is true for the second player, who wins a dollar in two outcomes and loses a dollar in two outcomes, all with the same probabilities. So these games are basically the same if you only think of one player at a time in terms of expected value: if a player plays many, many times, on average they end up winning zero. So the expected values don't really tell these games apart.
Now what happens with the variance? Could it be that the variance can help us? Well, let's calculate the variance for each of the games. For the first game, recall that the expected value is zero, so all we care about is the expected value of X squared, and that's going to be one. I'll spare you the calculations, but it's actually going to be one for every player in every game. So these games look the same in terms of expected value, in terms of the variance of X, and in terms of the variance of Y. But obviously, they are different games. So where does the difference lie? The difference lies in the fact that you have to look at both players at the same time. Otherwise, the game is the same for each player: each player just wins a dollar or loses a dollar.
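The per-player numbers above can be checked with a short sketch. This is only an illustration, not code from the course: the `GAMES` table and the helper names `mean`, `variance` and `marginal_stats` are made up here, and each game is assumed to be a list of (x, y, probability) outcomes as described so far.

```python
# Each game is a list of (x, y, probability) outcomes, as described in the lecture.
# The names below are hypothetical helpers for this illustration.
GAMES = {
    "game 1": [(1, 1, 0.5), (-1, -1, 0.5)],
    "game 2": [(1, -1, 0.5), (-1, 1, 0.5)],
    "game 3": [(1, 1, 0.25), (-1, -1, 0.25), (1, -1, 0.25), (-1, 1, 0.25)],
}


def mean(values_and_probs):
    """Expected value: probability-weighted sum of the values."""
    return sum(p * v for v, p in values_and_probs)


def variance(values_and_probs):
    """Variance: probability-weighted average of squared deviations from the mean."""
    mu = mean(values_and_probs)
    return sum(p * (v - mu) ** 2 for v, p in values_and_probs)


def marginal_stats(outcomes):
    """Look at one player at a time: keep only that player's coordinate."""
    xs = [(x, p) for x, _, p in outcomes]
    ys = [(y, p) for _, y, p in outcomes]
    return {
        "E[X]": mean(xs),
        "Var(X)": variance(xs),
        "E[Y]": mean(ys),
        "Var(Y)": variance(ys),
    }


for name, outcomes in GAMES.items():
    print(name, marginal_stats(outcomes))
```

Every game gives an expected value of zero and a variance of one for both players, so these one-player summaries cannot tell the games apart.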
So we need to look at the covariance; the covariance is going to tell these three games apart. Recall the formula for the covariance of a data set: center x and y by subtracting their means, multiply the centered values pair by pair, and average the products. We're going to calculate it for the three games.
So let's do it for game one. Here are x and y. Centering x and y changes nothing, because the mean of x and the mean of y are both zero. When we multiply them, we get a one and a one, because either both players win a dollar or both lose a dollar, so the products are both one. When we add these, we get two, and the average is one because we divide by two. So the covariance of the first game is one. The fact that the covariance is positive reflects the pattern in the plot: the more player one wins, the more player two wins. Either they are both happy, or they are both sad.
Now let's look at game two, which is the complete opposite. When one player is happy, the other one is sad, and vice versa. Again, the means are both zero. In our table, either the first player wins one and the second loses one, or vice versa. The product is always minus one, because one times minus one is minus one and minus one times one is minus one. So when we add them, we get minus two, and when we take the average, we get minus one. So this game has a covariance of minus one, which is reflected in the fact that the points lie along the other diagonal. That means that either one is happy and the other one is sad, or one is sad and the other one is happy.
And finally, game three. In game three, four things can happen: they both win, they both lose, or one wins and the other loses, or vice versa. Again, we saw that the means are zero. So we make the same table, with the possible values of x in one column and the possible values of y in the other.
Centering makes no difference because the means are zero. When we multiply, we get two ones and two minus ones, one for each of the four possible scenarios. Their sum is zero, and when we divide by four to get the average, we get zero again. So this covariance is zero. That reflects the fact that there's not really a pattern among these points: all four possibilities occur. If we know that one player is happy, we can't infer whether the other one is happy or sad, and vice versa, because in this game the two results are independent of each other.
So in other words, we have these three games. In game one, both players win or both lose at the same time. Game two is more of a zero-sum game: one wins and the other one loses. And in game three, anything can happen. In the first game we had a covariance of one, in the second a covariance of minus one, and in the third a covariance of zero.
Now let's introduce one more game, game four. Game four has three outcomes: both players win one dollar each, both players lose one dollar each, or neither player wins or loses anything. But the probabilities are unequal. The probability that both players win a dollar is one half, the probability that both players lose a dollar is one third, and the probability that nothing happens is one sixth. These are illustrated in the plot.
So now, if we look at only one player, player X, what is the expected value of their game? Well, it's the weighted average of the values, one half times one plus one third times minus one plus one sixth times zero, which is one sixth. The same holds for the other player. So each player wins, on average, one sixth of a dollar every time they play this game. Now let's look at the variance.
The variance is the probability-weighted average of the squared deviations from the mean: a half times (one minus a sixth) squared, plus a sixth times (zero minus a sixth) squared, plus a third times (minus one minus a sixth) squared. We subtract mu, then square, then multiply by the probabilities, and we get about 0.806. So the variance of X is 0.806, and as you can imagine, the variance of Y is the same.
And what's the covariance? Well, let's calculate it. We used to say that the covariance was the average of the product of the centered coordinates, but that was when we had equal probabilities. Now we don't have equal probabilities; we have a half, a sixth, and a third. So what we have to do is weight by the probabilities. In general, you multiply each probability by the product of the centered x coordinate and the centered y coordinate, and add these up. The covariance of X and Y can also be expressed as E[XY] minus E[X] E[Y], which is very similar to the variance formula, except that you have two different variables. If you take the covariance of X with X, you get E[X squared] minus E[X] all squared, and that's the same as the variance.
So to calculate this, we take the probabilities times the x coordinate times the y coordinate after centering them, and recall that both means are one sixth. The first term of the calculation is for the first point, and so on, and the covariance of X and Y comes out to 0.806. It matches the variance because in this game X and Y take the same value in every outcome. As you can see, it's a positive covariance because the players win and lose together. The points lie along a diagonal, which shows that the result of player X can help us infer the result of player Y.
Now let's go back to the example of the phone calls with waiting time and customer rating. Recall that we had two marginal distributions, for X and for Y, and now we want to find the covariance. So I want you to try to guess: do you think this covariance is going to be positive or negative?
My guess is that it's going to be negative because, as you can see, we have sort of a diagonal that goes down and to the right. That suggests a negative covariance: you can imagine that the longer you wait, the lower the rating you give, and the less you wait, the higher the rating. So these are inversely related, and therefore the covariance is probably going to be negative. But let's actually calculate it. We find the expectation of XY to be 18.014, subtract the product of the expectations of X and Y, and get negative 7.878. So indeed it is negative. Let me repeat this calculation for clarity. Here is the formula for covariance, and here are the two expectations of X and Y and the expectation of XY that we calculated. The covariance is simply E[XY] minus E[X] E[Y], which is minus 7.878.
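The covariances quoted for the four games can be reproduced with a small sketch that applies both forms of the formula, the probability-weighted average of centered products and E[XY] minus E[X] E[Y]. This is an illustration under stated assumptions rather than course code: the `GAMES` table and the function names `expectation`, `covariance_centered` and `covariance_shortcut` are made up here, and exact fractions are used so the two forms can be compared without rounding.

```python
from fractions import Fraction as F

# Each game is a list of (x, y, probability) outcomes, as described in the lecture.
# The names below are hypothetical helpers for this illustration.
GAMES = {
    "game 1": [(1, 1, F(1, 2)), (-1, -1, F(1, 2))],
    "game 2": [(1, -1, F(1, 2)), (-1, 1, F(1, 2))],
    "game 3": [(1, 1, F(1, 4)), (-1, -1, F(1, 4)), (1, -1, F(1, 4)), (-1, 1, F(1, 4))],
    "game 4": [(1, 1, F(1, 2)), (-1, -1, F(1, 3)), (0, 0, F(1, 6))],
}


def expectation(outcomes, f):
    """Probability-weighted sum of f(x, y) over all outcomes."""
    return sum(p * f(x, y) for x, y, p in outcomes)


def covariance_centered(outcomes):
    """Cov(X, Y) as the weighted average of centered products."""
    mu_x = expectation(outcomes, lambda x, y: x)
    mu_y = expectation(outcomes, lambda x, y: y)
    return expectation(outcomes, lambda x, y: (x - mu_x) * (y - mu_y))


def covariance_shortcut(outcomes):
    """Cov(X, Y) as E[XY] - E[X] * E[Y]."""
    e_xy = expectation(outcomes, lambda x, y: x * y)
    e_x = expectation(outcomes, lambda x, y: x)
    e_y = expectation(outcomes, lambda x, y: y)
    return e_xy - e_x * e_y


for name, outcomes in GAMES.items():
    cov = covariance_centered(outcomes)
    # Both forms of the formula should agree exactly.
    assert cov == covariance_shortcut(outcomes)
    print(f"{name}: Cov(X, Y) = {cov} (about {float(cov):.3f})")
```

For game four both forms give 29/36, which rounds to the 0.806 quoted above. The same shortcut applied to the phone-call numbers also tells us the product of the two expectations: since 18.014 minus E[X] E[Y] equals minus 7.878, E[X] E[Y] must be 18.014 plus 7.878, which is 25.892.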

Mathematics for Machine Learning and Data Science (DeepLearning.AI)
