Let's go back to the example of two variables, age and height. Each distribution has its own expected value and its own variance, but there's something here that we haven't captured, which is the relation between the two. As you can imagine, age and height are somewhat related, because the older a child is, the taller they tend to be. How do we capture that? We capture it with a very important concept called covariance. There's also another concept called correlation. This is what you're going to learn in this video.

Sometimes we want to know how two random variables are related to each other. Having a good understanding of these relations helps us build accurate models and make better decisions. So consider the discrete random variable X, which is the age of a child, and let's consider three other discrete random variables. Y1 is the height of the child in inches, Y2 is the grade on some particular test, and Y3 is the number of naps per day. We're given some data as follows. The question is how X relates to each of the three Y variables, and how you compare these relations.

To help us better visualize what's going on in each of these data sets, let's actually generate scatter plots for each, where the horizontal axis is X and the vertical axis is Y1, Y2, or Y3. So the first plot is like this, the second one is like this, and the third one is like this. And I'm sure you can start seeing some kind of pattern. How do you think these relate?

Well, first let's look at some metrics. For the age versus height one, we can look at both of the means: the mean of the ages, which is 10.5, and the mean of the heights, which is 60. That gives this point in the middle, (10.5, 60), which is the point where you would balance these points. For age versus grades, it's this point in the middle, where the mean of X is 10.5 again and the mean of Y is 5. And for this one over here, age versus naps per day, it's 10.5 for the age and 3.7 naps per day on average.
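As a rough illustration of these per-variable summaries, here is a minimal Python sketch using the standard `statistics` module. The data table isn't reproduced in this transcript, so the ages below are an assumption: ten children aged 6 through 15, chosen only because they reproduce the quoted mean of 10.5. Under that assumption, the sample variance (dividing by n - 1) comes out to about 9.17, the value quoted for the ages in the video.

```python
import statistics

# ASSUMPTION: the real table is not in the transcript. Ten children aged
# 6..15 is a guess that happens to match the quoted mean of 10.5.
ages = list(range(6, 16))

mean_age = statistics.mean(ages)              # balance point on the x-axis
sample_var = statistics.variance(ages)        # divides by n - 1
population_var = statistics.pvariance(ages)   # divides by n

print(mean_age)               # 10.5
print(round(sample_var, 2))   # 9.17
print(population_var)         # 8.25
```

The two variance functions differ only in the divisor, which is worth keeping in mind when comparing hand calculations against library output.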
Now we can also look at the variances. Let's look at the X variance, which means we forget about the Y coordinate and calculate the three variances. They are all 9.17, because it's the same data set for ages. And now let's look at the Y coordinates only and calculate the three variances, which are 39.56, 9.78, and 7.57. So we have some information about each of the means and each of the variances.

However, I'm sure you see something else going on. The age versus height one is almost a diagonal that goes up to the right. The age versus naps one is the opposite, it goes down to the right, and the age versus grades one is kind of all over the place. That can be captured in something called the covariance. The covariance of the first plot is bigger than 0, the covariance of the second plot is almost 0, and the covariance of the third plot is less than 0.

And the covariance talks about the relation between the two variables. As you can imagine, the older the child is, the taller they are, and that explains the first plot. Age and grades seem to not be very related. You can be old or young and have high or low grades. And age and naps per day are the opposite. The older you are as a kid, the fewer naps you're going to have per day. So that's what covariance really summarizes.

Now, how do we calculate covariance? Well, the first step is, as usual, to center the plots. So let's subtract the mean of x from the x-coordinates and the mean of y from the y-coordinates, in order to move this center point to the origin. And then let's also divide by the standard deviation of x and of y, in order to have these really nice plots where the x variance and the y variance are both 1. Now that we have that, let's try to cook up a formula that captures this trend over here, and this trend over here, and the lack of trend in the middle. So what would this formula be? Well, let's look at the plot on the left.
It seems that when you move to the right, the points move up. On the one in the middle, there seems to be no rule, and on the one on the right, if you move to the right, the points move down. So let's try to capture this with some numbers that could be positive or negative.

Let's look at age versus height. When you move to the right by x, you move up by y, and when you move to the left by x, you move down by a quantity y. So the two quantities always have the same sign: whatever the sign of the move in x, positive or negative, the points move in y with that same sign. Let's look at the coordinates of the points on the left. Some of them have a positive x-coordinate and also a positive y-coordinate, and some of them have a negative x-coordinate and also a negative y-coordinate. In other words, the x- and y-coordinates tend to have the same sign. Maybe not for every point, but for the majority.

The opposite happens on the right graph. When you have a negative x-coordinate, you have a positive y-coordinate, and when you have a positive x-coordinate, you have a negative y-coordinate. So they tend to have different signs. And for age versus grades, the one in the middle, nothing seems to happen. Sometimes the x-coordinate is positive and the y one is positive, but sometimes the x one is negative and the y one is positive. It seems like a land with no rules, so anything can happen here.

So now let's look at the product of the coordinates. On the left, the coordinates normally have the same sign, so the product of coordinates is normally a positive number. On the one on the right, the coordinates normally have different signs, so their product is a negative number, because positive times negative or negative times positive is negative.
And for the one in the middle, well, we can have both positive and negative products, because anything can happen.

Now, what if I were to add all these products of coordinates, one for each point? Well, on the left I'm going to get something positive, on the right I'm going to get something negative, and in the middle I may get something close to zero. Maybe positive or negative, but probably close to zero, because I'm adding things that cancel each other out. And this sum is, essentially, the covariance.

So we get to the important formula here, which is the covariance. It's going to tell us whether one variable tends to grow as the other grows, or tends to decrease, or doesn't change much with the other variable. But it's not exactly the sum of the products x times y, it's almost that. It's actually this number: first you center the data, and then you take the average of all these products, that is, the average of (x minus the mean of x) times (y minus the mean of y) over all the data points. And covariance is going to be the one to tell us if the second variable grows with the first, such as age and height, if the first variable does nothing to the other variable, which is age and grades, or if the second variable decreases as the first grows, like age and naps per day.

Now we're ready to do some calculations. Here is the table of age and height, which, recall, had a covariance bigger than zero. This is the mean of the ages and the mean of the heights. And what we're going to do is first center the data. So let's subtract the mean of the ages from all the ages and the mean of the heights from all the heights, and then multiply these two columns to get this. And when we add it up and take the average, we get 170 divided by 10, which is 17. This covariance is 17 because the data is positively correlated, in the sense that when age grows, height grows.

Now let's do age and naps per day. That's the one that was negatively correlated, so it has negative covariance. Here's again the mu of X and the mu of Y.
Here's age, here's naps, here's the centered age, and the centered naps per day. And the product of these two columns is this. As you can see, all the numbers are negative. Their sum is -74.5. When we take the average, we divide by 10 because there are 10 kids, and we get a covariance of -7.45. That is negative because, as you know, the higher the age, the fewer naps per day.

And finally, we can apply the covariance formula to the age versus grades data. Here are all the numbers: age, grades, the centered age, the centered grades, and the product of the two centered coordinates. That sum is going to be 1, and when we divide by 10, we get that the covariance is 1 over 10, or 0.1, which is very close to zero. So the two variables have very little to do with each other, given that the covariance is very small.

So, a small summary: we have three pairs of variables. The ones that grow together, age versus height, have a positive covariance of 17. The ones that look independent from each other, age versus grades, have a covariance of 0.1, a very small number. And the ones that seem to be negatively correlated, age versus naps per day, have a covariance of -7.45, which is less than zero.
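The calculation walked through above (center both columns, multiply them row by row, then average the products) can be sketched in a few lines of Python. The tables from the video aren't reproduced in this transcript, so the ages, heights, grades, and naps below are made-up values for illustration only, and `covariance` is a hypothetical helper name. The sketch divides by n, the number of children, as the video does.

```python
def covariance(xs, ys):
    """Average of the products of centered coordinates (divides by n)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Center each coordinate, then multiply the centered values pairwise.
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / n


# MADE-UP data (not the table from the video), five children.
ages = [6, 8, 10, 12, 14]
heights = [45, 50, 55, 60, 65]  # grows with age
grades = [5, 3, 4, 3, 5]        # no clear trend with age
naps = [5, 4, 3, 2, 1]          # shrinks with age

cov_height = covariance(ages, heights)
cov_grades = covariance(ages, grades)
cov_naps = covariance(ages, naps)

print(cov_height)  # 20.0  (positive: the variables grow together)
print(cov_grades)  # 0.0   (no linear trend)
print(cov_naps)    # -4.0  (negative: one shrinks as the other grows)
```

If NumPy is available, `numpy.cov(ages, heights, bias=True)[0, 1]` should give the same divide-by-n value; by default `numpy.cov` divides by n - 1 instead, so its result would be slightly larger in magnitude.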

Mathematics for Machine Learning and Data Science

DeepLearning.AI
