In this video, we'll take a look at how you can use the Scikit-learn library to implement PCA. These are the main steps. First, if your features take on very different ranges of values, you can perform preprocessing to scale the features so they take on comparable ranges of values. So if you were looking at the features of different countries, those features take on very different ranges of values: GDP could be in trillions of dollars, whereas other features are less than 100. Feature scaling in applications like that would be important to help PCA find a good choice of axes for you.

The next step is to run the PCA algorithm to "fit" the data and obtain two or three new axes, Z1, Z2, and maybe Z3. Here I'm assuming you want two or three axes because you want to visualize the data in 2D or 3D. If you have an application where you want more than two or three axes, the PCA implementation can also give you more than that; it's just that the result would then be harder to visualize. In Scikit-learn, you would use the fit function, or the fit method, to do this. The fit function in PCA automatically carries out mean normalization, subtracting out the mean of each feature, so you don't need to perform mean normalization separately. After running the fit function, you get the new axes Z1, Z2, and maybe Z3. In PCA, we also call these the principal components, where Z1 is the first principal component, Z2 the second principal component, and Z3 the third principal component.

After that, I would recommend taking a look at how much each of these new axes, or principal components, explains the variance in your data. I'll show a concrete example of what this means on the next slide, but it lets you get a sense of whether projecting the data onto these axes helps you retain most of the variability, or most of the information, in the original data set. This is done using the explained variance ratio.

Finally, you can transform the data, meaning project it onto the new axes, onto the new principal components, which you would do with the transform method. Then each training example is represented by just two or three numbers, and you can plot those numbers to visualize your data.

In detail, this is what PCA in code looks like. Here's a data set X with six examples, so X equals a numpy array with these six examples over here. To run PCA and reduce this data from two numbers, X1 and X2, to just one number, Z, you would run PCA and ask it to fit one principal component, so n_components here is equal to one, and then fit PCA to X. PCA1 here is my notation for PCA with a single principal component, with a single axis. It turns out that if you were to print out PCA1's explained variance ratio, it is 0.992, and this tells you that in this example, when you choose one axis, it captures 99.2% of the variability, or the information, in the original data set. Finally, if you want to take each of these training examples and project it to a single number, you would call this, and it will output this array with six numbers corresponding to your six training examples. So, for example, the first training example, (1, 1), projected onto the Z axis gives you this number, 1.383, and so on. If you were to visualize this data set using just one dimension, this would be the number used to represent the first example, the second example would be projected to this number, and so on.
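Here, as a concrete sketch, is what that one-component run might look like in code. The six data points are an assumption: the transcript only quotes the first example, (1, 1), an explained variance ratio of about 0.992, and a first projection of about 1.383, so the points below were chosen to be consistent with those values. The variable name pca_1 stands in for the "PCA1" notation used above.

```python
# A minimal sketch of the one-component example, assuming a small 2-D data
# set consistent with the values quoted in the lecture.
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[ 1,  1],
              [ 2,  1],
              [ 3,  2],
              [-1, -1],
              [-2, -1],
              [-3, -2]])

# Fit PCA with a single principal component (one new axis, Z1).
# The fit step also carries out mean normalization internally.
pca_1 = PCA(n_components=1)
pca_1.fit(X)

# Fraction of the variance explained by Z1: about 0.992 here.
print(pca_1.explained_variance_ratio_)

# Project each 2-D example onto Z1, giving one number per example;
# the first example [1, 1] maps to roughly 1.383 (up to sign).
X_trans_1 = pca_1.transform(X)
print(X_trans_1)
```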
And I hope you take a look at the optional lab, where you'll see that these six examples have been projected down onto this axis, onto this line, which is why all six examples now lie on this line that looks like this. The first training example, which was (1, 1), has been mapped to this point, which has a distance of 1.38 from the origin; that's why this is 1.38.

Just one more quick example. This data is two-dimensional, and we reduced it to one dimension. What if you were to compute two principal components, so that it starts with two dimensions and also ends up with two dimensions? This isn't that useful for visualization, but it might help us understand better how PCA, and the code for PCA, works. So here's the same code, except that I've changed n_components to 2; I'm going to ask the algorithm to find two principal components. If you do that, the PCA2 explained variance ratio becomes 0.992, 0.008. What that means is that Z1, the first principal component, still explains 99.2% of the variance, and Z2, the second principal component, or the second axis, explains 0.8% of the variance. These two numbers add up to 1 because this data is two-dimensional, so the two axes Z1 and Z2 together explain 100% of the variance of the data.

If you were to transform, or project, the data onto the Z1 and Z2 axes, this is what you get. The first training example is mapped to these two numbers, corresponding to its projection onto Z1 and Z2, and the second example, which is this one, projected onto Z1 and Z2, becomes these two numbers. If you were to reconstruct the original data, roughly, this is Z1 and this is Z2; then the first training example, which was at (1, 1), has a distance of 1.38 along the Z1 axis, hence this number, and a distance of 0.29 along the Z2 axis, hence this distance. The reconstruction actually looks exactly the same as the original data, because if you reduce, or not really reduce, two-dimensional data to two-dimensional data, there is no approximation, and you can get back the original data set from the projections onto Z1 and Z2.

So this is what the code to run PCA looks like. I hope you take a look at the optional lab, where you can play with this more yourself, try varying the parameters, and look at a specific example to deepen your intuition about how PCA works.

Before wrapping up, I'd like to share a little bit of advice for applying PCA. PCA is frequently used for visualization, where you reduce data to two or three numbers so you can plot it, like you saw in an earlier video with the data on different countries, so you can visualize different countries. There are some other applications of PCA that you may occasionally hear about, ones that used to be more popular maybe 10, 15, or 20 years ago but much less so now. One possible use of PCA is data compression. For example, if you have a database of lots of different cars with 50 features per car, but it's taking up too much space in your database, or transmitting 50 numbers over the internet just takes too long, then one thing you could do is reduce these 50 features to a smaller number of features, say 10 features, with 10 axes or 10 principal components. You can't visualize 10-dimensional data that easily, but this is one-fifth of the storage space, or maybe one-fifth of the network transmission cost, that would otherwise be needed.
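Going back to the two-component run described a moment ago, here is a short sketch on the same assumed data set as before (the exact points are assumptions, and the signs of the projected values can differ depending on the direction Scikit-learn picks for each axis). The reconstruction at the end uses inverse_transform, which maps the (Z1, Z2) coordinates back to the original feature space.

```python
# A minimal sketch of the two-component run, on the same assumed data set.
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[ 1,  1],
              [ 2,  1],
              [ 3,  2],
              [-1, -1],
              [-2, -1],
              [-3, -2]])

# Fit PCA with two principal components, Z1 and Z2.
pca_2 = PCA(n_components=2)
pca_2.fit(X)

# Roughly [0.992, 0.008]: together Z1 and Z2 explain 100% of the variance,
# since the original data is itself two-dimensional.
print(pca_2.explained_variance_ratio_)

# Each example now gets two numbers (its Z1 and Z2 coordinates); the first
# example [1, 1] maps to roughly 1.38 and 0.29 (up to sign).
X_trans_2 = pca_2.transform(X)
print(X_trans_2)

# Because no dimensions were dropped, mapping the (Z1, Z2) coordinates back
# to the original feature space recovers the data exactly, up to
# floating-point error.
X_reconstructed = pca_2.inverse_transform(X_trans_2)
print(X_reconstructed)
```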
Many years ago, I saw PCA used for data compression more often, but today, with modern storage able to hold pretty large data sets and modern networking able to transmit more data faster than ever before, I see this much less often as an application of PCA.

One other application of PCA that, again, used to be more common maybe 10 or 20 years ago, but much less so now, is using it to speed up training of a supervised learning model. The idea is that if you had 1,000 features, and having 1,000 features made the supervised learning algorithm run too slowly, maybe you could reduce them to 100 features using PCA; then your data set is basically smaller, and your supervised learning algorithm may run faster. This used to make a difference in the running time of some of the older generations of learning algorithms, such as support vector machines, if you've heard of those; it could speed up a support vector machine. But it turns out that with modern machine learning algorithms, algorithms like deep learning, this doesn't actually help that much, and it's much more common to just take the high-dimensional data set and feed it into, say, your neural network rather than run PCA, because PCA has some computational costs as well. So you may hear about this in some of the older research papers, but I don't really see it done much anymore.

The most common thing I use PCA for today is visualization, and I find it very useful for reducing the dimension of data in order to visualize it.

So thanks for sticking with me through the end of the optional videos for this week. I hope you enjoyed learning about PCA and that you find it useful, when you get a new data set, for reducing its dimension to two or three so you can visualize it and hopefully gain new insights into your data. It's helped me many times to understand my own data sets, and I hope you find it equally useful as well. Thanks for watching these videos, and I look forward to seeing you next week.