In this video, we'll start to develop a second type of recommender system called a content-based filtering algorithm. To get started, let's compare and contrast the collaborative filtering approach that we've been looking at so far with this new content-based filtering approach.

With collaborative filtering, the general approach is that we recommend items to you based on the ratings of users who gave ratings similar to yours. So we have some number of users give ratings for some items, and the algorithm figures out how to use that to recommend new items to you. In contrast, content-based filtering takes a different approach to deciding what to recommend to you. A content-based filtering algorithm will recommend items to you based on features of the users and features of the items, using those features to find a good match. In other words, it requires having some features of each user as well as some features of each item, and it uses those features to try to decide which items and users might be a good match for each other.

With a content-based filtering algorithm, you still have data where users have rated some items. So we'll continue to use r(i,j) to denote whether or not user j has rated item i, and we'll continue to use y(i,j) to denote the rating that user j has given item i, if it's defined. But the key to content-based filtering is that it can make good use of features of the users and of the items to find better matches than a pure collaborative filtering approach might be able to. Let's take a look at how this works.

In the case of movie recommendations, here are some examples of features. You may know the age of the user, or you may have the gender of the user. This could be a one-hot feature, similar to what you saw when we were talking about decision trees: a one-hot feature with three values based on whether the user's self-identified gender is male, female, or unknown. You may also know the country of the user; if there are about 200 countries in the world, this would be a one-hot feature with about 200 possible values. You can also look at past behaviors of the user to construct this feature vector. For example, if you look at the top 1,000 movies in your catalog, you might construct 1,000 features that tell you, of the 1,000 most popular movies in the world, which of them the user has watched. In fact, you can also take ratings the user has already given and use them to construct new features. It turns out that if you have a set of movies, and you know what genre each movie is in, then the average rating per genre that the user has given makes a useful feature: of all the romance movies that the user has rated, what was the average rating? Of all the action movies that the user has rated, what was the average rating? And so on for all the other genres. This too can be a powerful way to describe the user. One interesting thing about this feature is that it depends on the ratings that the user has given, but there's nothing wrong with that: constructing a feature vector that depends on the user's ratings is a completely fine way to describe that user. With features like these, you can then come up with a feature vector x_u^(j), where the subscript u stands for user and the superscript (j) refers to user j.
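To make this concrete, here's a minimal Python sketch of how such a user feature vector might be assembled. The category lists, function name, and data layout are hypothetical choices for illustration; the course doesn't prescribe a particular implementation.

```python
import numpy as np

# Hypothetical category lists; in practice the country list would have ~200 entries.
GENDERS = ["male", "female", "unknown"]
COUNTRIES = ["US", "IN", "BR", "NG", "JP"]
GENRES = ["romance", "action", "comedy", "drama"]

def build_user_features(age, gender, country, watched_top_1000, ratings_by_genre):
    """Assemble a user feature vector x_u^(j) from raw attributes.

    watched_top_1000: length-1,000 sequence of 0/1 flags over the most popular movies.
    ratings_by_genre: dict mapping genre name -> list of ratings this user has given.
    """
    gender_onehot = np.array([gender == g for g in GENDERS], dtype=float)
    country_onehot = np.array([country == c for c in COUNTRIES], dtype=float)
    # Average rating the user gave per genre (0 if no ratings in that genre).
    # Note this part of the feature vector is derived from the user's own
    # ratings, which, as discussed above, is perfectly fine.
    avg_per_genre = np.array([np.mean(ratings_by_genre[g]) if ratings_by_genre.get(g)
                              else 0.0 for g in GENRES])
    return np.concatenate([[float(age)], gender_onehot, country_onehot,
                           np.asarray(watched_top_1000, dtype=float), avg_per_genre])
```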
Similarly, you can come up with a set of features for each movie or each item, such as the year of the movie and the genre or genres of the movie, if known. If there are critic reviews of the movie, you can construct one or more features to capture something about what the critics are saying about it. Or, once again, you can take user ratings of the movie to construct a feature such as the movie's average rating. This feature again depends on the ratings that users have given, but there's nothing wrong with that: you can construct a feature for a given movie that depends on the ratings the movie has received, such as its average rating. If you wish, you can also compute the average rating per country, or the average rating per user demographic, and so on, to construct other types of movie features as well. With this, for each movie you can construct a feature vector, which I'm going to denote x_m^(i), where the subscript m stands for movie and the superscript (i) refers to movie i.

Given features like these, the task is to figure out whether a given movie i is going to be a good match for user j. Notice that the user features and the movie features can be very different in size. For example, the user features could be 1,500 numbers and the movie features could be just 50 numbers, and that's okay too.

In content-based filtering, we're going to develop an algorithm that learns to match users and movies. Previously, we were predicting the rating of user j on movie i as w^(j) · x^(i) + b^(j). In order to develop content-based filtering, I'm going to get rid of b^(j); it turns out this won't hurt the performance of content-based filtering at all. And instead of w^(j) for user j and x^(i) for movie i, I'm going to use two new vectors: v_u^(j), where v stands for a vector of numbers computed for user j and the subscript u stands for user, and v_m^(i), where the subscript m stands for movie and the superscript (i) refers to movie i. So v_u^(j) is a list of numbers computed from the features of user j, and v_m^(i) is a list of numbers computed from the features, like the ones you saw on the previous slide, of movie i. If we're able to come up with an appropriate choice of these two vectors, then hopefully the dot product v_u^(j) · v_m^(i) will be a good prediction of the rating that user j gives movie i.

Just to illustrate what a learning algorithm could come up with: suppose v_u, the user vector, captures the user's preferences, say it's [4.9, 0.1, ...], where the first number captures how much they like romance movies, the second how much they like action movies, and so on. And suppose v_m, the movie vector, is [4.5, 0.2, ...], with these numbers capturing how much this is a romance movie, how much it's an action movie, and so on. Then the dot product, which multiplies these lists of numbers element-wise and then takes the sum, will hopefully give a sense of how much this particular user will like this particular movie. So the challenge is: given the features x_u^(j) of a user, how can we compute a vector v_u^(j) that represents, succinctly or compactly, the user's preferences? And similarly, given the features of a movie, how can we compute v_m^(i)?
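Numerically, the prediction is just a dot product. Here's a tiny NumPy sketch using the illustrative numbers above, truncated to two dimensions (the variable names are mine):

```python
import numpy as np

# Illustrative vectors; in practice v_u and v_m are learned from the
# user and movie features, not hand-specified.
v_u = np.array([4.9, 0.1])  # how much the user likes romance, action, ...
v_m = np.array([4.5, 0.2])  # how much the movie is a romance, an action movie, ...

# Element-wise products summed, i.e. the dot product:
# 4.9 * 4.5 + 0.1 * 0.2 = 22.07
predicted_rating = np.dot(v_u, v_m)
print(predicted_rating)  # ~22.07
```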
Notice that whereas x_u and x_m could be different in size (one could be a very long list of numbers and the other a much shorter one), v_u and v_m have to be the same size, because to take a dot product between them, both must have the same dimension, such as, say, 32 numbers each. So to summarize: in collaborative filtering, we had a number of users give ratings of different items. In contrast, in content-based filtering, we have features of users and features of items, and we want a way to find good matches between them. The way we're going to do so is to compute the vectors v_u for the users and v_m for the items or movies, and then take dot products between them to try to find good matches. How do we compute v_u and v_m? Let's take a look at that in the next video.
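As a quick illustration of the shape constraint just described, the sketch below maps a 1,500-number user feature vector and a 50-number movie feature vector into a shared 32-dimensional space. The random linear projections are stand-ins I've made up; how v_u and v_m are actually computed is the subject of the next video.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature vectors of different sizes: 1,500 user features, 50 movie features.
x_u = rng.random(1500)
x_m = rng.random(50)

# Stand-in linear maps (placeholders for the learned functions introduced
# in the next video). The only requirement is a shared output size, e.g. 32.
W_user = rng.random((32, 1500))
W_movie = rng.random((32, 50))

v_u = W_user @ x_u   # shape (32,)
v_m = W_movie @ x_m  # shape (32,)

# The dot product is defined only because both vectors are 32-dimensional.
predicted_rating = np.dot(v_u, v_m)
```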