In this video, we'll take a look at how you can use TensorFlow to implement the collaborative filtering algorithm. You might be used to thinking of TensorFlow as a tool for building neural networks, and it is; it's a great tool for building neural networks. But it turns out that TensorFlow can also be very helpful for building other types of learning algorithms, like the collaborative filtering algorithm. One of the reasons I like using TensorFlow for tasks like these is that for many applications, in order to implement gradient descent, say, you need to find the derivatives of the cost function, and TensorFlow can automatically figure out those derivatives for you. All you have to do is implement the cost function, and without needing to know any calculus, without needing to take derivatives yourself, you can get TensorFlow, with just a few lines of code, to compute that derivative term, which can then be used to optimize the cost function. Let's take a look at how all this works.

You might remember this diagram here on the right from course one. This is exactly the diagram we looked at when we talked about optimizing w while working through our first linear regression example. At that time we had set b equal to zero, so the model was just predicting f(x) = wx, and we wanted to find the value of w that minimizes the cost function J. The way we did that was via a gradient descent update, which looks like this, where w gets repeatedly updated as w minus the learning rate alpha times the derivative term. If you were updating b as well, this is the expression you would use, but since we set b equal to zero, you just forgo the second update and keep performing this gradient descent update until convergence. Sometimes computing this derivative or partial derivative term can be difficult, and it turns out that TensorFlow can help with that. Let's see how.

I'm going to use a very simple cost function, J = (wx - 1)^2. Here wx is our simplified f(x), and y is equal to 1, so this is the cost function we'd have with f(x) = wx and y = 1 for the one training example that we have, and we're not optimizing with respect to b. The gradient descent algorithm then repeats this update over here until convergence. It turns out that if you implement the cost function J, TensorFlow can automatically compute this derivative term for you and thereby get gradient descent to work.

I'll give you a high-level overview of what this code does. The line w = tf.Variable(3.0) takes the parameter w and initializes it to the value of 3; telling TensorFlow that w is a variable is how we tell it that w is a parameter we want to optimize. I'm going to set x = 1, y = 1, and the learning rate alpha to 0.01, and let's run gradient descent for 30 iterations, so in this code we'll do for iter in range(iterations). Next comes the syntax to get TensorFlow to automatically compute derivatives for you. TensorFlow has a feature called a gradient tape: if you write with tf.GradientTape() as tape, and inside it compute f(x) as w times x and compute J as (f(x) minus y) squared, then by telling TensorFlow how to compute the cost J with this gradient tape syntax, TensorFlow will automatically record the sequence of steps, the sequence of operations, needed to compute the cost J.
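To make this concrete, here is a minimal sketch of the loop just described, including the derivative and update step discussed in the next paragraph. It assumes the single training example x = 1, y = 1 and the cost J = (wx - 1)^2; the variable names are illustrative rather than taken verbatim from the course notebook.

```python
import tensorflow as tf

# One training example: x = 1, y = 1; cost J = (w*x - y)^2
w = tf.Variable(3.0)   # parameter to optimize, initialized to 3
x = 1.0
y = 1.0
alpha = 0.01           # learning rate
iterations = 30

for iter in range(iterations):
    # Record on the "tape" the operations used to compute the cost J
    with tf.GradientTape() as tape:
        fwb = w * x
        costJ = (fwb - y) ** 2

    # Ask TensorFlow for the derivative dJ/dw from the recorded tape
    [dJdw] = tape.gradient(costJ, [w])

    # Gradient descent step; tf.Variables are updated with assign_add
    w.assign_add(-alpha * dJdw)
```

Each pass of this loop moves w from its initial value of 3 toward the optimum at w = 1, without you ever writing down the derivative yourself.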
And this is needed to enable automatic differentiation. Next, TensorFlow will have saved that sequence of operations in tape, the gradient tape, and with this syntax TensorFlow will automatically compute this derivative term, which I'm going to call dJdw. TensorFlow knows that W is the parameter you want to take the derivative with respect to, the parameter you want to optimize, because you told it so up here and because we're also specifying it down here. Now that you've computed the derivative, you can finally carry out the update by taking w and subtracting from it the learning rate alpha times the derivative term we just got from up above. TensorFlow variables, tf.Variables, require special handling, which is why instead of setting w to w minus alpha times the derivative in the usual way, we use the assign_add function. When you get to the practice lab, don't worry about it; we'll give you all the syntax you need to implement the collaborative filtering algorithm correctly.

Notice that with the gradient tape feature of TensorFlow, the main work you need to do is tell it how to compute the cost function J; the rest of the syntax causes TensorFlow to automatically figure out that derivative for you. With this, TensorFlow will start by finding the slope of the cost at w = 3, shown by this dashed line, take a gradient step and update w, then compute the derivative again and update w over and over until it eventually gets to the optimal value of w, which is w = 1. This procedure lets you implement gradient descent without ever having to figure out how to compute the derivative term yourself. It's a very powerful feature of TensorFlow called AutoDiff, and some other machine learning packages, like PyTorch, also support AutoDiff. Sometimes you hear people call this Autograd; the technically correct term is AutoDiff, and Autograd is actually the name of a specific software package for doing automatic differentiation, for taking derivatives automatically. But when you hear someone refer to Autograd, they're often just referring to this same concept of automatically taking derivatives.

So let's take this and look at how you can implement the collaborative filtering algorithm using AutoDiff. In fact, once you can compute derivatives automatically, you're not limited to just gradient descent; you can also use a more powerful optimization algorithm, like the Adam optimizer. To implement the collaborative filtering algorithm in TensorFlow, this is the syntax you can use. We start by specifying that the optimizer is keras.optimizers.Adam, with the learning rate specified here. Then, for say 200 iterations, here's the syntax as before: with tf.GradientTape() as tape, you provide code to compute the value of the cost function J. Recall that in collaborative filtering, the cost function J takes as input the parameters X, W, and b, as well as the normalized ratings, which is why I'm writing Ynorm, the values r(i,j) specifying which movies have a rating, the number of users, n_u in our notation, the number of movies, n_m in our notation, and the regularization parameter lambda. If you can implement this cost function J, then this syntax will cause TensorFlow to figure out the derivatives for you.
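As a sketch of the training loop just outlined, including the tape.gradient and apply_gradients calls discussed in the next paragraph, the structure looks roughly like the following. Here cofi_cost_func_v is a placeholder name for your own implementation of the regularized collaborative filtering cost, and X, W, b, Ynorm, R, num_users, num_movies, and lambda_ are assumed to be defined already, with X, W, and b created as tf.Variables.

```python
import tensorflow as tf

# Assumes X, W, b are tf.Variables and Ynorm, R, num_users, num_movies,
# lambda_ are already defined; cofi_cost_func_v is a placeholder name for
# your implementation of the regularized collaborative filtering cost.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-1)
iterations = 200

for iter in range(iterations):
    with tf.GradientTape() as tape:
        # Record the operations used to compute the cost
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R,
                                      num_users, num_movies, lambda_)

    # Derivatives of the cost with respect to X, W, and b
    grads = tape.gradient(cost_value, [X, W, b])

    # zip pairs each gradient with its variable for apply_gradients
    optimizer.apply_gradients(zip(grads, [X, W, b]))
```

Compared with the plain gradient descent updates, the Adam optimizer adapts the step size for each parameter, which is one reason it is the optimizer used here.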
Specifically, this syntax causes TensorFlow to record the sequence of operations used to compute the cost, and then, by asking it for grads = tape.gradient, it gives you the derivatives of the cost function with respect to X, W, and b. Finally, with the optimizer we specified up on top as the Adam optimizer, you can use the optimizer to apply the gradients we just computed. The zip function in Python just rearranges the gradients and variables into the pairing that the apply_gradients function expects.

If you were using gradient descent for collaborative filtering, recall that the cost function J would be a function of W, b, and X. Applying gradient descent, you'd take the partial derivative with respect to W and update W as follows, take the partial derivative with respect to b and update b as follows, similarly update the features X as follows, and repeat until convergence. But as I mentioned earlier, with TensorFlow and AutoDiff you're not limited to just gradient descent; you can also use a more powerful optimization algorithm like the Adam optimizer.

The dataset you use in the practice lab is a real dataset comprising actual movies rated by actual people. This is the MovieLens dataset, due to Harper and Konstan, and I hope you enjoy running this algorithm on a real dataset of movies and ratings and seeing for yourself the results it can get.

So that's it; that's how you can implement the collaborative filtering algorithm in TensorFlow. If you're wondering why we have to do it this way, why we couldn't just use a Dense layer and then model.compile and model.fit, the reason we couldn't use that old recipe is that the collaborative filtering algorithm and cost function don't neatly fit into the Dense layer or the other standard neural network layer types of TensorFlow. That's why we had to implement it this other way, where we implement the cost function ourselves but then use TensorFlow's tools for automatic differentiation, also called AutoDiff, and TensorFlow's implementation of the Adam optimization algorithm to do a lot of the work of optimizing the cost function for us. If the model you have is a sequence of dense neural network layers or other layer types supported by TensorFlow, then the old implementation recipe of model.compile and model.fit works. But even when it doesn't, these tools in TensorFlow give you a very effective way to implement other learning algorithms as well.

I hope you enjoy playing more with the collaborative filtering exercise in this week's practice lab. If it looks like there's a lot of code and a lot of syntax, don't worry about it; make sure you have what you need to complete that exercise successfully. In the next video, I'd like to move on to discuss more of the nuances of collaborative filtering, and specifically the question of how you find related items: given one movie, what are other movies similar to it? Let's go on to the next video.