In the last video, you saw some of the wide range of applications to which you can apply sequence models. Let's start by defining a notation that we'll use to build up these sequence models. As a motivating example, let's say you want to build a sequence model whose input is a sentence like this: Harry Potter and Hermione Granger invented a new spell. These are characters, by the way, from the Harry Potter series of novels by J.K. Rowling. And let's say you want a sequence model to automatically tell you where the people's names are in this sentence. This is a problem called named entity recognition, and it is used by search engines, for example, to index, say, the last 24 hours' news for all the people mentioned, so that the articles can be indexed appropriately. Named entity recognition systems can be used to find people's names, company names, times, locations, country names, currency names, and so on in different types of text. Now, given this input x, let's say you want a model to output a y that has one output per input word, and the target output, the desired y, tells you for each input word whether that word is part of a person's name. Technically, this maybe isn't the best output representation; there are more sophisticated output representations that tell you not just whether a word is part of a person's name, but where each name starts and ends in the sentence. You might want to know that Harry Potter starts here and ends here, and Hermione Granger starts here and ends here. But for this motivating example, I'm just going to stick with this simpler output representation. Now, the input is a sequence of nine words, so eventually we're going to have nine sets of features to represent these nine words. To index into the positions in the sequence, I'm going to use x with a superscript in angle brackets: x<1>, x<2>, x<3>, and so on up to x<9>, to index into the different positions.
More generally, I'm going to use x<t>, with an index t, to index into positions in the sequence. The t implies that these are temporal sequences, but whether or not the sequence is actually a temporal one, I'm going to use the index t to index into positions in the sequence. Similarly, for the output, we'll refer to the outputs as y<1>, y<2>, y<3>, and so on up to y<9>. Let's also use T_x to denote the length of the input sequence; in this case there are nine words, so T_x = 9. And we'll use T_y to denote the length of the output sequence. In this example T_x is equal to T_y, but as you saw in the last video, T_x and T_y can be different. You'll remember that in the notation we've been using, we write x^(i), with round brackets, to denote the i-th training example. So to refer to the t-th element of the input sequence of training example i, we use the notation x^(i)<t>. And since different examples in your training set can have different lengths, T_x^(i) will be the input sequence length for training example i. Similarly, y^(i)<t> means the t-th element in the output sequence of the i-th training example, and T_y^(i) is the length of the output sequence of the i-th training example. So in this particular example, T_x^(i) = 9, but if you had a different training example with a sentence of 15 words, then T_x^(i) would be equal to 15 for that example. Now, this is our first serious foray into NLP, or natural language processing. Let's next talk about how we would represent individual words in a sentence. To represent the words in a sentence, the first thing you do is come up with a vocabulary, sometimes also called a dictionary, which means making a list of the words that you will use in your representation. So the first word in the vocabulary is a; that's the first word of the dictionary.
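To make the indexing concrete, here is a minimal Python sketch of a hypothetical two-example training set (the words and labels are illustrative, not from any real dataset). Note that Python lists are 0-indexed, while the lecture's angle-bracket notation starts at 1:

```python
# Hypothetical toy training set: each x^(i) is a list of words, each y^(i) a list
# of 0/1 labels (1 = this word is part of a person's name).
X = [
    ["Harry", "Potter", "and", "Hermione", "Granger", "invented", "a", "new", "spell"],
    ["The", "spell", "worked"],  # a second, shorter example, so T_x differs per example
]
Y = [
    [1, 1, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0],
]

# x^(1)<4> in the lecture's 1-based notation is X[0][3] in Python.
print(X[0][3])    # -> Hermione
print(len(X[0]))  # T_x^(1) = 9
print(len(X[1]))  # T_x^(2) = 3
```

This also makes the point that T_x^(i) is a per-example quantity: each training example carries its own length.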
The second word is Aaron, and then a little bit further down is the word and, and then eventually you get to the word Harry, then eventually the word Potter, and then all the way down, maybe the last word in the dictionary is Zulu. So a would be word one, Aaron is word two, and in my dictionary the word and appears at position, or index, 367. Harry appears at position 4075, Potter at position 6830, and Zulu, the last word of the dictionary, is maybe word 10,000. So in this example, I'm going to use a dictionary of 10,000 words. This is quite small by the standards of modern NLP applications. For reasonable-size commercial applications, dictionary sizes of 30,000 to 50,000 words are more common, and 100,000 is not uncommon; some of the large internet companies use dictionary sizes of maybe a million words or even bigger than that. But you see a lot of commercial applications use dictionary sizes of maybe 30,000 or 50,000 words. I'm going to use 10,000 for illustration, since it's a nice round number. So suppose you have chosen a dictionary of 10,000 words. One way to build this dictionary would be to look through your training set and find the top 10,000 most frequently occurring words, or to look through some online dictionaries that tell you, say, the 10,000 most common words in the English language. What you can then do is use one-hot representations to represent each of these words. For example, x<1>, which represents the word Harry, would be a vector of all zeros except for a 1 in position 4075, because that is the position of Harry in the dictionary. Then x<2> will similarly be a vector of all zeros except for a 1 in position 6830, and zeros everywhere else. The word and appears at position 367, so x<3> would be a vector with a 1 in position 367 and zeros everywhere else. Each of these would be a 10,000-dimensional vector if your vocabulary has 10,000 words.
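As a rough sketch of the one-hot scheme, here is what this might look like in Python, using a tiny made-up vocabulary rather than the full 10,000-word dictionary (so the indices won't match the 4075 and 6830 positions from the lecture):

```python
# Tiny illustrative vocabulary in dictionary order (hypothetical, not a real word list).
vocab = ["a", "Aaron", "and", "Granger", "Harry", "Hermione",
         "invented", "new", "Potter", "spell"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a len(vocab)-dimensional vector: all zeros, with a single 1
    at the word's index in the vocabulary."""
    v = [0] * len(vocab)
    v[word_to_index[word]] = 1
    return v

x1 = one_hot("Harry")
print(sum(x1))                     # exactly one entry is 1
print(x1[word_to_index["Harry"]])  # -> 1
```

With a real 10,000-word vocabulary, each vector would be 10,000-dimensional, but the construction is the same.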
And then x<7>, which corresponds to the word a, would be a vector with a 1 in the first position, since a is the first word of the dictionary, and zeros everywhere else. So in this representation, x<t>, for each value of t in your sentence, will be a one-hot vector, one-hot because there is exactly one 1 that is on and zeros everywhere else, and you would have nine of them to represent the nine words in this sentence. And the goal is, given this representation for x, to learn a mapping using a sequence model to the target output y. We'll do this as a supervised learning problem, where we're given labeled data with both x and y. Just one last detail, which we'll talk more about in a later video: what if you encounter a word that is not in your vocabulary? Well, the answer is that you create a new token, or a new fake word, called the unknown word, which I'll denote as follows, angle brackets UNK, to represent words not in your vocabulary. We'll come back to talk more about this later. So to summarize, in this video we described a notation for describing your training set, for both x and y, for sequence data. In the next video, let's start to describe a recurrent neural network for learning the mapping from x to y.
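One common way to handle the unknown-word token in code is to make every out-of-vocabulary word fall back to the UNK token's index. Here is a sketch under that assumption (the vocabulary below is a made-up fragment):

```python
UNK = "<UNK>"
# Hypothetical vocabulary fragment with the unknown-word token appended at the end.
vocab = ["a", "Aaron", "and", "Harry", "Potter", "Zulu", UNK]
word_to_index = {w: i for i, w in enumerate(vocab)}

def index_of(word):
    # Any word outside the vocabulary falls back to the <UNK> token's index.
    return word_to_index.get(word, word_to_index[UNK])

print(index_of("Harry"))     # -> 3
print(index_of("Hermione"))  # not in vocab -> index of <UNK>, 6
```

The one-hot vector for any unseen word is then simply the one-hot vector for <UNK>, so the model input stays a fixed dimension no matter what text it encounters.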