The ability of systems like ChatGPT and Bard to generate text seems almost magical. They do represent a big step forward for AI technology. But how does text generation actually work? In this video, we'll take a look at what actually underlies generative AI technology, and this will hopefully help you understand what you can use it for and also when you might not want to count on it. Let's take a look.

Let's start by looking at where generative AI fits within the AI landscape. There's a lot of buzz and excitement, and also hype, about AI, and I think a useful way to think of AI is as a collection, or a set, of tools. One of the most important tools in AI is supervised learning, which turns out to be really good at labeling things. Don't worry if you don't know what this means; we'll talk more about it on the next slide. A second tool, one that started to work really well only fairly recently, is generative AI. If you've studied AI, you may recognize that there are other tools as well, such as unsupervised learning and reinforcement learning. But for the purposes of this course, I'm going to touch briefly on supervised learning and then spend most of our time talking about generative AI. These two, supervised learning and generative AI, are the two most important tools in AI today. For most business use cases, you should be fine if you don't worry about the other tools for now.

Before describing how generative AI works, let me briefly describe supervised learning, because it turns out generative AI is built using supervised learning. Supervised learning is the technology that has made computers very good at taking an input, which I'm going to call A, and generating a corresponding output, which I'm going to call B. Let's look at a few examples. Given an email, supervised learning can decide if that email is spam or not. The input A is an email, and the output B is either zero or one, where zero is not spam and one is spam. This is how spam filters work today. As a second example, probably the most lucrative application I've ever worked on, not the most inspiring, but lucrative for some companies, was online advertising, where given an ad and some information about a user, an AI system can generate an output B corresponding to whether or not you're likely to click on that ad. By showing slightly more relevant ads, this drives significant revenue for the online ad platforms. In self-driving cars and in driver assistance systems, supervised learning is used to take as input a picture of what's in front of your car, plus radar info, and label that with the positions of other cars. Given a medical X-ray, it can try to label it with a medical diagnosis. I've also done a lot of work in manufacturing defect inspection, where a system can take a picture of a phone as it rolls off the assembly line and check if the phone has any scratches or other defects. In speech recognition, the input A would be a piece of audio, and we would label that with the text transcript. As a final example, if you run a restaurant or some other business where occasionally you have reviews written about your business or your products, supervised learning can read those reviews and label each one as having either a positive or a negative sentiment. This is useful for reputation monitoring of the business.

It turns out the decade of around 2010 to 2020 was a decade of large-scale supervised learning.
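To make this A-to-B idea concrete, here is a minimal sketch in Python using scikit-learn. This is my own illustration, not code from the course, and the toy reviews and labels are made up for the purpose of the example; it trains a tiny sentiment labeler along the lines of the restaurant review use case above.

```python
# A minimal sketch of supervised learning's A -> B mapping, where the
# input A is a short review and the label B is its sentiment:
# 1 = positive, 0 = negative. (Illustrative toy data, not course code.)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: inputs A (review text) paired with labels B.
reviews = [
    "The food was delicious and the staff were friendly",
    "Great pastrami sandwich, I will come back",
    "My order was cold and the service was slow",
    "Terrible experience, the soup tasted awful",
]
labels = [1, 1, 0, 0]  # 1 = positive sentiment, 0 = negative sentiment

# Learn the mapping from A to B: bag-of-words features + logistic regression.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(reviews, labels)

# Given new inputs A, predict the labels B.
print(model.predict(["The sandwich was great"]))    # likely prints [1]
print(model.predict(["The service was terrible"]))  # likely prints [0]
```

Whatever the application, the pattern is the same: collect examples of inputs A with their labels B, train a model to map A to B, and then apply it to new inputs.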
I want to touch briefly on that decade of large-scale supervised learning, because it turns out it laid the foundation for modern generative AI. What we found starting around 2010 was that for a lot of applications, we had a lot of data, but even as we fed in more data, performance wasn't getting that much better if we were training small AI models. This means, for example, that if you were building a speech recognition system, even as your AI listened to tens of thousands or hundreds of thousands of hours of audio, and that's a lot of data, it didn't get that much more accurate compared to a system that listened to a smaller amount of audio. But what more and more researchers started to realize through this period is that if you were to train a very large AI model, meaning an AI model trained on very fast, very powerful computers with a lot of memory, then its performance, as you fed it more and more data, would just keep on getting better and better. In fact, years ago when I started and led the Google Brain team, the primary mission I set for the team in the early days was: let's just build really, really large AI models and feed them a lot of data. Fortunately, that recipe worked and ended up driving a lot of AI progress at Google. Large-scale supervised learning remains important today, but this idea of very large models for labeling things is how we got to generative AI.

Let's look at how generative AI generates text using a technology called large language models. Here's one way that large language models, which I'll abbreviate LLMs, can generate text. Given an input like "I love eating", this is called a prompt, an LLM can then complete this sentence with maybe "bagels with cream cheese", or if you run it a second time, it might say "my mother's meatloaf", or if you run it a third time, maybe it'll say "out with friends". How does an LLM, a large language model, generate this output? It turns out that LLMs are built using supervised learning, that technology for taking an input A and outputting a label B. An LLM uses supervised learning to repeatedly predict the next word. For example, if an AI system has read on the Internet a sentence like "My favorite food is a bagel with cream cheese", then this one sentence is turned into a lot of data points for it to learn from in predicting the next word. Specifically, given this sentence, we now have one data point that says: given the phrase "My favorite food is a", what do you think is the next word? In this case, the right answer is "bagel". Also, given "My favorite food is a bagel", what do you think is the next word? It's "with", and so on. This one sentence is turned into multiple inputs A and outputs B, where the LLM learns, given a few words, to predict the next word that comes after.

When you train a very large AI system on a lot of data, and a lot of data for LLMs means hundreds of billions of words, in some cases more than a trillion words, you get a large language model like ChatGPT that, given a prompt, is very good at generating additional words in response to that prompt. For now, I'm omitting some technical details. Specifically, next week we'll talk about a process that makes LLMs not just predict the next word, but actually learn to follow instructions and also be safe in what they output. But at the heart of LLMs is this technology, learned from a lot of data, to predict the next word. That's how large language models work: they're trained to repeatedly predict the next word.
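To make the next-word-prediction idea concrete, here is a minimal Python sketch, again my own illustration rather than course code. It shows how the bagel sentence from above becomes many A-to-B training pairs, and how repeatedly predicting and appending a next word turns into text generation. The memorized dictionary is a hypothetical stand-in for a trained model; a real LLM operates on tokens rather than whole words and learns its predictions from hundreds of billions of words.

```python
# Turn one sentence into many (input A, label B) training pairs for
# next-word prediction, as described in the video.
sentence = "My favorite food is a bagel with cream cheese"
words = sentence.split()

# Each pair: A = the words so far, B = the word that comes next.
training_pairs = [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]
for prompt, next_word in training_pairs:
    print(f"A: {prompt!r}  ->  B: {next_word!r}")

def generate(prompt, predict_next_word, num_words):
    # Generation is just repeated prediction: predict a next word,
    # append it to the prompt, and repeat.
    for _ in range(num_words):
        prompt = prompt + " " + predict_next_word(prompt)
    return prompt

# A trivial stand-in "model" that has memorized only our one sentence.
memorized = dict(training_pairs)
print(generate("My favorite food is a",
               lambda p: memorized.get(p, "cheese"),
               num_words=4))
# prints: My favorite food is a bagel with cream cheese
```

One detail this toy version hides: a real model predicts a probability for many possible next words and samples from them, which is why running the same prompt like "I love eating" several times can produce different completions.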
It turns out that many people, perhaps including you, are already finding these models useful for day-to-day activities at work: to help with writing, to find basic information, or to be a thought partner to help think things through. Let's take a look at some examples in the next video.