The rise of AI has been largely driven by one tool in AI called machine learning. In this video, you'll learn what machine learning is, so that by the end, you'll hopefully be able to start thinking about how machine learning might be applied to your company or to your industry. The most commonly used type of machine learning is the type of AI that learns A to B, or input to output, mappings, and this is called supervised learning. Let's see some examples. If the input A is an email and the output B you want is whether this email is spam or not (0 or 1), then this is the core piece of AI used to build a spam filter. Or if the input A is an audio clip and the AI's job is to output the text transcript, then this is speech recognition.

More examples. If you want to input English and have it output a different language, Chinese, Spanish, something else, then this is machine translation. Or the most lucrative form of supervised learning, of this type of machine learning, may be online advertising, where all the large online ad platforms have a piece of AI that inputs some information about an ad and some information about you, and tries to figure out whether you will click on this ad or not. By showing you the ads you're most likely to click on, this turns out to be very lucrative. Maybe not the most inspiring application, but certainly one having a huge economic impact today. Or if you want to build a self-driving car, one of the key pieces of AI is one that takes as input an image and some information from the radar or from other sensors, and outputs the positions of other cars, so your self-driving car can avoid them. Or take manufacturing. I've actually done a lot of work in manufacturing, where you would take as input a picture of something you've just manufactured, such as a picture of a cell phone coming off an assembly line. (This is a picture of a phone, not a picture taken by a phone.) You want to output whether there is a scratch, a dent, or some other defect on this thing you've just manufactured. This is visual inspection, which is helping manufacturers reduce or prevent defects in the things they're making.

Supervised learning also lies at the heart of generative AI systems, like ChatGPT and other chatbots that generate text. These systems work by learning from huge amounts of text, say downloaded from the internet, so that when given a few words as the input, the model can predict the next word that comes after. These models, which are called Large Language Models, or LLMs, generate new text by repeatedly predicting the next word they should output. Given the widespread attention on LLMs, let's look briefly, on the next slide, at how they work in greater detail.

Large Language Models are built by using supervised learning to train a model to repeatedly predict the next word. For example, if an AI system has read on the internet a sentence like "my favorite drink is lychee bubble tea," then this single sentence would be turned into a lot of A to B data points for the model to learn to predict the next word. Specifically, given this sentence, we now have one data point that says: given the phrase "my favorite drink," what do you predict is the next word? In this case, the right answer is "is." And given "my favorite drink is," what do you predict is the next word? The correct answer is "lychee," and so on, until you have used all the words in the sentence.
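To make the idea concrete, here is a minimal sketch, in Python, of how one sentence can be split into A to B pairs for next-word prediction. It is purely illustrative and not from the course materials; real LLMs operate on subword tokens rather than whitespace-separated words.

```python
# Illustrative sketch: turn one sentence into (input text, next word) training pairs.
# Real LLMs use subword tokens and vastly more data; splitting on spaces is a simplification.
sentence = "my favorite drink is lychee bubble tea"
words = sentence.split()

training_pairs = []
for i in range(1, len(words)):
    input_a = " ".join(words[:i])   # A: the words seen so far
    output_b = words[i]             # B: the next word to predict
    training_pairs.append((input_a, output_b))

for a, b in training_pairs:
    print((a, b))

# Pairs produced include:
#   ('my favorite drink', 'is')
#   ('my favorite drink is', 'lychee')
# ...and so on, until every word in the sentence has been used.
```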
So, this one sentence is turned into multiple inputs A and outputs B for the model to learn: given a few words as input, what is the next word? When you train a very large AI system on a lot of data, say hundreds of billions or even over a trillion words, then you get a large language model like ChatGPT that, given an initial piece of text called a prompt, is very good at generating some additional words in response to that prompt. The description I presented here does omit some technical details, like how the model learns to follow instructions rather than just predict the next word found on the internet, and also how developers make the model less likely to generate inappropriate outputs, such as ones that exhibit discrimination or hand out harmful instructions. If you're interested, you can learn more about these details in the course Generative AI for Everyone. At the heart of LLMs, though, is this technology that learns from a lot of data to predict the next word using supervised learning.

So, in summary, supervised learning just learns input-output, or A to B, mappings. On one hand, input-output A to B seems quite limiting, but when you find the right application scenario, this turns out to be incredibly valuable. Now, the idea of supervised learning has been around for many decades, but it's really taken off in the last few years. Why is this? My friends ask me, "Hey Andrew, why is supervised learning taking off now?" There's a picture I draw for them, and I want to show you this picture now, and you may be able to draw this picture for others that ask you the same question as well.

Let's say on the horizontal axis you plot the amount of data you have for a task. So for speech recognition, this might be the amount of audio data and transcripts you have. In a lot of industries, the amount of data you have access to has really grown over the last couple of decades, thanks to the rise of the internet and the rise of computers. A lot of what used to be, say, pieces of paper are now instead recorded on digital computers. So we've just been getting more and more data. Now, let's say on the vertical axis you plot the performance of an AI system. It turns out that if you use a traditional AI system, then the performance grows like this: as you feed it more data, its performance gets a bit better, but beyond a certain point it does not get that much better. So it's as if your speech recognition system does not get that much more accurate, or your online advertising system doesn't get that much more accurate at showing the most relevant ads, even as you feed it more data.

AI has really taken off recently due to the rise of neural networks and deep learning. I'll define these terms more precisely in a later video, so don't worry too much about what they mean for now. But with modern AI, with neural networks and deep learning, what we saw was that if you train a small neural network, then the performance looks like this, where as you feed it more data, performance keeps getting better for much longer. And if you train an even slightly larger neural network, say a medium-sized neural net, then the performance may look like that. And if you train a very large neural network, then the performance just keeps on getting better and better.
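As a rough visual of the picture just described, here is a short Python sketch that draws made-up curves with the same qualitative shape. The numbers are invented for illustration only and are not measurements from any real system.

```python
# Illustrative only: invented curves with the qualitative shape described above.
import numpy as np
import matplotlib.pyplot as plt

data = np.linspace(0, 10, 200)  # amount of data (arbitrary units)

def performance(amount, ceiling, rate):
    """Saturating curve: rises with more data, then levels off near `ceiling`."""
    return ceiling * (1 - np.exp(-amount / rate))

plt.plot(data, performance(data, 0.50, 1.0), label="traditional AI")
plt.plot(data, performance(data, 0.65, 2.0), label="small neural network")
plt.plot(data, performance(data, 0.80, 3.5), label="medium neural network")
plt.plot(data, performance(data, 0.95, 5.0), label="large neural network")

plt.xlabel("Amount of data")
plt.ylabel("Performance")
plt.legend()
plt.show()
```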
And for applications like speech recognition, online advertising, and building self-driving cars, where having a high-performance, highly accurate system is important, this has enabled these AI systems to get much better and made products like speech recognition much more acceptable to users and much more valuable to companies. Now, here are a couple of implications of this figure. If you want the best possible levels of performance, for your performance to be up here, to hit this level of performance, then you need two things. One is that it really helps to have a lot of data; that's why you sometimes hear about big data. Having more data almost always helps. The second thing is that you want to be able to train a very large neural network. And so the rise of fast computers, including Moore's Law, but also the rise of specialized processors such as graphics processing units, or GPUs, which you'll hear more about in a later video, has enabled many companies, not just the giant tech companies but many, many other companies, to train large neural nets on a large enough amount of data to get very good performance and drive business value. In fact, it was also this type of scaling, increasing the amount of data and the size of the models, that was instrumental to the recent breakthroughs in training generative AI systems, including the large language models we discussed just now.

The most important idea in AI has been machine learning, especially supervised learning, which means A to B, or input-output, mappings. What enables it to work really well is data. In the next video, let's take a look at what data is, what data you might already have, and how to think about feeding this into AI systems. Let's go on to the next video.