If the basic technical ideas behind deep learning, behind neural networks, have been around for decades, why are they only just now taking off? In this video, let's go over some of the main drivers behind the rise of deep learning, because I think this will help you better spot the best opportunities within your own organization to apply these ideas.

Over the last few years, a lot of people have asked me, "Andrew, why is deep learning suddenly working so well?" And when I'm asked that question, this is usually the picture I draw for them. Let's say we plot a figure where, on the horizontal axis, we plot the amount of data we have for a task, and on the vertical axis, we plot the performance of our learning algorithm, such as the accuracy of our spam classifier or our ad click predictor, or the accuracy of our neural net at figuring out the position of other cars for our self-driving car. It turns out that if you plot the performance of a traditional learning algorithm, like a support vector machine or logistic regression, as a function of the amount of data you have, you get a curve where the performance improves for a while as you add more data, but after a while it pretty much plateaus into a horizontal line. It was as if those algorithms didn't know what to do with huge amounts of data.

What happened in our society over the last 20 years or so is that, for a lot of problems, we went from having a relatively small amount of data to having a fairly large amount of data. A lot of this was thanks to the digitization of society, where so much human activity is now in the digital realm. We spend so much time on computers, on websites, and on mobile apps, and activity on digital devices creates data. And thanks to the rise of inexpensive cameras built into our cell phones, accelerometers, and all sorts of sensors in the Internet of Things, we have also just been collecting more and more data. So over the last 20 years, for a lot of applications, we accumulated a lot more data, more than traditional learning algorithms were able to effectively take advantage of.

With neural networks, it turns out that if you train a small neural net, its performance may be only a little better than a traditional algorithm's. If you train a somewhat larger neural net, call it a medium-sized neural net, its performance is often a little bit better still. And if you train a very large neural net, its performance often just keeps getting better and better as you add data.

So a couple of observations. One is that if you want to hit a very high level of performance, you need two things. First, you often need to be able to train a big enough neural network to take advantage of a huge amount of data. And second, you need to be far out on the x-axis: you do need a lot of data. So we often say that scale has been driving deep learning progress, and by scale I mean both the size of the neural network, meaning a neural network with a lot of hidden units, a lot of parameters, and a lot of connections, as well as the scale of the data. In fact, today, one of the most reliable ways to get better performance out of a neural network is often to either train a bigger network or throw more data at it. That only works up to a point, because eventually you run out of data, or eventually the neural network is so big that it takes too long to train.
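Since the drawn figure itself isn't reproduced in this transcript, here is a minimal, purely schematic sketch of the curves being described, written with NumPy and matplotlib. The curve shapes and numbers are illustrative assumptions, not measured results.

```python
import numpy as np
import matplotlib.pyplot as plt

# Amount of training data (arbitrary units) on the horizontal axis.
data = np.linspace(0, 10, 200)

def schematic_curve(data, ceiling, rate):
    """Illustrative performance-vs-data curve that rises and then saturates at `ceiling`."""
    return ceiling * (1.0 - np.exp(-rate * data))

# Traditional algorithm plateaus early; larger nets keep benefiting from more data.
plt.plot(data, schematic_curve(data, 0.70, 1.5), label="traditional algorithm (SVM, logistic regression)")
plt.plot(data, schematic_curve(data, 0.80, 0.9), label="small neural net")
plt.plot(data, schematic_curve(data, 0.88, 0.6), label="medium neural net")
plt.plot(data, schematic_curve(data, 0.97, 0.35), label="large neural net")

plt.xlabel("amount of training data")
plt.ylabel("performance")
plt.legend()
plt.title("Schematic: scale of data and model size drive performance")
plt.show()
```

The only property the sketch tries to capture is the one described above: each curve eventually saturates, higher-capacity models saturate at a higher level, and the gap between them only shows up once you are far out on the data axis.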
But just improving scale has actually taken us a long way in the world of deep learning. To make this diagram a bit more technically precise, let me add a few more details. I wrote "amount of data" on the x-axis. Technically, this is the amount of labeled data, where by labeled data I mean training examples for which we have both the input x and the label y. And to introduce a little bit of notation that we'll use later in this course, we're going to use the lowercase letter m to denote the size of our training set. So the number of training examples is m, and that's the horizontal axis.

A couple of other details about this figure. In the regime of small training sets, the relative ordering of the algorithms is actually not very well defined. If you don't have a lot of training data, performance often depends on your skill at hand-engineering features. So it's quite possible that if someone training an SVM is more motivated to hand-engineer features than someone training even a large neural net, then in this small-training-set regime the SVM could do better. In the region to the left of the figure, the relative ordering between the algorithms is not that well defined, and performance depends much more on your skill at hand-engineering features and on other small details of the algorithms. It's only in the big-data regime, the very-large-m regime on the right, that we more consistently see large neural nets dominating the other approaches. So if any of your friends ask you why neural nets are taking off, I would encourage you to draw this picture for them as well.

I would say that in the early days of the modern rise of deep learning, it was the scale of data and the scale of computation, just our ability to train very large neural networks on a CPU or a GPU, that enabled us to make a lot of progress. But increasingly, especially in the last several years, we've been seeing tremendous algorithmic innovation as well, so I don't want to understate that. Interestingly, many of the algorithmic innovations have been about trying to make neural networks run much faster. As a concrete example, one of the huge breakthroughs in neural networks has been switching from the sigmoid activation function to the ReLU function, which I talked about briefly in an earlier video. If you don't understand the details of what I'm about to say, don't worry about it. It turns out that one of the problems with using sigmoid functions in machine learning is that in the regions where the function flattens out, the slope of the function, the gradient, is nearly zero, so learning becomes really slow: when you implement gradient descent and the gradient is nearly zero, the parameters change very slowly. By changing what's called the activation function of the neural network to the ReLU function, the rectified linear unit, the gradient is equal to one for all positive values of the input, so the gradient is much less likely to gradually shrink to zero. (The ReLU's gradient is zero for negative inputs.) It turns out that just switching from the sigmoid function to the ReLU function has made an algorithm called gradient descent work much faster. This is an example of a relatively simple algorithmic innovation, but ultimately its impact was that it really helped computation.
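To see that gradient contrast concretely, here is a minimal NumPy sketch (an illustration added here, not course code) that evaluates the standard derivative formulas for the two activations at a few input values.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s). It peaks at 0.25 when z = 0
    and shrinks toward zero quickly as |z| grows."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU(z) = max(0, z): 1 for positive z, 0 for negative z."""
    return np.where(z > 0, 1.0, 0.0)

z = np.array([-10.0, -2.0, 0.5, 2.0, 10.0])
print("z:           ", z)
print("sigmoid grad:", np.round(sigmoid_grad(z), 6))  # tiny for large |z|
print("ReLU grad:   ", relu_grad(z))                  # exactly 1 for any z > 0
```

For z = 10 the sigmoid's gradient is on the order of 4.5e-5, so a gradient-descent step barely moves the corresponding parameter, while the ReLU's gradient is still 1; that is the computational point being made above.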
There have actually been quite a lot of examples like this, where we changed the algorithm because it allows our code to run much faster, and that lets us train bigger neural networks, or train in a reasonable amount of time even when we have a large network or a lot of data.

The other reason that fast computation is important is that the process of training a neural network turns out to be very iterative. Often you have an idea for a neural network architecture, so you implement your idea in code. Implementing your idea lets you run an experiment, which tells you how well your neural network does, and then by looking at the result you go back and change the details of your neural network, and you go around this cycle over and over. When your neural network takes a long time to train, it just takes a long time to go around this cycle. There's a huge difference in your productivity building effective neural networks when you can have an idea, try it, and see whether it works in 10 minutes, or maybe at most a day, versus having to train your neural network for a month, which sometimes does happen. If you get a result back in 10 minutes or in a day, you can just try a lot more ideas and be much more likely to discover a neural network that works well for your application. So faster computation has really helped speed up the rate at which you can get an experimental result back, and this has helped both practitioners of neural networks and researchers working in deep learning iterate much faster and improve their ideas much faster. All of this has also been a huge boon to the entire deep learning research community, which has been incredible at inventing new algorithms and making non-stop progress on that front.

So these are some of the forces powering the rise of deep learning, but the good news is that these forces are still working powerfully to make deep learning even better. Take data: society is still throwing off more and more digital data. Or take computation: with the rise of specialized hardware like GPUs, faster networking, and many other types of hardware, I'm actually quite confident that our ability to build very large neural networks, from a sheer computation point of view, will keep on getting better. And take algorithms: the whole deep learning research community is still continuously phenomenal at innovating on the algorithms front. Because of this, I think we can be optimistic, and I'm certainly optimistic, that deep learning will keep on getting better for many years to come. So with that, let's go on to the last video of this section, where we'll talk a little bit more about what you'll learn in this course.
Course Syllabus

Introduction to Deep Learning

  • Welcome to the Deep Learning Specialization
      Welcome (Video, 5 mins)
  • Introduction to Deep Learning
      What is a Neural Network? (Video, 7 mins)
      Supervised Learning with Neural Networks (Video, 8 mins)
      Why is Deep Learning taking off? (Video, 10 mins)
      About this Course (Video, 2 mins)
      Frequently Asked Questions (Reading, 10 mins)
  • Lecture Notes (Optional)
      Lecture Notes W1 (Reading, 1 min)
  • Quiz
      Introduction to Deep Learning (Graded Quiz, 50 mins)
  • Heroes of Deep Learning (Optional)
      Geoffrey Hinton Interview (Video, 40 mins)