The term human-level performance is sometimes used casually in research articles, but let me show you how to define it a bit more precisely, and in particular how to use the definition of human-level performance that is most useful for driving progress on your machine learning project. Remember from the last video that one use of the phrase human-level error is that it gives us a way of estimating Bayes' error, the best possible error any function could ever achieve, now or in the future.

With that in mind, let's look at a medical image classification example. Suppose you want to look at a radiology image and make a diagnosis classification decision, and suppose that a typical, untrained human achieves 3% error on this task. A typical doctor, say a typical radiologist, achieves 1% error. An experienced doctor does even better, 0.7% error. And a team of experienced doctors, that is, if you have a team of experienced doctors all look at the image and discuss and debate it, their consensus opinion achieves 0.5% error. So the question I want to pose to you is: how should you define human-level error? Is it 3%, 1%, 0.7%, or 0.5%? Feel free to pause the video and think about it.

To answer that question, keep in mind that one of the most useful ways to think of human-level error is as a proxy or an estimate for Bayes' error. Here's how I would define it: if you want a proxy for Bayes' error, then given that a team of experienced doctors discussing and debating can achieve 0.5% error, we know that Bayes' error is less than or equal to 0.5%. Because some system, a team of doctors, can achieve 0.5% error, the theoretically optimal error has to be 0.5% or lower. We don't know how much better it is; maybe an even larger team of even more experienced doctors could do a little better than 0.5%. But we know the optimal error cannot be higher than 0.5%. So in this setting I would use 0.5% as our estimate for Bayes' error, and define human-level performance as 0.5%, at least if you're hoping to use human-level error in the analysis of bias and variance as we saw in the last video.

Now, for the purpose of publishing a research paper or deploying a system, maybe there's a different definition of human-level error that you can use: so long as you surpass the performance of a typical doctor, that may be a very useful result to have accomplished, and surpassing a single radiologist's performance might mean the system is good enough to deploy in some context. So the takeaway is to be clear about what your purpose is in defining the term human-level error. If the goal is to show that you can surpass a single human and therefore argue for deploying your system in some context, then the typical-doctor definition may be appropriate. But if your goal is a proxy for Bayes' error, then the best-achievable definition, 0.5%, is the appropriate one.

To see why this matters, let's look at an error analysis example. Say that for this medical imaging diagnosis task your training error is 5% and your dev error is 6%.
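As a quick illustration of this choice, here is a minimal Python sketch (the variable names are my own, not from the course) that takes the different human error rates from this example and uses the best one, the team of doctors, as the proxy for Bayes' error:

```python
# Human error rates from the radiology example (fractions, not percentages).
human_error_rates = {
    "untrained human": 0.03,
    "typical doctor": 0.01,
    "experienced doctor": 0.007,
    "team of experienced doctors": 0.005,
}

# As a proxy for Bayes' error, take the best (lowest) human error rate:
# Bayes' error can be no higher than the best error any system has achieved.
bayes_error_estimate = min(human_error_rates.values())
print(f"Proxy for Bayes' error: {bayes_error_estimate:.1%}")  # prints 0.5%
```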
In this example from the previous slide, human-level performance, which I'm going to treat as a proxy for Bayes' error, would be either 1%, 0.7%, or 0.5%, depending on whether you define it as a typical doctor's performance, an experienced doctor's, or a team of doctors'. And remember our definitions from the previous video: the gap between the estimate of Bayes' error and the training error is a measure of the avoidable bias, and the gap between the training error and the dev error is a measure of how much of a variance problem you have in your learning algorithm.

So in this first example, whichever of these choices you make, the measure of avoidable bias will be something like 4% to 4.5%: 4% if you use 1% as human-level performance, and 4.5% if you use 0.5%, whereas the variance, the gap between training and dev error, is 1%. In this example it doesn't really matter which definition of human-level error you use, whether the typical doctor's error, the single experienced doctor's error, or the team of experienced doctors' error. Whether the avoidable bias is 4% or 4.5%, it is clearly bigger than the variance problem, so in this case you should focus on bias reduction techniques, such as training a bigger network.

Now let's look at a second example. Say your training error is 1% and your dev error is 5%. Then again it doesn't really matter, it seems a bit academic, whether human-level performance is 1%, 0.7%, or 0.5%, because whichever of these definitions you use, your measure of avoidable bias will be somewhere between 0% and 0.5%, the gap between human-level performance and your training error, whereas the variance, the gap between training and dev error, is 4%. That 4% is going to be much bigger than the avoidable bias either way, so this suggests you should focus on variance reduction techniques, such as regularization or getting a bigger training set.

Where it really matters is if your training error is 0.7%, so you're doing really well now, and your dev error is 0.8%. In this case it really matters that you use 0.5% as your estimate for Bayes' error, because then your measure of avoidable bias is 0.2%, which is twice as big as your measure of variance, which is just 0.1%. This suggests that both bias and variance are problems, but the avoidable bias is a bit bigger of a problem. And in this example, 0.5%, as we discussed on the previous slide, is the best estimate of Bayes' error, because a team of human doctors could achieve that performance. If you used 0.7% as your proxy for Bayes' error, you would have estimated the avoidable bias as pretty much 0%, and you might have missed that you should actually try to do better on your training set.

I hope this also gives a sense of why making progress on a machine learning problem gets harder as you approach human-level performance. In this example, once you've reached 0.7% error, unless you're very careful about estimating Bayes' error, you might not know how far away you are from it, and therefore how much you should be trying to reduce the avoidable bias. If all you knew was that a single typical doctor achieves 1% error, it would be very difficult to know whether you should be trying to fit your training set even better. And this problem arises only when you're already doing very well on your problem, only when you're at 0.7% or 0.8% error, really close to human-level performance.
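To make the three scenarios concrete, here is a small illustrative sketch (the function and variable names are my own, not code from the course) that computes avoidable bias and variance for each case, using 0.5% as the Bayes' error proxy:

```python
def diagnose(train_error, dev_error, bayes_estimate):
    """Return avoidable bias, variance, and a suggested focus for error analysis."""
    avoidable_bias = train_error - bayes_estimate  # gap between training error and Bayes' proxy
    variance = dev_error - train_error             # gap between dev error and training error
    focus = "bias reduction" if avoidable_bias > variance else "variance reduction"
    return avoidable_bias, variance, focus

bayes_estimate = 0.005  # 0.5%, the team of experienced doctors

# The three scenarios discussed above: (training error, dev error).
for train_error, dev_error in [(0.05, 0.06), (0.01, 0.05), (0.007, 0.008)]:
    bias, var, focus = diagnose(train_error, dev_error, bayes_estimate)
    print(f"train={train_error:.1%} dev={dev_error:.1%} -> "
          f"avoidable bias={bias:.1%}, variance={var:.1%}, focus on {focus}")
```

Running this reproduces the reasoning above: the first case points to bias reduction, the second to variance reduction, and the third shows a 0.2% avoidable bias against a 0.1% variance, which only becomes visible with a good Bayes' error estimate.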
Whereas in the two examples on the left, when you were further away from human-level performance, it was easier to tell whether you should focus on bias or variance. So this is an illustration of why, as you approach human-level performance, it's actually harder to tease apart the bias and variance effects, and therefore why progress on your machine learning project gets harder as you're doing really well.

To summarize what we've talked about: if you're trying to understand bias and variance on a task that humans can do quite well, you can use human-level error as a proxy, or an approximation, for Bayes' error. The difference between your training error and your estimate of Bayes' error then tells you how much avoidable bias there is, and the difference between your training error and your dev error tells you how much of a variance problem you have, that is, whether your algorithm is able to generalize from the training set to the dev set.

The big difference between this discussion and what we saw in an earlier course is that instead of comparing training error to 0% and just calling that the estimate of the bias, here we have a more nuanced analysis in which there is no particular expectation that you should get 0% error, because sometimes Bayes' error is non-zero and it's just not possible for anything to do better than a certain threshold of error. In the earlier course we measured how much bigger the training error was than 0% and used that to understand how big the bias was. That turns out to work just fine for problems where Bayes' error is nearly 0%, such as recognizing cats: humans are near perfect at that, so Bayes' error is also nearly 0%. But for problems where the data is noisy, like speech recognition on very noisy audio, where it's sometimes just impossible to hear what was said and get the correct transcription, having a better estimate of Bayes' error helps you better estimate avoidable bias and variance, and therefore make better decisions about whether to focus on bias reduction tactics or variance reduction tactics.

So to recap, having an estimate of human-level performance gives you an estimate of Bayes' error, and this allows you to more quickly decide whether you should focus on reducing the bias or reducing the variance of your algorithm. These techniques tend to work well until you surpass human-level performance, at which point you might no longer have a good estimate of Bayes' error to help you make this decision clearly. Now, one of the exciting developments in deep learning has been that for more and more tasks, we're actually able to surpass human-level performance. In the next video, let's talk more about the process of surpassing human-level performance.