So, welcome, Andrej. I'm really glad you could join me today.

Yeah, thank you for having me.

So a lot of people already know your work in deep learning, but not everyone knows your personal story. So I'd like you to start by telling us: how did you end up doing all this work in deep learning?

Yeah, absolutely. So I think my first exposure to deep learning was when I was an undergraduate at the University of Toronto. Geoff Hinton was there, and he was teaching a class on deep learning. At the time it was restricted Boltzmann machines trained on MNIST digits. And I just really liked the way Geoff talked about training the network, about "the mind of the network" and so on. I just thought there was a flavor of something magical happening when this was training on those digits. So that was my first exposure to it, although I didn't get into it in a lot of detail at that time. Then, when I was doing my master's degree at the University of British Columbia, I took a class with Nando de Freitas, and that was, again, on machine learning. That's the first time I delved deeper into these networks. What was interesting is that I was very interested in artificial intelligence, and so I took classes in artificial intelligence. But a lot of what I was seeing there was just not satisfying. It was a lot of depth-first search, breadth-first search, alpha-beta pruning, and all these things, and I was just not satisfied. So when I saw neural networks for the first time, it was in machine learning, which is a term that is more technical and not as well known; most people talk about artificial intelligence. Machine learning was more of a technical term, I would almost say.
And so I was dissatisfied with artificial intelligence, and when I saw machine learning, I thought: this is the AI that I want to spend time on, this is what's really interesting. That's what took me down those directions, because this is almost a new computing paradigm, I would say. Normally humans write code, but here the optimization writes code. You create the input-output specification and lots of examples of it, and then the optimization writes the code, and sometimes it can write code better than you. And so I thought that was just a very new way of thinking about programming, and that's what's intriguing about it.

Then, through your work, one of the things you've come to be known for is that you're now the human benchmark for the ImageNet image classification competition. How did that come about?

So basically, the ImageNet challenge is sometimes compared to the World Cup of computer vision. A lot of people care about this benchmark, and the error rate goes down over time. But it was not obvious to me where a human would be on this scale. I had done a similar, smaller-scale experiment on the CIFAR-10 dataset earlier: I was just looking at these 32 by 32 images and trying to classify them myself. There were only 10 categories, so it was fairly simple to create an interface for it, and I think I had an error rate of about 6% on that. Then, based on what I was seeing and how hard the task was, I predicted the lowest error rate we'd achieve would be, okay, I can't remember the exact numbers, I think I guessed something like 10%. And we're now down to like 3% or 2%, which is crazy. So that was my first fun experiment with a human baseline. And I thought it was really important, for the same purposes that you point out in some of your lectures.
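The "optimization writes code" paradigm described above can be sketched in a few lines: instead of hand-writing a program, you supply an input-output specification (examples) and let gradient descent pick the program's parameters. This is a minimal illustrative sketch with made-up toy data, not anything from the interview itself; the "program family" here is just a line w*x + b.

```python
import numpy as np

# Input-output specification: examples of the behavior we want.
# (Toy data; the "program" to be discovered is y = 2x + 1.)
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs + 1.0

# Instead of writing the program by hand, we parameterize a family of
# programs (here a line, w*x + b) and let optimization choose w and b.
w, b = 0.0, 0.0
lr = 0.02
for _ in range(2000):
    err = (w * xs + b) - ys
    # Gradient descent "writes the code" by adjusting the parameters
    # to reduce the mean squared error on the examples.
    w -= lr * 2 * (err * xs).mean()
    b -= lr * 2 * err.mean()

print(round(w, 2), round(b, 2))  # converges near 2.0 and 1.0
```

The same loop, scaled up from two parameters to millions and from a line to a deep network, is the sense in which "the optimization writes code, and sometimes it can write code better than you."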
I mean, you really want that number in order to understand how well humans are doing, so that you can compare machine learning algorithms to it. And for ImageNet, it seemed there was a discrepancy between how important the benchmark was, how much focus there was on getting a lower number, and how little we understood about how humans actually do on it. So I created this JavaScript interface, and I was showing myself the images. The problem with ImageNet is that you don't have just 10 categories; you have 1,000. So it was almost a UI challenge: obviously I can't remember 1,000 categories, so how do I make it fair? I listed out all the categories and gave myself examples of them, and for each image I was scrolling through the 1,000 categories, trying to see, based on the examples for each category, what the image might be. I thought it was an extremely instructive exercise by itself. I mean, I did not understand that something like a third of ImageNet is dogs, dog species. So it was interesting to see that the network spends a huge amount of its capacity caring about dogs; I think a third of its performance comes from dogs. And yeah, this was something I did for maybe a week or two. I put everything else on hold; I thought it was a very fun exercise. I got a number in the end, and then I thought that one person is not enough, I wanted multiple other people, so I was trying to organize within the lab to get other people to do the same thing. But people are not as willing to contribute, say, a week or two of pretty painstaking work, you know, just sitting down for five hours and trying to figure out which dog breed this is. So I was not able to get enough data in that respect, but we got at least some approximate measure of human performance, which I thought was fun.
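For context, ImageNet scores an image as correct if the true label appears among the top five guesses, and the human-baseline exercise described above amounts to computing this same metric over a person's guesses. A minimal sketch with hypothetical labels (the category names and guesses here are invented for illustration):

```python
# Hypothetical ground-truth labels and per-image human guesses.
# ImageNet's metric counts an image as correct if the true label
# appears among the top-5 guesses for that image.
true_labels = ["beagle", "tabby", "terrier", "sports_car"]
human_guesses = [
    ["beagle", "basset", "bloodhound"],               # correct
    ["siamese", "tabby"],                             # correct
    ["beagle", "basset", "whippet", "pug", "boxer"],  # wrong
    ["sports_car"],                                   # correct
]

# Count images whose true label is missing from the first 5 guesses.
errors = sum(1 for t, g in zip(true_labels, human_guesses) if t not in g[:5])
top5_error = errors / len(true_labels)
print(f"top-5 error: {top5_error:.1%}")  # prints: top-5 error: 25.0%
```

Run over thousands of images instead of four, this is exactly the "number" the JavaScript interface was built to produce.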
And then this was picked up, and it wasn't obvious to me at the time, I just wanted to know the number, but this became a thing. People really liked the fact that this happened, and I'm jokingly referred to as "the reference human," which of course is hilarious to me.

Yeah. Were you surprised when deep nets finally surpassed your performance?

Absolutely. I mean, sometimes it's really hard to see where the object even is in the image. It's just a tiny blob, a black dog is obviously somewhere in there, and I'm not seeing it; I'm guessing between like 20 categories, and the network just gets it. I don't understand how that comes about, so there's something superhuman in it. But also, I think the network is extremely good at statistics like fur types and textures, and in that respect I was not surprised that it could measure those fine statistics better across lots of images. In other cases I was surprised because some of the images require you to read: it's just a bottle, and you can't see what it is, but it actually tells you what it is in text. As a human I can read it and it's fine, but the network would have to learn to read to identify the object, because it wasn't obvious from the image alone.

You know, one of the things you've become well known for, and that the deep learning community has been grateful to you for, has been your teaching the Stanford class and putting it online. Tell me a bit about how that came about.

Yeah, absolutely. So I felt very strongly that this technology was transformative, and that a lot of people want to use it. It's almost like a hammer, and I was in a position to hand out this hammer to a lot of people. I just found that very compelling.
It's not necessarily advisable from the perspective of a PhD student, because you're putting your research on hold. I mean, this became like 120 percent of my time, and I had to put all of my research on hold. I taught the class twice, and each time it's maybe four months, with that time spent basically entirely on the class. So it's not super advisable from that perspective. But it was basically the highlight of my PhD, and it's not even related to research. I think teaching the class was definitely the highlight of my PhD. Just seeing the students, the fact that they were really excited. It was a very different class: normally you're being taught things that were discovered in the 1800s or something like that, but we were able to come to class and say, look, there's this paper from a week ago, or even from yesterday, with new results. And I think the undergraduates and the other students really enjoyed that aspect of the class, and the fact that they actually understood it. This is not nuclear physics or rocket science: you need to know calculus and linear algebra, and then you can actually understand everything that happens under the hood. So the fact that it's so powerful, and the fact that it keeps changing on a daily basis, made people feel like they were on the forefront of something big. I think that's why people really enjoyed that class a lot.

Yeah, and you've really helped a lot of people and handed out a lot of hammers. You know, as someone who's been doing deep learning for quite some time now, while the field evolves rapidly, I'd be curious to hear: how has your own thinking, your understanding of deep learning, changed over these many years?
Yeah. Basically, when I was seeing restricted Boltzmann machines for the first time, trained on digits, it wasn't obvious to me how this technology was going to be used and how big of a deal it would be. And when I was starting to work in computer vision, convolutional networks were around, but they were not something that much of the computer vision community anticipated using anytime soon. The perception was that this works for small cases but would never scale to large images, and that was just extremely incorrect. So basically, I'm surprised by how general the technology is and how good the results are. That was my biggest surprise, I would say. And it's not only that it works so well on, say, images. The other thing that no one saw coming, or at least I certainly did not see coming, is that you can take these pre-trained networks and transfer them, fine-tune them on arbitrary other tasks. Because now you're not just solving ImageNet, where you need millions of examples; the network also happens to be a very general feature extractor. That is a second insight that I think fewer people saw coming. There were these papers that basically said: here are all the tasks people have been working on in computer vision, scene classification, action recognition, object recognition, face attributes, and so on, and people were just crushing each task by fine-tuning the same network. That to me was very surprising.

And somehow, I guess, supervised learning gets most of the press, even though pre-training, fine-tuning, transfer learning is actually working very well. People seem to talk less about that for some reason.

Right. Yeah, exactly. I think what has not worked as well are some of the hopes around unsupervised learning, which I think is really why a lot of researchers got into the field around 2007 and so on.
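The pre-train-then-fine-tune recipe described above, keeping a generic feature extractor frozen and training only a small task-specific head, can be sketched without any deep learning framework. In this illustrative sketch a fixed random projection stands in for the pre-trained backbone (an assumption purely for illustration; in practice it would be, say, a ConvNet trained on ImageNet), and only a logistic-regression head is trained on a tiny made-up downstream task:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen stand-in for a pre-trained backbone. (Assumption for
# illustration: a fixed random projection plus ReLU; in practice this
# would be a network pre-trained on a large dataset like ImageNet.)
W_backbone = rng.normal(size=(4, 16))

def features(x):
    # Frozen feature extractor: these weights are never updated.
    return np.maximum(0.0, x @ W_backbone)

# Tiny hypothetical downstream task with few labeled examples.
X = rng.normal(size=(64, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w_head = np.zeros(16)  # new task-specific head: the ONLY trainable part
F = features(X)
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(F @ w_head)))  # sigmoid probabilities
    grad = F.T @ (p - y) / len(y)            # logistic-loss gradient
    w_head -= 0.2 * grad                     # update only the head

acc = float(((F @ w_head > 0) == (y > 0.5)).mean())
print(f"train accuracy with frozen backbone: {acc:.2f}")
```

The design choice mirrors the point in the interview: the downstream task never touches the backbone's weights, so it needs far fewer labeled examples than training from scratch would.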
And I think the promise of that has still not been delivered. I find that also surprising: the supervised learning part worked so well, while unsupervised learning is still in a state where it's not obvious how it's going to be used or how it's going to work, even though a lot of people are still deep believers, to use the term, in this area.

So I know that one of the things you've been spending a lot of time thinking about is the long-term future of AI. Do you want to share your thoughts on that?

So I spent the last maybe year and a half at OpenAI thinking a lot about these topics, and it seems to me like the field will split into two trajectories. One will be applied AI, which is just making these neural networks, training them, mostly with supervised learning and potentially unsupervised learning, and getting, say, better image recognizers or something like that. And I think the other will be artificial general intelligence directions: how do you get neural networks that are an entire dynamical system that thinks and speaks and can do everything that a human can do, that is intelligent in that way. What's been interesting is that, for example, in computer vision, the way we approached it in the beginning was, I think, wrong, in that we tried to break it down into different parts: humans recognize people, humans recognize scenes, humans recognize objects, so we're just going to do everything that humans do, and once we have all those separate pieces, we're going to figure out how to put them together. I think that was the wrong approach, and we've seen how that played out historically. And I think something similar is going on, at a slightly higher level, with AI.
So people are asking: well, people plan, people do experiments to figure out how the world works, people talk to other people, so we need language. They're trying to decompose intelligence by function, accomplish each piece, and then put it all together into some kind of brain. I just think that's an incorrect approach. What I've been a much bigger fan of is not decomposing that way, but having a single neural network that is the complete dynamical system you're always working with, a full agent. Then the question is: how do you create objectives such that, when you optimize over the weights that make up that brain, you get intelligent behavior out? That's something I've been thinking about a lot at OpenAI. There are a lot of different ways people have thought about approaching this problem. For example, going in a supervised learning direction, I have this essay online, well, it's not an essay, it's a short story I wrote. The short story tries to come up with a hypothetical world of what it might look like if the way we approach AGI is just by scaling up supervised learning, which we know works. That gets into something that looks like Amazon Mechanical Turk, where people connect into lots of robot bodies and perform tasks, and then we train on that as a supervised learning dataset to imitate humans, and what that might look like, and so on. And then there are other directions, like unsupervised learning, or algorithmic information theory, with things like AIXI, or artificial life, with things that look more like artificial evolution. So that's where I spend a lot of my time thinking. And I think I have the correct answer, but I'm not willing to reveal it here.

So people can at least learn more by reading your blog posts.

Yeah, absolutely.
So you've already handed out a lot of hammers, and today there are a lot of people still wanting to enter the field of AI, to get into deep learning. For people in that position, what advice do you have for them?

Yeah, absolutely. So when people talk to me about CS231n and why they found it a very useful course, what I keep hearing again and again is that people appreciate the fact that we went all the way down to the low-level details. They were not working with a library; they saw the raw code, they saw how everything was implemented, and they implemented chunks of it themselves. Going all the way down and understanding everything beneath you is really important: don't abstract things away, you need a full understanding of the whole stack. That's also where I learned the most myself when I was learning this stuff; implementing it myself from scratch was the most important piece, the one that gave me the best bang for the buck in terms of understanding. So I wrote my own library. It's called ConvNetJS; it's written in JavaScript and implements convolutional neural networks. That was my way of learning about backpropagation. And so that's something I keep advising people: don't start by working with TensorFlow or something else. Work with it once you have written something yourself at the lowest level of detail, so you understand everything beneath you. Then it's fine to use one of these frameworks that abstracts some of it away, because you know what's under the hood. That's what helped me the most, that's what people appreciate the most when they take CS231n, and that's what I would advise a lot of people: rather than, you know, running a neural network where it all happens by magic and it's just some kind of sequence of layers.
And "I know that when I add some dropout layers it makes it work better": that's not what you want. In that case you're not going to be able to debug effectively, or improve on your models effectively.

So with that in mind, I'm really glad that the DeepLearning.ai course started off with many weeks of Python programming first, and only then the frameworks. Yeah. Good.

Thank you very much for sharing your insights and advice. You're already a hero to many people in the deep learning world, so I'm really glad, really grateful, that you could join us here today.

Yeah, thank you for having me.
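The "implement it yourself from scratch" advice above can be made concrete with a minimal hand-written backpropagation pass, in the spirit of what ConvNetJS and CS231n walk through: a one-hidden-layer network trained on the classic XOR task, with the chain rule applied layer by layer by hand. This is an illustrative sketch; the data, layer sizes, and hyperparameters are arbitrary choices, not anything from the interview.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy XOR task: the classic example that requires a hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Parameters of a one-hidden-layer network (sizes chosen arbitrarily).
W1 = rng.normal(size=(2, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(size=(8, 1)); b2 = np.zeros((1, 1))

lr = 0.5
for _ in range(5000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output

    # Backward pass: the chain rule, written out by hand.
    dlogits = (p - y) / len(X)     # grad of mean cross-entropy wrt logits
    dW2 = h.T @ dlogits
    db2 = dlogits.sum(axis=0, keepdims=True)
    dh = dlogits @ W2.T
    dpre = dh * (1.0 - h ** 2)     # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dpre
    db1 = dpre.sum(axis=0, keepdims=True)

    # Plain gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

preds = (p > 0.5).astype(float)
acc = float((preds == y).mean())
print("predictions:", preds.ravel(), "accuracy:", acc)
```

Writing the backward pass by hand like this, before reaching for a framework's autograd, is exactly the kind of whole-stack understanding the interview recommends.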