Hi Yoshua, I'm really glad you could join us here today.
I'm very glad too.
You know, today you're not just a researcher or engineer in deep learning, you've become one of the institutions or one of the icons of deep learning. But I'd really like to hear the story of how it started. So how did you end up, you know, getting into deep learning and then pursuing this journey?
Right. Well, actually it started when I was a kid, an adolescent, reading a lot of science fiction, like I guess many of us. And when I started my graduate studies in 1985, I started reading, you know, neural net papers. And that's where I got all excited and it became really a passion.
And actually, what was that like in, what, the mid 80s, right? 1985, reading these papers. Do you remember?
Yeah, it was, well, you know, coming from the courses I had taken in classical AI with expert systems and suddenly discovering that there was all this world of thinking about how humans might be learning and human intelligence and how we might draw connections between that and artificial intelligence and computers. That was really exciting for me when I discovered this literature and I started reading the connectionists, of course. So the papers from Geoff Hinton, Rumelhart, and so on. And I worked on recurrent nets. I worked on speech recognition. I worked on HMMs, so graphical models. And then quickly moved to AT&T Bell Labs and MIT, where I did postdocs. And that's where I discovered some of the issues with long-term dependencies in training neural nets. And then shortly after, I got recruited at UdeM back in Montreal, where I had spent most of my adolescent years.
So as someone who's been there for the last several decades and seen it all, certainly seen a lot of it, tell me about how your thinking about deep learning, about neural networks, has evolved over this time.
We started with experiments, with intuitions, and theory sort of comes later.
We now understand a lot better, for example, why backprop is working so well, why depth is so important. And these kinds of notions, we didn't have any solid justification for in those days. When we started working on deep nets in the early 2000s, we had the intuition that it made a lot of sense that a deeper network should be more powerful, but we didn't know how to take that and prove it. And of course, our experiments initially didn't work.
And actually, what were the most important things that you think turned out to be right, and what were the biggest surprises of what turned out to be wrong, compared to what we knew 30 years ago?
Oh, sure. So one of the biggest mistakes I made was to think, like everyone else in the 90s, that you needed smooth nonlinearities for backprop to work. Because I thought that if we had something like the rectifying nonlinearities, where you have a flat part, that it would be really hard to train because the derivative would be zero in so many places. And when we started experimenting with ReLU with deep nets in around 2010, I was obsessed with the idea that, oh, we should be careful about whether neurons won't saturate too much on the zero part. But in the end, it turned out that actually the ReLU was working a lot better than the sigmoids and tanh. And that was a big surprise. We were exploring this because of the biological connection, actually, not because we thought that it would be easier to optimize, but it turned out to work better, whereas I thought it would be harder to train.
So let me ask you, what is the relationship between deep learning and the brain? There's the obvious answer, but I'm curious, what's your answer to that?
Ah, well, the initial insight that really got me excited with neural nets was this idea from the connectionists that information is distributed across the activation of many neurons, rather than being represented by sort of the grandmother cell, as they were calling it, or a symbolic representation. That was the traditional view in classical AI. And I still believe this is a really important thing. And I see people rediscovering the importance of that even recently. So that was really a foundation. The depth thing is something that came later in the early 2000s, but it wasn't something I was thinking about in the 90s, for example.
I remember you built a lot of relatively shallow, but very distributed representations, the word embeddings, very early on.
That's right. Yeah, that's one of the things that I got really excited about in the late 90s. Actually, my brother Samy and I worked on the idea that we could use neural nets to tackle the curse of dimensionality, which was believed to be one of the central issues with statistical learning. And the fact that we could have these distributed representations could be used to represent joint distributions over many random variables in a very efficient way. And it turned out to work quite well. And then I extended this to joint distributions over sequences of words. And this is how the word embeddings were born, because I thought, oh, this will allow generalization across words that have similar semantic meaning and so on.
So over the last couple of decades, your research group has invented more ideas than anyone can summarize in a few minutes. So I'm curious, what are the inventions or ideas you're most proud of from your group?
Right. So I think I mentioned long-term dependencies, the study of that. I think people still don't understand it well enough.
Then there's the story I mentioned about the curse of dimensionality, joint distributions with neural nets, which became more recently the NADE that Hugo Larochelle did. And then, as I said, that gave rise to also the work on learning word embeddings for joint distributions over words. Then came, I think, probably the best known of the work we did with deep learning, with stacks of autoencoders and stacks of RBMs. Then there was the work on understanding better the difficulties of training deep nets, with the initialization ideas and also the vanishing gradient in deep nets. And that work actually was the one which gave rise to the experiments showing the importance of piecewise linear activation functions. Then I would say some of the most important work regards the work we did with unsupervised learning: the denoising autoencoders, and the GANs, the generative adversarial networks, which are very popular these days. Then there's the work we did with neural machine translation using attention, which turned out to be really important for making translation work and is currently used in industrial systems like Google Translate. But this attention thing actually really changed my views on neural nets. Neural nets, we used to think of as machines that can map a vector to a vector. But really with attention mechanisms, you can now handle any kind of data structure. And this is really opening up a lot of interesting avenues. In the direction of actually connecting to biology, one thing that I've been working on in the last couple of years is how could we come up with something like backprop but that brains could implement. And we have a few papers in that direction that seem to be interesting for the neuroscience people. And we're continuing in that direction, of course.
One of the topics that I know you've been thinking a lot about is the relationship between deep learning and the brain. Can you tell us a bit more about that?
The biological thing is something I've been thinking about for a while, actually, and having a lot of, I would say, daydreaming about, because I think of it like a puzzle. So we have these pieces of evidence from what we know from the brain and from learning in the brain, like spike-timing-dependent plasticity. And on the other hand, we have all of these concepts from machine learning, the idea of globally training the whole system with respect to an objective function and the idea of backprop. And what does backprop mean? Like, what does credit assignment really mean? When I started thinking about how brains could do something like backprop, it prompted me to think about, well, maybe there are some more general concepts behind backprop which make it so efficient, which allow us to be efficient with backprop. Maybe there's a larger family of ways to do credit assignment, and that connects to questions that people in reinforcement learning have been asking. So it's interesting how sometimes asking a simple question leads you to thinking about so many different things and forces you to think about so many elements that you'd like to bring together like a big puzzle. So this has gone on for a number of years. I need to say that this whole endeavor, like many of the ones that I've followed, has been highly inspired by Geoff Hinton's thoughts. So in particular, he gave this talk in 2007, I think, at the first deep learning workshop, on what he thought was the way that the brain is working, how a kind of temporal code could be used for potentially doing some of the job of backprop. And that led to a lot of the ideas that I've explored in recent years with this. Yeah, so it's kind of an interesting story that has been running for a decade now, basically.
One of the topics I've heard you speak about multiple times as well is unsupervised learning. Can you share your perspective on that?
Yes, yes. So unsupervised learning is really important.
Right now, our industrial systems are based on supervised learning, which essentially requires humans to define what the important concepts are for the problem and to label those concepts in the data. And, you know, we build all these amazing toys and services and systems using this. But humans are able to do much more. They are able to explore and discover new concepts by observation and interaction with the world. A two-year-old is able to understand intuitive physics. In other words, she understands gravity. She understands pressure. She understands inertia. She understands liquids, solids. And of course, her parents never told her about any of this stuff. Right. So how did she figure it out? So that's the kind of question that unsupervised learning is trying to answer. It's not just about whether we have labels or we don't have labels. It's about actually building a construction, a mental construction that explains how the world works by observation. And more recently, I've been combining the ideas in unsupervised learning with the ideas in reinforcement learning, because I believe that there is a very strong indication about the important underlying concepts that we're trying to disentangle, that we're trying to separate from each other, that a human or machine can get by interacting with the world, by exploring the world and trying things and trying to control things. So these are, I think, tightly coupled to the original ideas of unsupervised learning. So my take on unsupervised learning, 15 years ago when we started doing the autoencoders and the RBMs and so on, was very focused on the idea of learning good representations. And I still think this is a central question. But the thing we don't know is what is a good representation and how do we learn it? How do we figure out an objective function, for example? So we've tried many things over the years.
And that's actually one of the cool things about unsupervised learning research, that there are so many different ideas, so many different ways that this problem can be attacked. And that suggests maybe there's another one we'll discover next year that's completely different. And maybe the brain is using something else completely different. So it's not incremental research. It's something that in itself is very exploratory. We don't have a good definition of what's the right objective function to even measure that a system is doing a good job of unsupervised learning. So, of course, it's challenging. But at the same time, it leaves open a wide field of possibilities, which is what researchers really love. That's something that appeals to me.
So today there's so much going on in deep learning. And I think we've passed the point where it's possible for any one human to read every single deep learning paper being published. So I'm curious, what in deep learning today excites you the most?
So I'm very ambitious. And I feel like the current state of the science of deep learning is far from where I'd like to see it. And I have the impression that our systems right now make the kind of mistakes that suggest that they have a very superficial understanding of the world. So what excites me the most now is the sort of direction of research where we're not trying to build systems that do something useful. We're just going back to principles about how a computer can observe the world, interact with the world and discover how that world works. Even if that world is simple, something that we can program as a kind of video game, we don't know how to do that well. And that's cool because I don't have to compete with Google and Facebook and Baidu and so on. Because this is a kind of basic research that can be done by anyone in their garage and could change the world.
So there are, of course, many directions to attack this, but I see a lot of the fruitful interactions between ideas in deep learning and reinforcement learning being really important there. And, you know, I'm really excited that the progress in this direction could have a huge impact on practical applications, actually. Because if you look at some of the big challenges that we have in applications, like how we deal with new domains or categories on which we have too few examples, these are cases where humans are very good at solving those problems. So these transfer learning and generalization issues, they would become much easier to tackle if we had systems that had a better understanding of how the world works, a deeper understanding. What is actually going on? What are the causes of what I'm seeing and how could I influence what I'm seeing by my actions? So these are the kinds of questions I'm really excited about these days. And I think they connect also the deep learning research that has evolved over the last couple of decades with even older questions in AI. Because a lot of the success in deep learning has been with perception. So what's left? What's left is sort of higher level cognition, which is about understanding at an abstract level how things work. So our program of understanding high level abstractions, I think, has not reached that high level of abstraction. And so we have to get there. We have to think about reasoning, about sequential processing of information. We have to think of how causality works and how machines can discover all these things by themselves, potentially guided by humans, but as much as possible in an autonomous way.
And it sounds like from part of what you said that you're a fan of research approaches where you experiment on, you know, I'm going to use the term toy problem, not in a disparaging way, but on the small problem. And you're optimistic that that transfers to bigger problems later.
Yes, yes, yes.
I mean, it transfers in a meta way, right? Of course, we're going to have to do some work to scale up and address those problems. But my main motivation for going for those toy problems is that we can understand better our failures and we can reduce the problem to something we can intuitively sort of manipulate and understand more easily. So sort of a classical divide and conquer, you know, science approach. And also, I think something people don't think about enough is the cycle. The research cycle can be much faster. Right. So if I can do an experiment in a few hours, I can progress much faster than if I have to train a huge model that tries to capture the whole of common sense and, you know, everything in general knowledge, which eventually we will do. It's just that each experiment takes too much time with current hardware. So while our hardware friends are building machines that can be a thousand or a million times faster, I'm doing those toy experiments.
You know, I've also heard you speak about the science of deep learning, not just as an engineering discipline, but doing more work to understand what's really going on. Do you want to share your thoughts on that?
Yeah, absolutely. I fear that a lot of the work that we're doing is sort of like blind people trying to find their way. And, you know, you can get a lot of luck and find interesting things that way. But really, we should sort of stop a little bit and try to understand what we're doing in a way that's transferable, because we go down to principles, to theory. But when I say theory, I don't mean necessarily math. Of course, I like math and so on. But I don't think that we need everything to be formalized mathematically, but to be formalized logically, in the sense that I can convince somebody that, you know, this should work, whether this makes sense. This is the most important aspect. And then math allows us to make that stronger and tighter. But really, it's more about understanding.
And it's about also doing our research not to beat the next baseline or a benchmark or beat the other guys in the other lab or the other company. It's more about, you know, what kind of question should we ask that would allow us to understand better the phenomena of interest? Like, you know, what makes, for example, training deeper networks harder or recurrent nets harder? We have some ideas, but there are a lot of things we don't understand yet. So we can maybe design experiments whose goal is not to have a better algorithm, but just to understand better the algorithms we currently have, or what circumstances make a particular algorithm work better and why. I mean, it's the why that really matters. That's what science is about. It's why.
Right. Today, there are a lot of people that want to enter the field. And I'm sure you've answered this a lot in one-on-one settings. But, you know, with all the people watching this on video, what advice would you have for people that want to get into AI, get into deep learning?
Right. So first of all, there are different motivations and different things you could do. You know, what you need to become a deep learning researcher may not be the same as if you want to be an engineer who's going to use deep learning to build products. There's a different level of understanding that's needed in the two cases. But in both cases, practice, practice. So to really master a subject like deep learning, of course, you have to read a lot. You have to practice programming the things yourself. Very often, I interview students who have used software. And these days, there's such good software around that you can just, you know, plug and play and understand nothing of what you're doing, or understand it at such a superficial level that it becomes hard to figure out, when it doesn't work, what's going wrong.
So actually trying to implement things yourself, even if it's inefficient, but just to make sure you really understand what is going on, is really useful. And, you know, train yourself.
So don't just use one of the programming frameworks where you can do everything in a few lines of code, but you don't really know what just happened.
Exactly, exactly. And I would say even more than that, try to derive the thing yourself from, you know, first principles if you can. So that really helps. But, you know, I mean, the usual things you have to do, like reading, looking at other people's code, writing your own code, doing a lot of experiments, making sure you understand everything you do. So, I mean, especially for the science part of it, trying to ask, why am I doing this? Why are people doing this? Maybe the answer is somewhere in the book and you have to read more, but it's even better if you can actually figure it out by yourself.
Yeah, cool. And in fact, for things to read, you and Ian Goodfellow and Aaron Courville wrote a very highly regarded book.
Thank you, thank you. Yes, it's selling a lot. It's a bit crazy. I feel like there's more people reading this book than people who can read it right now. But yeah, also the proceedings of the ICLR conference are probably the best concentrated place of good papers. Of course, there are really good papers at NIPS and ICML and other conferences. But if you really want to go for a lot of good papers, just read the last few ICLR proceedings and that will give you a really good view of the field.
Any other thoughts when people ask you for advice? How does someone become good at deep learning?
Well, it depends on where you come from. Don't be afraid of the math. Just develop the intuitions and then the math becomes much easier to understand once you get the hang of what's going on at an intuitive level. And one piece of good news is that you don't need five years of PhD to become proficient at deep learning. You can actually learn pretty quickly.
If you have a good background in computer science and math, you can learn enough to use it and build things and start research experiments in just a few months. Something like six months for people with the right training. Maybe they don't know anything about machine learning, but if they're good in math and computer science, it can be very fast. And of course, that means you need to have the right training in math and computer science. Sometimes what you learn in just computer science courses is not enough. You need some continuous math, especially. So this is probability, algebra and optimization, for example.
And calculus.
And calculus, yeah.
Thanks a lot, Yoshua, for sharing all the comments and insights and advice. Even though I've known you for a long time, there are many details of your early history that I didn't know until now. So thank you.
Well, thank you, Andrew, for doing this special recording and what you're doing. And well, I hope it's going to be used by a lot of people.
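As a rough illustration of the point made earlier in the conversation about rectifying nonlinearities, the following Python sketch compares the local derivatives of sigmoid, tanh, and ReLU. It is a toy under stated assumptions, not the experiments described in the interview: the function names are hypothetical, weights are ignored, and every layer is assumed to see the same pre-activation value.

```python
import math


def sigmoid_grad(x: float) -> float:
    """Derivative of the logistic sigmoid; it never exceeds 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    """Derivative of tanh; equals 1 at x = 0 and decays for large |x|."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x: float) -> float:
    """Derivative of ReLU, taking 0 at x = 0 by convention."""
    return 1.0 if x > 0 else 0.0


def chained_grad(grad_fn, pre_activation: float, depth: int) -> float:
    """Product of `depth` identical local derivatives (toy model, weights ignored)."""
    return grad_fn(pre_activation) ** depth


if __name__ == "__main__":
    # Print the chained derivative through 10 layers at a few input values.
    for name, fn in [("sigmoid", sigmoid_grad), ("tanh", tanh_grad), ("relu", relu_grad)]:
        values = [chained_grad(fn, x, depth=10) for x in (-2.0, 0.5, 2.0)]
        print(name, values)
```

In this simplified setting the ReLU derivative is exactly zero on the flat part, which is the concern raised in the interview, but on the active part it stays at 1 however many layers are chained, while the sigmoid and tanh factors shrink the product toward zero as the unit saturates.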