Andrew: So thanks a lot, Pieter, for joining me today. I think a lot of people know you as a well-known machine learning, deep learning, and robotics researcher. I'd like to have people hear a bit about your story. How did you end up doing the work that you do?

Pieter: Yeah, it's a good question, and actually, if you had asked me as a 14-year-old what I was aspiring to do, it probably would not have been this. In fact, at the time I thought being a professional basketball player would be the right way to go. I don't think I was able to achieve it.

Andrew: So the machine learning world lucked out that the basketball thing didn't work out.

Pieter: Yeah, that didn't work out. It was a lot of fun playing basketball, but it didn't work out to try to make it into a career. What I really liked in school was physics and math, and from there it seemed pretty natural to study engineering, which is applying physics and math in the real world. Then, after my undergrad in electrical engineering, I actually wasn't so sure what to do, because literally anything in engineering seemed interesting to me: understanding how anything works is interesting, trying to build anything is interesting. In some sense, artificial intelligence won out because it seemed like it could somehow help all disciplines in some way, and it also seemed somehow a little more at the core of everything. If you think about how a machine can think, then maybe that's more the core of everything else than picking any specific discipline.

Andrew: I've been saying that AI is the new electricity; it sounds like the 14-year-old version of you had an even earlier version of that. In the past few years, you've done a lot of work in deep reinforcement learning. What's happening? Why is deep reinforcement learning suddenly taking off?

Pieter: Before I worked in deep reinforcement learning, I worked a lot in reinforcement learning, actually with you, Andrew, at Stanford, of course. We worked on autonomous helicopter flight; then later, at Berkeley with some of my students, we worked on getting a robot to learn to fold laundry. What characterized that work was a combination of learning that enabled things that would not be possible without learning, but also a lot of domain expertise in combination with the learning to get it to work. It was very interesting, because you needed domain expertise, which is fun to acquire, but at the same time it was very time-consuming: for every new application you wanted to succeed at, you needed domain expertise plus machine learning expertise. For me, it was the 2012 ImageNet breakthrough results from Geoff Hinton's group in Toronto, AlexNet, showing that supervised learning could all of a sudden be done with far less engineering for the domain at hand. There was very little engineering for vision specifically in AlexNet, and that made me think we really should revisit reinforcement learning from the same viewpoint and see if we can get the deep version of reinforcement learning to work and do equally interesting things as had just happened in deep supervised learning.

Andrew: So it sounds like you saw the potential of deep reinforcement learning earlier than most people. Now, looking into the future, what do you see next? What are your predictions for the next several years in deep reinforcement learning?

Pieter: I think what's interesting about deep reinforcement learning is that, in some sense, there are many more open questions than in supervised learning. In supervised learning, it's about learning an input-output mapping, but in reinforcement learning there is the notion of where the data even comes from; that's the exploration problem. When you have data, how do you do credit assignment? How do you understand which actions you took early on got you the reward later?
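To make the credit-assignment question concrete, here is a minimal illustrative sketch (added for illustration, not part of the interview) of the simplest standard answer: credit an action at time t with the discounted sum of all rewards that follow it. The discount factor of 0.99 is an arbitrary choice.

    # Credit assignment in its simplest form: each earlier step is
    # credited with the discounted sum of everything that follows.
    def discounted_returns(rewards, gamma=0.99):
        """G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..."""
        returns = [0.0] * len(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        return returns

    # A sparse reward arriving only at the end of a five-step episode:
    # every earlier action still receives discounted credit for it.
    print(discounted_returns([0.0, 0.0, 0.0, 0.0, 1.0]))
    # -> roughly [0.9606, 0.9703, 0.9801, 0.99, 1.0]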
Pieter: And then there are issues of safety. When you have a system autonomously collecting data, it's actually rather dangerous in most situations. Imagine a self-driving car company that says, "We're just going to run deep reinforcement learning." It's pretty likely that car would get into a lot of accidents before it does anything useful.

Andrew: You need the negative examples to learn from, right?

Pieter: You do need some negative examples somehow, yeah, and positive ones, hopefully. So I think there are still a lot of challenges in deep reinforcement learning in terms of working out the specifics of how to get these things to work. The deep part is the representation, but the reinforcement learning itself still has a lot of open questions. What I feel is that, with the advances in deep learning, one part of the puzzle in reinforcement learning has been largely addressed: the representation part. If there is a pattern, we can probably represent it with a deep network and capture it. How to tease apart the pattern is still a big challenge in reinforcement learning. So I think the big challenges are how to get systems to reason over long time horizons. Right now, a lot of the successes in deep reinforcement learning are very short horizon: there are problems where, if you act well over a five-second horizon, you act well over the entire problem. And a five-second skill is something very different from a day-long skill, or the ability to live a life as a robot or some software agent. So I think there are a lot of challenges there. I think safety also has a lot of challenges, in terms of how you learn safely and how you keep learning once you're already pretty good. To give an example a lot of people would be familiar with, self-driving cars: for a self-driving car to be better than a human driver, well, human drivers get into bad accidents maybe every three million miles or so. So once you're as good as a human driver, it takes a long time to see the negative data, but you want your self-driving car to be better than a human driver. At that point the data collection becomes really, really difficult: how do you get the interesting data that makes your system improve? There are a lot of challenges related to exploration that tie into that. But one of the things I'm actually most excited about right now is seeing if we can take a step back and also learn the reinforcement learning algorithm itself. Reinforcement learning is very complex: credit assignment is sometimes very complex, exploration is very complex. So maybe, just as deep learning for supervised learning was able to replace a lot of domain expertise, we can have programs that are learned, that are reinforcement learning programs, and that do all of this instead of us designing the details.

Andrew: Learn the reward function, or learn the whole program?

Pieter: This would be learning the entire reinforcement learning program. So imagine you have a reinforcement learning program, whatever it is, and you throw it at some problem, and then you see how long it takes to learn. And you say, well, that took a while. Now let another program modify this reinforcement learning program. After the modification, see how fast it learns. If it learns more quickly, that was a good modification, and maybe you keep it and improve from there.
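A hedged sketch of that outer loop, added for illustration: everything here is a stand-in. The "reinforcement learning program" is reduced to a dictionary of settings, and train_and_score is a stub for running the inner learner and measuring how quickly it learns; no real system's API is implied.

    import random

    def train_and_score(program):
        """Stub: run the inner RL learner described by `program` on some
        task and return a score where higher means it learned faster."""
        # Stand-in objective so the sketch runs end to end.
        return -abs(program["lr"] - 0.003) - abs(program["gamma"] - 0.99)

    def modify(program):
        """Propose a small random modification to the learner."""
        new = dict(program)
        new["lr"] *= random.uniform(0.5, 2.0)
        new["gamma"] = min(0.999, new["gamma"] * random.uniform(0.98, 1.02))
        return new

    # The outer loop: propose a modification, keep it if the inner
    # learner now learns faster, and improve from there.
    best = {"lr": 0.01, "gamma": 0.9}
    best_score = train_and_score(best)
    for _ in range(200):
        candidate = modify(best)
        score = train_and_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    print(best, best_score)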
Andrew: Wow, I see. That's an ambitious direction.

Pieter: Yeah, and I think it has a lot to do with the amount of compute that's becoming available. This would be running reinforcement learning in the inner loop, whereas right now we run reinforcement learning as the final thing. The more compute we get, the more it becomes possible to run something like reinforcement learning in the inner loop of a bigger algorithm.

Andrew: Right, that makes sense. So, starting from the 14-year-old you, you've worked in AI for maybe some 20-plus years now. Tell me a bit about how your understanding of AI has evolved over this time.

Pieter: When I started looking at AI, it's very interesting because it really coincided with coming to Stanford to do my master's degree. There were some icons there, like John McCarthy, who I got to talk with, but who, in the year 2000, had a very different approach from what most people were doing at the time. I also talked with Daphne Koller, and I think a lot of my initial thinking about AI was shaped by Daphne's thinking, her AI class, and her probabilistic graphical models class: I was really intrigued by how simply having a distribution over many random variables, and then being able to condition on some sets of variables and draw conclusions about others, could give you so much, if you can somehow make it computationally tractable, which was definitely the challenge. Then, when I started my PhD, Andrew, you arrived at Stanford, and I think you gave me a really good reality check: that the math in your work is not the right metric to evaluate it by, and that you should really try to see the connection from what you're working on to the impact it can really have, the change it can make, rather than the math that happened to be in it.

Andrew: That's amazing; I had forgotten that.

Pieter: Yeah, it's one of the things I cite most often. When people ask what one thing has stuck with me from Andrew's advice, it's making sure you can see the connection from your work to where it's actually going to do something.

Andrew: You've had, and are continuing to have, an amazing career in AI. For some of the people watching this video who want to enter or pursue a career in AI, what advice do you have for them?

Pieter: I think it's a really good time to get into artificial intelligence. If you look at the demand for people, it's so high; there are so many job opportunities, so many things you can do research-wise, new companies to build, and so forth. So I would say yes, it's definitely a smart decision in terms of getting going. A lot of it you can self-study, whether you're in school or not. There are a lot of online courses: there is your machine learning course; there is also, for example, Andrej Karpathy's deep learning course, which has videos online and is a great way to get started; and at Berkeley there's the deep reinforcement learning course, which has all the lectures online. Those are all good places to get started. A big part of what's important is to make sure you try things yourself: not just read things or watch videos, but try things out with frameworks like TensorFlow, Chainer, Theano, PyTorch, and so forth. Whichever is your favorite, it's very easy to get going and get something up and running very quickly, so you get the practice of implementing things yourself and seeing what works and what doesn't.
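As one minimal sketch of "getting something up and running quickly" (added for illustration, assuming PyTorch is installed; the data, layer sizes, and learning rate are arbitrary):

    import torch
    import torch.nn as nn

    # Fit a tiny network to random data, just to watch a training loop run.
    X = torch.randn(256, 10)   # 256 fake examples with 10 features each
    y = torch.randn(256, 1)    # fake regression targets

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    for step in range(200):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
        if step % 50 == 0:
            print(f"step {step}: loss {loss.item():.4f}")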
Pieter: This past week, there was an article in Mashable about a 16-year-old in the United Kingdom who is one of the leaders in Kaggle competitions. He just went out and found things online, learned everything himself, and never actually took any formal course per se, and there he is, as a 16-year-old, being very competitive in Kaggle competitions. So it's definitely possible. We live in good times for people who want to learn.

Andrew: Absolutely. One question I bet you get asked sometimes is: if someone wants to enter AI, machine learning, and deep learning, should they apply for a PhD program, or should they get a job with a big company?

Pieter: A lot of it has to do with how much mentoring you can get. In a PhD program, you're essentially guaranteed it: the job of the professor who is your advisor is to look out for you and do everything they can to shape you and help you become stronger at whatever you want to do, for example, AI. There's a very clear, dedicated person, sometimes two advisors, and that's literally their job; much of what professors like about being professors is often helping shape students to become more capable. Now, that doesn't mean it's not possible at companies. Many companies have really good mentors and people who love to help educate those who come in and strengthen them. It's just that it might not be as much of a guarantee or a given, compared to enrolling in a PhD program, where that is the crux of the program: you're going to learn, and somebody is there to help you learn.

Andrew: So it really depends on the company and on the PhD program.

Pieter: Absolutely, yeah. But I think the key point is that while you can learn a lot on your own, you can learn a lot faster if you have somebody more experienced who has taken it up as their responsibility to spend time with you and help accelerate your progress.

Andrew: You've been one of the most visible leaders in deep reinforcement learning. What are the things that deep reinforcement learning is already working really well at?

Pieter: If you look at some of the deep reinforcement learning successes, they're very, very intriguing. For example, learning to play Atari games from pixels: processing these pixels, which are just numbers, and somehow turning them into joystick actions. Or some of the work we did at Berkeley, where we have a simulated robot inventing walking, and the reward it's given is as simple as: the further you go north, the better, and the less hard you impact the ground, the better. Somehow it decides that walking, or running, is the thing to invent, even though nobody showed it what walking or running is. Or a robot playing with children's toys that learns to put them together, to put a block into a matching opening, and so forth. I think it's really interesting that in all of these it's possible to learn from raw sensory inputs all the way to raw controls, for example, the torques at the motors.
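The walking reward Pieter describes can be made concrete with a tiny sketch (added for illustration; the names and coefficients are arbitrary, not those used in the Berkeley work):

    # Reward for the simulated walker, as described: progress north is
    # good, hard impacts with the ground are penalized.
    def locomotion_reward(northward_velocity, ground_impact_force):
        return northward_velocity - 0.005 * ground_impact_force

    print(locomotion_reward(1.2, 50.0))   # faster, softer gaits score higher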
Pieter: At the same time, it's very interesting that you can have a single algorithm. For example, with trust region policy optimization (TRPO), you can have a robot learn to run, you can have a robot learn to stand up, and you can swap in a four-legged robot instead of a two-legged one, run the same reinforcement learning algorithm, and it still learns to run. There's no change to the reinforcement learning algorithm; it's very, very general. The same is true for the Atari games: DQN was the same DQN for every one of the games. Where it actually starts hitting the frontier of what's not yet possible is that, while it's nice that it learns each of these tasks from scratch, it would be even nicer if it could reuse things it has learned in the past to learn the next task more quickly. That's still at the frontier and not yet possible; it essentially always starts from scratch.

Andrew: How quickly do you think we'll see deep reinforcement learning deployed in the robots around us, in the robots that are getting deployed in the world today?

Pieter: I think in practice the realistic scenario is one where it starts with supervised learning, behavioral cloning, with humans doing the work. And actually, I think a lot of businesses will be built that way, with a human behind the scenes doing a lot of the work. Imagine something like a Facebook Messenger assistant: a system like that could be built with a human behind the curtains doing a lot of the work, while machine learning matches what the human does and starts making suggestions, so the human has a small number of options and can just click and select. Then over time, as it gets pretty good, you start infusing some reinforcement learning, where you give it actual objectives, not just matching the human behind the curtains, but objectives of achievement: how fast were these two people able to plan their meeting, how fast were they able to book their flight, how long did it take, how happy were they with it? But it would probably have to be bootstrapped off a lot of behavioral cloning, of humans showing how it could be done.

Andrew: So it starts with behavioral cloning, just supervised learning to mimic whatever the person is doing, and then gradually layers on reinforcement learning to have it think about longer time horizons. Is that a fair summary?

Pieter: I'd say so, yeah. Straight-up reinforcement learning from scratch is really fun to watch; it's super intriguing, and there are very few things more fun to watch than a reinforcement learning robot starting from nothing and inventing things. But it's time-consuming, and it's not always safe.

Andrew: Thank you very much. That was fascinating. I'm really glad we had the chance to chat.

Pieter: Well, Andrew, thank you for having me. Very much appreciated.