Hi, I'm delighted to be here with my old friend and collaborator, Professor Chris Manning. Chris has a very long and impressive bio, but just briefly, he is Professor of Computer Science at Stanford University, and also the Director of the Stanford AI Lab, and he also has the distinction of being the most highly cited researcher in NLP, or natural language processing. So, really good to be here with you, Chris. Good to get a chance to chat, Andrew. So, we've known each other, collaborated for many years, and one interesting part of your background, I always thought, was that even though today you're a distinguished researcher in machine learning and NLP, you actually started off in a very different area. Your PhD, if I remember correctly, was in linguistics, and you were studying the syntax of language. So, how did you go from studying syntax to being an NLP researcher? So, I can certainly tell you about that, but I should also point out that I'm still actually a professor of linguistics as well. I have a joint appointment at Stanford, and once in a blue moon, not very often, I do actually still teach some real linguistics, as well as computer-involved natural language processing. So, starting out, I was very interested in human languages and how they work, how people understand them, how they are acquired. So I saw this appeal in human languages. But that equally led me to think about ideas that we now very much think about as machine learning or computational ideas. So, two of the central questions about human language are how do little children acquire human language, and, for adults, well, we're just talking to each other now, and we pretty much understand each other. And, you know, that's actually an amazing thing, how we manage to do that. So, what kind of processing allows that? And so, that early on got me interested in looking at machine learning. In fact, even before I'd made it to grad school, I'd started, you know, baby steps in learning machine learning, coming off of those interests. Yeah, in fact, all human language is learned. You know, we had learned at some point in our lives to speak English, and if we'd grown up in a different place, we would have learned a totally different language. So, it's amazing to think how humans do that, and now maybe machines learn language, too. So, just, you know, tell us more about your journey. So, you had a PhD in linguistics, and then, and then how did you end up? So, there's some stuff before that, as well. So, I mean, you know, when I was an undergrad, well, officially, I actually did three majors. This was in Australia, one in math, one in computer science, and one in linguistics. Now, people get a slightly exaggerated sense of what that means if you're in an American context, because, you know, it'd be, I think, impossible to complete three majors as an undergrad at Stanford. But, you know, actually, where I was as an undergrad, I did an arts degree, so I could do whatever I wanted, like linguistics. You had to complete two majors to complete the arts degree. So, you know, it was sort of more like double majoring, maybe, in US terms. You probably don't know this about me, but at Carnegie Mellon, I actually was a triple major as well; the other two were in statistics and economics. Okay. Yeah, but that's great, we're both fellow triple majors. Yeah, so anyway, I did have background and interest in doing things with computer science, and so my interests were kind of mixed. 
And I actually, you know, when I applied to grad schools, I mean, one of the places I applied to was Carnegie Mellon, because they were strong in computational linguistics, you know, and if I'd gone there, I would have been enrolled as a CS student. But I ended up at Stanford as a linguistics student, because at that time, there wasn't any natural language processing in the CS department. But you know, I was still interested in pursuing ideas in natural language processing. At that point, in the early 90s, things were just starting to change, but the bulk of natural language processing was still rule-based, logical, declarative systems. It was also in those years, at the beginning of the 90s, when there first started to be lots of human language material, text and speech, available digitally. So this was really actually just before the World Wide Web exploded. But there already started to be things like legal materials and newspaper articles and parliamentary Hansards, where you could at last get your hands on millions of words of human language. And it just seemed really clear that there had to be exciting things that you could do by working empirically from lots of human language. And that's what really sort of got me involved in a new kind of natural language processing that then led into my subsequent career. It sounds like your career was initially more linguistics, and with the rise of data and machine learning and empirical methods, it shifted toward machine learning and NLP. Yeah, I mean, it absolutely certainly shifted, and I've certainly sort of shifted much more to doing both natural language processing and machine learning models. But to some extent, the balance has varied, but I've sort of been with that all the while. Actually, as an undergrad, for my undergrad honors thesis, it was sort of learning the forms of words, which became a famous problem of sort of learning the past tense of English verbs in the early connectionist literature. And I was trying to sort of learn paradigms of forms of verbs, and I was learning rules for the different forms using the C4.5 decision tree learning algorithm, if you remember that. Yes. Right. Good times. Yeah. It's certainly non-intuitive, right, how going from present tense to past tense, from run to ran and all the other weird special cases can be. Yeah. Yeah. Cool. Yeah. Hey, so we talked a bunch about NLP, natural language processing. So for some of the learners picking up machine learning for the first time, can you say what is NLP? Sure. Absolutely. Yeah. So NLP stands for natural language processing. Another word or term that's sometimes used for that is computational linguistics; it's the same thing. I mean, natural language processing is actually a weird term, right? So it means that we're doing things with human languages. So you have to have the conception that you're enough of a computer scientist that when you say language, you think in your brain programming language, and therefore you need to say natural language to mean that you're talking about the languages that human beings use. So overall, natural language processing is doing anything intelligent with human languages. So in one sense, that breaks down into understanding human languages, producing human languages, acquiring human languages, though people also often think about it in terms of different applications. And so then you might think about things like machine translation or doing question answering or generating advertising copy or summarization. 
There are so many different tasks that people work on with particular goals in mind where you do things with human language. And there's a lot of natural language processing because, you know, so much of what the world works on, our human world, is dealt with and transmitted in terms of human language material. So, you know, because of all of these applications, even web search, most of us use NLP many, many times a day. Yeah, you're right. In some sense, the biggest application of natural language is web search, right? That's really the big one. I mean, traditionally, it was kind of a simple one, right? That in the good old days, it was, you know, there were various weighting factors and so on, but it was mainly sort of matching keywords, your search terms, and then some factors about the quality of the page. It didn't really feel like language understanding, but that's really been changing over the years. So these days, if you ask a question to a search engine, it will often give you, you know, an answer box where it has extracted a piece of text and puts what it thinks is the answer in bold or color or something like that, which is then this task of question answering, and then it's really a natural language understanding task. Yeah, yeah. And I feel like in addition to web search, which is maybe the big one, you know, even when we're going to an online shopping website or a movie website and typing in what we want and doing a web search on a much smaller website than, you know, the big search engines, that also increasingly uses sophisticated NLP algorithms, and it's also creating quite a lot of value. So maybe to you, it's not, you know, the real NLP, but it still seems very valuable. I agree it's very valuable, and there are, you know, lots of interesting problems in any e-commerce website with search. Very difficult problems, actually, when people describe the kind of goods they want, and you need to be trying to match it to products that are available. That isn't an easy problem at all, it turns out. Yeah, that's true, yeah. So over the last, I don't know, couple of decades, NLP's gone through a major shift from more of the rule-based techniques that you alluded to just now to using machine learning much more pervasively. And you were one of the people, you know, leading parts of that charge and seeing every step of the way, or creating some of the steps, as it happened. Can you say a bit about that process and what you saw? Sure, absolutely. Yeah, so when I started off as an undergrad and grad student, really most of natural language processing was done by hand-built systems which variously used rules and inference procedures to sort of try and build up a parse and an understanding of a piece of text. What's an example of a rule or an inference system? So, you know, a rule could be part of the structure of human language, like an English sentence normally consists of a subject noun phrase followed by a verb and an object noun phrase, and that gives you some idea as to how to understand the meaning of the sentence. But it might also be saying something about how to interpret a word, so that a lot of words in English are very ambiguous, but if you have something like the word star and it's in the context of a movie, then it's probably referring to a human being, not this astronomical object. And in those days, people tried to deal with things like that using rules of that sort. 
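To make the style of hand-written rule Chris describes concrete, here is a minimal Python sketch of that kind of context-based disambiguation. It is a toy illustration under stated assumptions, not any real system from that era; the word list, sense labels, and window size are invented for the example.

```python
# Toy sketch of a hand-written word-sense rule: if "star" occurs near movie
# vocabulary, prefer the "celebrity" sense over the astronomical one.
MOVIE_CONTEXT_WORDS = {"movie", "film", "actor", "actress", "director", "hollywood"}

def disambiguate_star(tokens, position, window=5):
    """Return a sense label for 'star' at tokens[position] using a simple context rule."""
    start = max(0, position - window)
    context = {t.lower() for t in tokens[start:position + window + 1]}
    if context & MOVIE_CONTEXT_WORDS:
        return "star/CELEBRITY"
    return "star/ASTRONOMICAL-OBJECT"

tokens = "the movie made her a star overnight".split()
print(disambiguate_star(tokens, tokens.index("star")))  # -> star/CELEBRITY
```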
That doesn't seem very likely to work to us these days, but, you know, once upon a time that was pretty standard. And so it was only when lots of digital text and speech started to become available that it really seemed like there was this different way, that instead we could start calculating statistics over human language material and building machine learning models. And so that was the first thing that I got into in the sort of mid to late 1990s. And so, you know, the first area where I started doing lots of research and publishing papers and getting well known is building what in the early days we often called statistical natural language processing. But it later merged into, in general, probabilistic approaches to artificial intelligence and machine learning. And that sort of took us through to approximately 2010, let's say. And that's roughly when the new interest in deep learning using large artificial neural networks started to take off. For my interest in that, I really have you to thank, Andrew, because at that stage, Andrew was still full time at Stanford and he was in the office next door to me and he was really excited about the new things that were happening in the area of deep learning. I guess anyone who walked into his office, he'd tell them, oh, it's so exciting what's happening now in neural networks, you better start looking at that. And so, you know, that was really the impetus that got me pretty early on involved in looking at things in neural networks. I had actually seen a bit of it before. So while I was a grad student here, actually, Dave Rumelhart was at Stanford in psych and I'd taken his neural networks class. And so, you know, I'd seen some of that, but it hadn't actually really been what I'd gotten into for my own research. So I didn't know that. Thank you. Yeah. And then we wound up supervising some students together. Yeah, absolutely. But I'd love to hear about the rise of deep learning in NLP. What did you see? Yeah. So starting about 2010, my students and I started to do the first papers in deep learning aimed at NLP conferences. You know, it's always hard when you're trying to do something new. We had exactly the same experiences that people 15 or so years earlier had had when they started trying to do statistical NLP: when there's an established way of doing things, it's really hard to push out new ideas. So really, some of our first papers were rejected from conferences and instead appeared at machine learning conferences or deep learning workshops. But very quickly that started to change and people got super interested in neural network ideas. But I sort of feel like the neural network period, which started effectively about 2010, itself divides in two, because for the first period, let's basically say it's till 2018, we showed a lot of success at building neural networks for all sorts of tasks. We built them for syntactic parsing and sentiment analysis and, what else, question answering. But it was sort of like we were doing the same thing that we used to do with other kinds of machine learning models, except we now had a better machine learning model. So instead of training up a logistic regression or a support vector machine, we were still doing the same kind of sentiment analysis task, but now we were doing it with a neural network. 
So I think looking back now, in some sense, the bigger change came around 2018, because that was when the idea came along of, well, we could just start with a large amount of human language material and build large self-supervised models. So that was models then like BERT and GPT and successor models to those. And they could just sort of acquire from word prediction over a huge amount of text this amazing knowledge of human languages. And I think really probably that's going to be viewed in retrospect as the bigger kind of cut point where the way things were done really changed. Yeah, I think there is that trend for the large language models, learning from massive amounts of data. I think even in the lead up to that, there was one of your research papers that really, you know, slightly blew my mind, which is the GloVe paper. Because with word embeddings, where you learn a vector of numbers to represent a word using a neural network, that was quite mind-blowing for me. And then the GloVe work that you did really cleaned up the math, made it so much simpler. And I remember reading that and saying, oh, that's all there is to it. And then you can learn these really surprisingly detailed representations; the computer learns nuances of what words mean. Absolutely. Yeah. So I should give a little bit of credit to others. Other people also worked on some similar ideas, including Ronan Collobert and Jason Weston, and Tomas Mikolov and colleagues at Google. But the GloVe word vectors are one of the very prominent systems of word vectors. So these word vectors already did, yeah, you're right, illustrate this idea of using self-supervised learning: that we just took massive amounts of text and then we could build these models that knew an enormous amount about the meaning of words. I mean, it's still something I sort of show people every year in the first lecture of my NLP class, because, you know, it's something simple, but it actually just, you know, works so surprisingly well. You can do this sort of simple modeling of trying to predict a word given the words in the context, and simply by sort of running the math of learning to do those predictions well, you learn all these things about word meaning, and you can do these really nice patterns of, you know, similar word meaning or analogies of something like, you know, pencil is to drawing as paintbrush is to, and I'll say painting, right? That it's sort of already showing just a lot of successful learning. So that was the precursor to what then got developed at the next stage with things like BERT and GPT, where it wasn't just meanings of individual words, but meanings of whole pieces of text in context. Yeah, so I found it amazing that you can take a small neural network or some model and then give it lots of English sentences or some other language and hide a word, ask it to predict, what is the word that I just hid? And that allows it to learn these analogies and these very deep, what you think are really deep things behind the meaning of the words. And then, you know, 2018, maybe this other inflection point, what happened after that? Yeah, so, I mean, in 2018, that was the point in which, well, sort of really two things happened. One thing is that people, well, really in 2017, had developed this new neural architecture, which was much more scalable onto modern parallel GPUs. And so that was the transformer architecture. 
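As an illustration of the word-vector analogies Chris mentions, here is a minimal sketch, assuming you have a local copy of pretrained GloVe-style vectors in the standard one-word-per-line text format. The file name below is an assumption, and the exact neighbours returned depend on which vectors you load.

```python
import numpy as np

def load_vectors(path):
    """Load GloVe-style vectors from a plain-text file: one word and its numbers per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.array(parts[1:], dtype=np.float32)
    return vectors

def nearest(vectors, query, exclude, k=5):
    """Return the k words whose vectors have the highest cosine similarity to `query`."""
    q = query / np.linalg.norm(query)
    scored = [
        (float(q @ v / np.linalg.norm(v)), w)
        for w, v in vectors.items()
        if w not in exclude
    ]
    return sorted(scored, reverse=True)[:k]

vectors = load_vectors("glove.6B.100d.txt")  # hypothetical local copy of pretrained vectors
# Analogy: pencil is to drawing as paintbrush is to ... (expecting "painting" to rank highly)
query = vectors["drawing"] - vectors["pencil"] + vectors["paintbrush"]
print(nearest(vectors, query, exclude={"drawing", "pencil", "paintbrush"}))
```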
The second part of it, though, was, you know, maybe people rediscovered, because it was using the same trick as the GloVe model, that if you have the task of just predicting a word, given a context, either a context on both sides of it or the preceding words, that that just turns out to be an amazing learning task. And that surprises a lot of people. And a lot of the time you see discussions where people say disparaging things of, you know, nothing interesting is happening here, all it's doing is statistics to predict which word is most likely to come after the preceding words. And I think the really interesting thing is that that's true, but it's not true. I mean, because, yes, what the task is, is you're predicting the next word given preceding words. But the really interesting thing is, if you want to do that task really as well as possible, then it actually helps to understand the whole of the rest of the sentence and know who's doing what to whom in the sentence. But more than that, it also helps to understand the world, because if your text is going something along the lines of, you know, the currency used in Fiji is, well, you need to have some world knowledge to know what the right answer to that is. And so good models doing this learn both to follow the structure of sentences and their meaning, and to know facts about the world also, so that they can predict. And therefore, this turns into what's sometimes referred to as an AI-complete task, right? That you really need, there's nothing that can't actually be useful in answering this what-word-comes-next question, right? You know, your text can be, in the World Cup semifinals, the teams are, and you need to know something about soccer to be giving the right answer. AI-complete is this funny concept, right? It's this idea that if you can solve this one problem, you can solve, you know, everything in AI, right? It kind of makes an analogy to NP-complete problems from the theory of computing. What do you think? Do you think predicting the next word is AI-complete? I have very mixed feelings about that myself. Actually, I'll say I don't think it's true. So, Chris, what do you think? I think it's not quite true, because I think there are other kinds of things that human beings manage to work out. You know, there are human beings that have clever insights in mathematics, or there are human beings who are looking at something that's a much more, you know, three-dimensional, real-world puzzle of sort of figuring out how to do something mechanical or something like that. And that's just not a language problem. But on the other hand, I mean, I think language gets closer to universality than some people think as well, because, you know, we live in this 3D world and operate in it with our bodies and our feelings and other creatures and artifacts around us. And you could think, well, not much of that is in language at all. But actually, just about all of this stuff we think about, we talk about, we write about in language; we can describe the positions of things relative to each other in language. So a surprising amount of the other parts of the world are seen in reflection in language. And therefore, you're learning about all of them, too, when you learn about language use. Yeah, you learn about, you know, one aspect of a lot of things, even if things like how do you ride a bicycle can't really be perfectly described. 
You don't really learn how to ride a bicycle, but you learn some aspects of what it involves, that you need to balance and you have to have your feet on the pedals and push them and all of those kinds of things. Yeah. And so this trend in NLP, the large language models, has been very exciting for the last several years. What are your thoughts on where all this will go? Well, I mean, yeah, so it's just been amazingly successful and exciting, right? So we haven't really explained all the details, right? So there's a first stage of learning these large language models where the task is just to predict the next word. And you do that billions of times over a very large piece of text. And behold, you get this large neural network, which is just a really useful artifact for all sorts of natural language processing tasks. But then you still actually have to do something with that if you want to do a particular task, whether that's question answering or summarization or detecting toxic content in social media or something like that. And at that point, there's a choice of things that you could do with it. The traditional answer was that you had a particular task, like say it's detecting toxic comments in social media, and you'd take some supervised data for that. And then you'd fine-tune the language model to answer that classification task. But you were enormously helped by having this base of this large self-supervised model, because it meant that the model had enormous knowledge of language and it could generalize very quickly. So unlike the sort of standard old days of supervised learning, where it was kind of, well, if you give me 10,000 labeled examples, I might be able to produce a halfway decent model for you, but if you give me 50,000 labeled examples, it'll be a lot better, it's sort of turned into this world of, well, if you give me 100 labeled examples and I'm fine-tuning a large language model, I'll be able to do great, better than I would have been able to do with the 50,000 examples in the old world. Some of the more recent exciting work is now even going beyond that. It's now, well, maybe you don't actually have to fine-tune the model at all. So people have done a lot of work using methods sometimes referred to as prompting or instructing, where you can simply, in natural language, perhaps with examples, perhaps with explicit instructions, just tell the model what you want it to do. And it does it, which, you know, even as someone who's been working in natural language processing for 30 years, I mean, it actually just blows my mind how well this works. And, you know, I guess a decade ago I wasn't thinking that by now we'd be able to just, you know, tell the model, I want you to summarize this piece of text here, and it will then summarize it. I think that's incredible. Yeah. So we're in this very exciting time where a lot of new natural language capabilities are unfolding. I think there's just no doubt at all that for the next couple of years, the future of that is extremely bright, as people work out different things and different ways to do things, and people start to apply, in different application areas, the kind of capabilities that have been unlocked with recent technological developments. You know, there's always a question in technology as to sort of whether the curve keeps on heading steeply upwards or whether there's then some new things we have to discover how to do. It's been going up for quite a while. 
So hopefully, you know, extrapolation is always dangerous, but we'll see. I'm just curious, you know, you mentioned writing prompts to tell the NLP system, the large language model, what you want, and it seems to magically do it. I'm curious, do you think prompt engineering is the path of the future? Actually, when I write these prompts, I sometimes find it works miraculously and sometimes it's frustrating, you know, the process of rewording my instructions to tweak the wording to get it just right to generate the result I want. So do you think prompt engineering is the wave of the future, or do you think it's an intermediate hack until someone invents a better way to control the outputs of these systems? I think it's both. I think it will be the way of the future, but I also think at the moment people are doing, yeah, a lot of hacking around and rewording to try and get things to work better, and with any luck, with a few more years of development, that'll start to go away. I mean, one way to think about the difference is sort of in comparison to the kind of voice assistants or virtual assistants that are available on phones and speaker devices like Amazon Alexa these days, right? I mean, I think all of us have had the experience that at present those devices aren't always great. But if you know the right way to word things, it'll do something. But if you use the wrong wording, it won't. You know, and the difference with human beings is, by and large, you don't have to think about that. You can say what you want. It doesn't matter what words you choose. The other human being, you know, assuming it's someone who knows the same language, et cetera, will understand you and do what you want. And I think and would hope that we'll start to see the same kind of progression with these models: at the moment, you know, fiddling around with the particular wording you use can make a very big difference to how well it works, but hopefully in a few years' time, that just won't be true. You'll be able to use different wordings and it'll still work. But the basic idea is that we're moving into this age where actually human language will be able to be used as an instruction language to tell your computer what to do. So instead of having to use menus and, you know, radio buttons and things like that, or writing Python code, instead of either of those things, you'll be able to say what you want and the computer will do it. I think that age is opening up in front of us, and that will continue to build and will be hugely transformative. It feels like we've come a long way, but we have much more to come and much more to go. Yeah, absolutely. In the development of NLP technology, there's one thing I want to ask you, and I suspect you and I may have different perspectives on this, but in the last couple of decades, the trend has been to rely less on rule-based engineering and more on machine learning on data, sometimes lots of data. Looking into the future, where do you think that mix of hand-coded constraints, or other explicit constraints, versus, you know, let's get a neural network and throw lots of data at it, where do you think that balance will fall? I think that there's no doubt that using learning from data is the way forward and what we're going to continue to do. But I think there's still a space for models that have more structure, more inductive bias, that have some kind of basis of exploiting the nature of language. 
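For readers who want to try the prompting idea discussed above, here is a minimal sketch assuming the Hugging Face transformers library. The particular instruction-following checkpoint named below is just one plausible small choice, not something mentioned in the conversation, and, as Chris and Andrew note, results will vary a lot with wording and model size.

```python
from transformers import pipeline

# A small instruction-following model; the specific checkpoint is an assumption.
generate = pipeline("text2text-generation", model="google/flan-t5-small")

# Instead of fine-tuning on labeled data, we simply describe the task in natural language.
prompt = (
    "Summarize the following text in one sentence:\n"
    "Word vectors learned by predicting words from their context capture a surprising "
    "amount of word meaning, and later models such as BERT and GPT extended this idea "
    "to whole passages of text."
)
print(generate(prompt, max_new_tokens=40)[0]["generated_text"])
```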
So in recent years, the model that's been enormously successful is the transformer neural network. And the transformer neural network is essentially this huge association machine. So it'll just suck associations from anywhere, and look at two words and figure out which word relates to which other word, for all words. Yeah, so you use everything to predict anything, and do it over and over again, and you'll get anything you want. And, you know, that's been incredibly, incredibly successful, but it's been incredibly successful in the domain where you have humongous, humongous amounts of data, right? So these transformer models for these large language models are now being trained on tens of billions of words of text. You know, when I started off in statistical natural language processing, some of the traditional linguists used to complain about the fact that I was, you know, collecting statistics from 30 million words of newswire and building a predictive model, and thought, you know, that was just not what linguistics was about. You know, I felt I had a perfectly good answer, which is that, you know, a human kid, as they're learning language, is exposed to actually well more than 30 million words of data, you know, that kind of amount of data. So, you know, the kinds of amounts of data we were using were perfectly reasonable amounts of data to be using to be, you know, not exactly trying to model human language acquisition, but to be thinking about how we can learn about language from lots of data. But, you know, these modern transformers are now, you know, using already at least two orders of magnitude more data. And, you know, most people think the way to get things to the next level is to use more still and make it three orders of magnitude. And, you know, in one sense, that scaling-up strategy has been hugely effective. So, you know, I don't blame anybody for saying let's make another order of magnitude bigger and see what amazing things we can do. But it also shows that human learning is just way, way better in being able to extract a lot more information out of a quite limited amount of data. And at that point, you can have various hypotheses, but I think it's reasonable to assume that human learning is somewhat structured towards the structure of the world and the things it sees in the world. And that allows it to learn more quickly from less data. All right. I'm with you on that. I think we need better learning algorithms; current machine learning algorithms make much less efficient use of data, and so use way more data than any, you know, child. And I think it's an open question whether the improved learning algorithms will come from linguistics-like rules, or whether it'll just be engineers engineering much more efficient versions of the transformer, or whatever comes after it, that will be used. I doubt it will be traditional. I don't think it'll be by people explicitly putting traditional linguistic rules into the system. I don't think that's the way forward. On the other hand, I mean, you know, I think what we're starting to see is models like these transformer models are actually discovering the structure of language themselves. Right. So, you know, the broad facts of, you know, human language: that, you know, English has the subject before the verb and the object afterwards, whereas, you know, in Japanese the verb's at the end of the sentence, and the subject and object are normally in that order before it, but could be in the other order. 
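Here is a minimal numpy sketch of the "association machine" Chris describes: scaled dot-product self-attention, in which every word computes a weight over every other word and mixes their representations accordingly. This is only the core attention step under toy assumptions (random weights, random inputs), not a full transformer.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention. X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # how strongly each word attends to each other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over positions
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                           # 4 "words", 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
output, attention = self_attention(X, Wq, Wk, Wv)
print(attention.round(2))                             # each row of attention weights sums to 1
```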
You know, actually, transformer models are learning these facts. You can interrogate them and see that even though they were never explicitly told about subjects and objects, they know these notions. So I think they're, you know, discovering a lot else as well about language use and context and the meanings and senses of words and what is and isn't unpleasant language. But part of what they're learning is the same kind of structure that linguists have laid out as the sort of structure of different human languages. So it's as if, over many decades, linguists have discovered certain things, and by training on billions of words, transformers are discovering the same things that linguists discover in human languages. That's cool. So all this is really exciting progress in NLP, driven by machine learning and by other things. To someone entering the field, entering machine learning or AI or NLP, there's just a lot going on. What advice would you have for someone wanting to break into machine learning? Yeah, well, it's a great time to break in. I think there's just no doubt at all that we're still in the early stages of seeing the impact of this new approach where effectively software and computer science are being reinvented on the basis of much more use of machine learning and the various other things that come along with that. And then more generally across industries, there are just lots of opportunities for more automation, making more use of, you know, interpretation of human language material, or, in other areas like vision and robotics, the same kinds of things. So there are lots of possibilities. So, you know, at that point, there's lots to do, obviously. You want to get some kind of good foundation, right? So knowing some of the core technical methods of machine learning, understanding ideas of how to build models from data, look at losses, do training, diagnose errors, all of these core things. I mean, that's definitely useful for natural language processing in particular. Some of those skills are completely relevant. But then there are particular kinds of models that are commonly used, including the transformer that we've talked about a lot today. You definitely should know about transformers. And indeed, they're increasingly being used in every other part of machine learning as well, for vision, bioinformatics. Even robotics is now using transformers. But beyond that, I think it's also useful to learn something about human language and the nature of the problems it involves, because, I mean, even though people aren't directly going to be encoding rules of human language into their computing system, a sensitivity to sort of what kinds of things happen in language and what to look out for and what you might want to model, that's still a useful skill to have. And then in terms of learning the foundations, learning about these concepts, you entered AI from a linguistics background, and we now see people from all walks of life wanting to start doing work in AI. What are your thoughts on the preparation one should have, or any thoughts on how to start from something other than computer science or AI? So there are lots of places you can come from and vector across in different ways. And we've seen tons of people doing that: people who started off in different areas, whether it was chemistry, physics, or even much further afield, people from, you know, history, whatever, have started to look at machine learning. 
I mean, I think there are sort of two levels of answer there. I mean, one level of answer is, you know, one of the amazing transformations is that there are now these very good software packages for doing things with neural network models. I mean, this software is really easy to use. You don't actually need to understand a lot of highly technical stuff. You need to have some kind of high-level conception of what is the idea of machine learning, and how do I go about training a model, and what should I look at in the numbers that are being printed out to see if it's working right. You know, you don't actually have to have a higher degree to be able to build these models. I mean, and indeed what we're seeing is, you know, lots of high school students are getting into doing this, because it's actually something that, if you have some basic computer skills and a bit of programming, you can pick up and do. It's just way more accessible than lots of stuff that preceded it, whether in AI or outside of AI in other areas, you know, like operating systems or security. But, you know, if you want to get to a deeper level than that and actually want to understand more of what's going on, I think you can't really get there if you don't have a certain mathematics foundation, because at the end of the day deep learning is based on calculus and you need to be optimizing functions. And if you sort of don't have any background in that, I think that sort of ends up as a wall at some point. So, you know, the math in machine learning and data science, it does come in handy for some of the work we're now doing. Yeah. So I think, at some level, if you were a major in history or, you know, non-mathematical parts of psychology, I actually have a good friend who, yeah, he, you know, learned calculus in grad school, because he was a psychologist and he'd never done it before and decided that he wanted to start learning about these new kinds of models and decided it wasn't too late to be able to go and take a calc course. And so he did. Right. So, you know, you do need to know some of that stuff. But for lots of people, if they've seen some of that before, even if you're kind of rusty, I think you can kind of get back in the zone. And it doesn't really matter that you haven't, you know, done AI as an undergrad, or machine learning and things like that; you can really start to learn how to build these models and do things. And, you know, really, that's my own story, right? That despite the fact that they let me sit in the School of Engineering at Stanford these days, you know, my background isn't as an engineer, you know, my PhD is in linguistics, and, you know, I've sort of largely vectored across from having some knowledge of mathematics and linguistics and knowing some programming into sort of getting much more into building AI models. Which proves something. Do you think the improved libraries and abstractions that are now available, like coding frameworks like TensorFlow or PyTorch, do you think that reduces the need to understand calculus? Because, boy, it's been a while since I had to actually take a derivative in order to even implement or create a new neural network architecture, because of automatic differentiation. Yeah. Yeah, I mean, absolutely. I mean, so in the early days when we were doing things, sort of 2010 to 2015, right, for every model we built, we were working out the derivatives by hand and then, you know, writing some code in whatever it was, you know. 
Sometimes it was Python, but sometimes it might have been Java or C, to calculate these derivatives and check that we got them right and so on. Well, you know, these days you actually don't need to know any of that to build deep learning models. I mean, this is actually something I've been thinking about, even with respect to my own natural language processing with deep learning class that I teach, you know. At the beginning, we do still go through doing, you know, matrix calculus and making sure people know about Jacobians and things like that, so that they understand what's being done in backpropagation in deep learning. But, you know, there's sort of this sense in which that means that we just give them hell for two weeks, you know, sort of like boot camp or something to make them suffer. And then we say, oh, but you do the rest of the class with PyTorch, and they sort of never have to know any of that again. Right. You know, there's always a question of how deep you want to go in technical foundations, right? You can keep on going, right? Like, does a computer scientist in the 2020s need to understand, you know, electronics and transistors, or what happens in, you know, a CPU? Well, you know, it's complicated. I mean, in various ways, it is helpful to know some of that stuff. I mean, you know, I know, Andrew, you were one of the pioneers in getting machine learning onto GPUs. And, well, you know, that sort of means you had to have some sense that there's this new hardware out there and it has some attributes of parallelism that mean it's likely to be able to do something exciting. So, you know, it is useful to have some broader knowledge and understanding. And, you know, sometimes something breaks. And if you have some deeper knowledge, you can understand why it broke. But there's another sense in which, you know, most people have to take some things on trust, and you can do most of what you want to do in neural network modeling these days without knowing calculus at all. Yeah, I think that's a great point. I feel like sometimes the reliability of the abstraction determines how often you need to go in to fix something that's broken. So, actually, my understanding of quantum physics is very weak. I barely understand it. So you could argue I don't understand how computers work, because transistors are built on quantum physics. But fortunately, you know, if something went wrong with a transistor, I've never had to go to the transistor. They're a bit hard to fix, I think. And I think another example is, you know, the sort function. There are libraries to sort things, and sometimes they actually don't work, right, swapping the memory or whatever. And that's when, if you really understand how the sort function works, you can go in and fix it. But then sometimes, if we have abstractions, libraries, APIs that are reliable enough, it's nice that those abstractions then diminish the need to understand some of the things that happen. So this is an exciting world. It feels like, you know, we have giants building on the shoulders of giants, and all of these things are becoming more complex and more exciting every month. Yeah, absolutely. So thanks, Chris. That was really interesting and inspiring. And I hope that for everyone watching this, hearing Chris's own journey to become a computer scientist, and to become a leading, maybe the leading, NLP computer scientist, as well as all of this exciting work happening in NLP right now, I hope that inspires you to jump in and take a go at it. 
There's just a lot more work to be done collectively by our community still. So I think the more of us working on this, the better off the world will be. So thanks a lot, Chris. It was really great having you here. Thanks a lot, Andrew. It's been fun chatting.