Deep Learning Specialization

Welcome to this course on Convolutional Networks. Computer vision is one of the areas that's been advancing rapidly thanks to deep learning. Deep learning computer vision is now helping self-driving cars figure out where the other cars and pedestrians around them are, so as to avoid them. It is making face recognition work much better than ever before, so that perhaps some of you will soon, or perhaps already, be able to unlock a phone, or even a door, using just your face. And if you look on your cell phone, I bet you have many apps that show you pictures of food, or pictures of a hotel, or just fun pictures of scenery. Some of the companies that build those apps are using deep learning to help show you the most attractive, the most beautiful, or the most relevant pictures. And I think deep learning is even enabling new types of art to be created.
There are two reasons I'm excited about deep learning for computer vision, and why I think you might be too. First, rapid advances in computer vision are enabling brand new applications to be built that just weren't possible a few years ago. By learning these tools, perhaps you will be able to invent some of these new products and applications. Second, even if you don't end up building computer vision systems per se, I've found that because the computer vision research community has been so creative and so inventive in coming up with new neural network architectures and new algorithms, it has created a lot of cross-fertilization into other areas as well. For example, when I was working on speech recognition, I sometimes took inspiration from ideas in computer vision and borrowed them into the speech literature. So, even if you don't end up working on computer vision, I hope that you find some of the ideas you learn about in this course helpful for your own algorithms and architectures. So, with that, let's get started.
Here are some examples of computer vision problems we'll study in this course. You've already seen image classification, sometimes also called image recognition, where you might take as input, say, a 64x64 image and try to figure out, is that a cat? Another example of a computer vision problem is object detection. If you're building a self-driving car, maybe you don't just need to figure out if there are other cars in this image, but instead you need to figure out the position of the other cars in this picture so that your car can avoid them. So, in object detection, usually we have to not just figure out that there are other objects, say, cars, in the picture, but also draw boxes around them or have some other way of recognizing where in the picture these objects are. Notice also in this example that there can be multiple cars in the same picture, and you need to locate every one of them, or at least every one within a certain distance of your car.
Here's another example, maybe a more fun one: neural style transfer. Let's say you have a picture and you want this picture repainted in a different style. In neural style transfer, you have a content image and you have a style image. The image on the right is actually a Picasso. You can have a neural network put them together to repaint the content image, that is the image on the left, in the style of the image on the right, and you end up with the image at the bottom. Algorithms like this are enabling new types of artwork to be created, and in this course, you'll learn how to do this yourself as well.
One of the challenges of computer vision problems is that the inputs can get really big. For example, in previous courses, you've worked with 64x64 images. That's 64x64x3 because there are three color channels, and if you multiply that out, that's 12288. So, the input feature vector has dimension 12288. That's not too bad. But 64x64 is actually a very small image.
If you work with larger images, say a 1000 pixel by 1000 pixel image, that's actually just one megapixel. But the dimension of the input features will be 1000x1000x3 because you have three RGB channels, and that's 3 million. If you are viewing this on a smaller screen, this might not be apparent, but this is actually a low-res 64x64 image, and this is a high-res 1000x1000 image. If you have 3 million input features, then this means that x here will be 3 million dimensional. And so if in the first hidden layer you have just 1000 hidden units, and you use a standard fully connected network like the ones in courses 1 and 2, then the weight matrix W1 will be a 1000 by 3 million dimensional matrix, because x is now in R^(3M), where I'm using 3M to denote 3 million. This means that this matrix will have 3 billion parameters, which is just very, very large. With that many parameters, it's difficult to get enough data to prevent the neural network from overfitting, and the computational and memory requirements to train a neural network with 3 billion parameters are just a bit infeasible.
But for computer vision applications, you don't want to be stuck using only tiny little images. You want to be able to use large images. To do that, you need to implement the convolution operation, which is one of the fundamental building blocks of convolutional neural networks. Let's see what this means and how you can implement it in the next video. I will illustrate convolutions using the example of edge detection.
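To make the arithmetic in this video concrete, here is a minimal Python sketch that recomputes the numbers quoted above: the flattened input size for a 64x64 and a 1000x1000 RGB image, and the size of W1 for a fully connected first layer with 1000 hidden units. The function names are hypothetical, chosen only for illustration, and the memory estimate assumes 4-byte (float32) weights, which the lecture does not specify.

```python
def flattened_input_dim(height, width, channels=3):
    """Number of input features when an image is unrolled into one vector."""
    return height * width * channels


def dense_weight_count(n_inputs, n_hidden):
    """Number of entries in W1 for a fully connected layer (biases excluded)."""
    return n_hidden * n_inputs


HIDDEN_UNITS = 1000      # first hidden layer size used in the lecture
BYTES_PER_WEIGHT = 4     # assumption: float32 storage, not stated in the lecture

small_dim = flattened_input_dim(64, 64)        # 64 * 64 * 3
large_dim = flattened_input_dim(1000, 1000)    # 1000 * 1000 * 3

small_weights = dense_weight_count(small_dim, HIDDEN_UNITS)
large_weights = dense_weight_count(large_dim, HIDDEN_UNITS)

# Memory for W1 alone, in decimal gigabytes (1 GB = 10**9 bytes).
large_w1_gigabytes = large_weights * BYTES_PER_WEIGHT / 10**9

print("64x64x3 input features:    ", small_dim)
print("1000x1000x3 input features:", large_dim)
print("W1 entries (64x64 image):    ", small_weights)
print("W1 entries (1000x1000 image):", large_weights)
print("W1 memory (1000x1000 image, float32): %.1f GB" % large_w1_gigabytes)
```

Under the float32 assumption, the 3 billion weights of W1 alone would take about 12 GB, before counting gradients or optimizer state, which is one way to see why the lecture calls this infeasible.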
