The convolution operation is one of the fundamental building blocks of a convolutional neural network. Using edge detection as the motivating example, in this video you'll see how the convolution operation works. In previous videos, I've talked about how the early layers of a neural network might detect edges, the somewhat later layers might detect parts of objects, and even later layers might detect complete objects, like people's faces in this case. In this video, you'll see how you can detect edges in an image.
Let's take an example. Given a picture like that, for a computer to figure out what objects are in the picture, the first thing you might do is detect vertical edges in the image. For example, this image has all those vertical lines where the railings are, as well as vertical lines at the outlines of these pedestrians, and those get detected in this vertical edge detector output. You might also want to detect horizontal edges. For example, there's a very strong horizontal line where this railing is, and that also gets detected roughly here.
So, how do you detect edges in an image like this? Let's look at an example. Here is a 6x6 grayscale image. Because this is a grayscale image, it is just a 6x6x1 matrix rather than 6x6x3, because there are no separate RGB channels. In order to detect edges, or let's say vertical edges, in this image, what you can do is construct a 3x3 matrix. In the terminology of convolutional neural networks, this is going to be called a filter. We're going to construct a 3x3 filter, or 3x3 matrix, that looks like this, reading down each column: 1, 1, 1, then 0, 0, 0, then minus 1, minus 1, minus 1. Sometimes research papers call this a kernel instead of a filter, but I'm going to use the filter terminology in these videos.
What you're going to do is take the 6x6 image and convolve it with the 3x3 filter, where the convolution operation is denoted by this asterisk. One slightly unfortunate thing about the notation is that in mathematics the asterisk is the standard symbol for convolution, but in Python it is also used to denote multiplication, or maybe element-wise multiplication. So this asterisk has dual purposes; it's overloaded notation, but I'll try to be clear in these videos when the asterisk refers to convolution.
The output of this convolution operator will be a 4x4 matrix, which you can think of as a 4x4 image. The way you compute this 4x4 output is as follows. To compute the first element, the upper left element of this 4x4 matrix, you take the 3x3 filter and paste it on top of the upper left 3x3 region of your original input image. So I've written here 1, 1, 1, 0, 0, 0, minus 1, minus 1, minus 1. What you should do is take the element-wise product and then add up all of the resulting 9 numbers. Going down the first column, that's 3 times 1, plus 1 times 1, plus 2 times 1. The middle column gives you 0 times 0, plus 5 times 0, plus 7 times 0, and the rightmost column gives 1 times minus 1, plus 8 times minus 1, plus 2 times minus 1. Adding up these 9 numbers gives you negative 5, so I'm going to fill in negative 5 over here. You can add up these 9 numbers in any order, of course; it's just that I went down the first column, then the second column, then the third.
Next, to figure out the second element, you take the blue square and shift it one step to the right, like so. Let me get rid of the green marks here. And you do the same element-wise product, and then addition.
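
The arithmetic for the upper left element can be checked with a few lines of plain Python. This is only a minimal sketch: the names FILTER, PATCH, and patch_response are made up for illustration, and the patch holds the nine pixel values read out in the video for the upper left 3x3 region.

```python
# Vertical edge filter from the video, written row by row.
# Every row is (1, 0, -1), so the columns are all 1s, all 0s, and all -1s.
FILTER = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

# Upper left 3x3 region of the input image, using the values read out in
# the video (columns 3,1,2 then 0,5,7 then 1,8,2).
PATCH = [
    [3, 0, 1],
    [1, 5, 8],
    [2, 7, 2],
]


def patch_response(patch, kernel):
    """Return the sum of element-wise products of two equally sized 2D lists."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


upper_left = patch_response(PATCH, FILTER)
print(upper_left)  # -5
```

The order of summation does not matter here; the generator simply walks the patch row by row instead of column by column.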
So you have 0 times 1, plus 5 times 1, plus 7 times 1, plus 1 times 0, plus 8 times 0, plus 2 times 0, plus 2 times negative 1, plus 9 times negative 1, plus 5 times negative 1. If you add up those 9 numbers, you end up with negative 4. And so on. If you shift this to the right again, do the 9 products and add them up, you get 0. And then over here, you should get 8. Just to verify, on the left column you have 2 plus 9 plus 5, that's 16. The middle column gives you 0. And the rightmost column is 4 plus 1 plus 3, times negative 1, that's minus 8. So that's 16 from the left column, minus 8, and that gives you 8, like we have over here.
Next, to get this element in the next row, you take the blue square and now shift it one down, so you have it in that position. Again, repeat the element-wise product and addition exercise. If you do that, you should get negative 10 here. If you shift it one to the right, you should get negative 2, and then 2, and then 3, and so on, to fill in all the rest of the elements of the matrix. And to be clear, this minus 16 would be obtained from this lower right 3x3 region.
So a 6x6 matrix convolved with a 3x3 matrix gives you a 4x4 matrix. We call these images and filters, but they are really just matrices of various dimensions. The matrix on the left is convenient to interpret as an image, the one in the middle we interpret as a filter, and the one on the right you can interpret as maybe another image. This turns out to be a vertical edge detector, and you'll see why on the next slide.
Before going on, though, just one other comment: if you implement this in a programming language, then in practice most programming languages will have some different function, rather than an asterisk, to denote convolution. For example, in the programming exercise, you implement a function called conv_forward.
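
The sliding-window procedure described above can be sketched in plain Python as follows. The function name convolve_valid is hypothetical and is not the conv_forward from the programming exercise. The transcript reads out only part of the 6x6 image, so the remaining entries of IMAGE below are assumptions, chosen so that the result reproduces the values quoted in the video (negative 5, negative 4, 0, 8 in the first row, negative 10, negative 2, 2, 3 in the second, and negative 16 in the lower right).

```python
def convolve_valid(image, kernel):
    """Slide `kernel` over `image` one step at a time, with no padding.

    At each position, take the element-wise product of the kernel and the
    image region under it, and sum the results. The kernel is not flipped,
    which matches the procedure described in the video.
    """
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for i in range(n_rows - k_rows + 1):          # shift the window down
        out_row = []
        for j in range(n_cols - k_cols + 1):      # shift the window right
            total = 0
            for a in range(k_rows):
                for b in range(k_cols):
                    total += image[i + a][j + b] * kernel[a][b]
            out_row.append(total)
        output.append(out_row)
    return output


VERTICAL_FILTER = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

# Assumed 6x6 image: entries not read out in the video are illustrative.
IMAGE = [
    [3, 0, 1, 2, 7, 4],
    [1, 5, 8, 9, 3, 1],
    [2, 7, 2, 5, 1, 3],
    [0, 1, 3, 1, 7, 8],
    [4, 2, 1, 6, 2, 8],
    [2, 4, 5, 2, 3, 9],
]

result = convolve_valid(IMAGE, VERTICAL_FILTER)
for row in result:
    print(row)
# [-5, -4, 0, 8]
# [-10, -2, 2, 3]
# [0, -2, -4, -7]
# [-3, -2, -3, -16]
```

A 6x6 input and a 3x3 filter give 6 - 3 + 1 = 4 window positions in each direction, which is where the 4x4 output size comes from.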
If you do this in TensorFlow, there's a function tf.nn.conv2d. In other deep learning programming frameworks, such as the Keras framework, which you'll see later in this course, there's a function called Conv2D that implements convolution, and so on. All the deep learning frameworks that have good support for computer vision will have some function for implementing this convolution operator.
So, why is this doing vertical edge detection? Let's look at another example. To illustrate this, we're going to use a simplified image. Here is a simple 6x6 image where the left half of the image is 10 and the right half is 0. If you plot this as a picture, it might look like this, where the left half, the 10s, gives you brighter pixel intensity values and the right half gives you darker pixel intensity values. I'm using that shade of gray to denote 0s, although maybe it could also be drawn as black. In this image, there's clearly a very strong vertical edge right down the middle, as it transitions from white to black, or white to a darker color.
This 3x3 filter can be visualized as follows, with lighter, brighter pixels on the left, then the mid-tone 0s in the middle, and then darker on the right. When you convolve the image with this 3x3 filter, what you get is this matrix on the right. Just to verify this math if you want, this 0, for example, is obtained by taking the element-wise product with this 3x3 block and adding up the results: from the left column you get 10 plus 10 plus 10, then 0s in the middle, and then minus 10 minus 10 minus 10, which is why you end up with 0 over here. In contrast, that 30 is obtained from this block, where you get 10 plus 10 plus 10, then 0s in the middle, and then minus 0 minus 0 minus 0, which is why you end up with a 30 over there.
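
The simplified example can be reproduced with a short plain-Python sketch. The helper name convolve_valid is hypothetical, not a framework function; the image is the one described above, with 10s in the left three columns and 0s in the right three.

```python
def convolve_valid(image, kernel):
    """Sum of element-wise products at every window position (no padding, no flip)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k_rows)
                for b in range(k_cols)
            )
            for j in range(len(image[0]) - k_cols + 1)
        ]
        for i in range(len(image) - k_rows + 1)
    ]


VERTICAL_FILTER = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

# Left half bright (10), right half dark (0): a vertical edge down the middle.
EDGE_IMAGE = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

edges = convolve_valid(EDGE_IMAGE, VERTICAL_FILTER)
for row in edges:
    print(row)
# [0, 30, 30, 0]   (printed four times, once per output row)
```

The 0s at the left and right come from windows that see a uniform region, and the 30s come from the two window positions that straddle the transition from 10 to 0.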
Now, if you plot this rightmost matrix as an image, it will look like that, where there's this lighter region right in the middle, and that corresponds to having detected this vertical edge down the middle of your 6x6 image. In case the dimensions here seem a little bit wrong, in that the detected edge seems really thick, that's only because we're working with very small images in this example. If you're using, say, a 1,000x1,000 image rather than a 6x6 image, you'll find that this does a pretty good job of detecting the vertical edges in your image. In this example, the bright region in the middle is just the output image's way of saying that it looks like there's a strong vertical edge right down the middle of the image.
Maybe one intuition to take away from vertical edge detection is that, since we're using a 3x3 filter, a vertical edge is a 3x3 region where there are bright pixels on the left, you don't care that much what's in the middle, and there are dark pixels on the right. The middle of this 6x6 image is really where there are bright pixels on the left and dark pixels on the right, and that's why it thinks there's a vertical edge over there. The convolution operation gives you a convenient way to specify how to find these vertical edges in an image.
So, you've now seen how the convolution operator works. In the next video, you'll see how to take this and use it as one of the basic building blocks of a convolutional neural network.