In the last video, you saw how to define the content cost function for neural style transfer. Next, let's take a look at the style cost function. So, what does the style of an image mean? Let's say you have an input image like this. You're used to seeing a ConvNet like that compute features at various hidden layers. And let's say you've chosen some layer L, maybe that layer, to define the measure of the style of an image. What we're going to do is define the style as the correlation between activations across different channels in this layer L activation.

Here's what I mean by that. Let's say you take that layer L activation, which is going to be an nH by nW by nC block of activations, and we're going to ask: how correlated are the activations across different channels? To explain what I mean by this maybe slightly cryptic phrase, let's take this block of activations and let me shade the different channels with different colors. In this little example we have, say, five channels, which is why I have five shades of color here. In practice, of course, a neural network usually has a lot more channels than five, but using just five makes the drawing easier.

To capture the style of an image, here's what you do. Let's look at the first two channels, the red channel and the yellow channel, and ask: how correlated are the activations in these first two channels? For example, in the lower right-hand corner, you have some activation in the first channel and some activation in the second channel, and that gives you a pair of numbers. You then look at different positions across this block of activations; at each position you get one number from the first channel, the red channel, and one from the yellow channel, the second channel. And when you look across all of these nH by nW positions, you ask: how correlated are these two numbers?

So, why does this capture style? Let's look at an example. Here's one of the visualizations from the earlier video; this comes, again, from the paper by Matthew Zeiler and Rob Fergus that I referenced earlier. Let's say, for the sake of argument, that the red channel corresponds to this neuron, which is trying to figure out whether there's this little vertical texture at a particular position in the image, and that the second, yellow channel corresponds to this neuron, which is vaguely looking for orange-colored patches. What does it mean for these two channels to be highly correlated? It means that whenever part of the image has this type of subtle vertical texture, that part of the image will probably also have this orange-ish tint. And what does it mean for them to be uncorrelated? It means that wherever there is this vertical texture, it probably won't have that orange-ish tint. So, the correlation tells you which of these high-level texture components tend to occur, or not occur, together in part of an image. And it's the degree of correlation that gives you one way of measuring how often these different high-level features, such as the vertical texture or the orange tint, occur together and don't occur together in different parts of an image.
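To make the "pair of numbers at every position" idea concrete, here is a minimal NumPy sketch. The array `a` simply stands in for the nH by nW by nC activation block of some chosen layer (random data here, purely for illustration), and `k` and `k_prime` play the role of the red and yellow channels from the lecture.

```python
import numpy as np

# Stand-in for one layer's activation block of shape (n_H, n_W, n_C).
n_H, n_W, n_C = 4, 4, 5
a = np.random.rand(n_H, n_W, n_C)

# Pick two channels (the "red" and "yellow" channels from the lecture).
k, k_prime = 0, 1
channel_k = a[:, :, k].ravel()              # channel k's activation at all n_H * n_W positions
channel_k_prime = a[:, :, k_prime].ravel()  # channel k' at the same positions

# Unnormalized "correlation" used here: sum over all positions of the
# product of the two activations. Large when the channels are big together.
g_kk_prime = np.sum(channel_k * channel_k_prime)
print(g_kk_prime)
```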
And so, if we use the degree of correlation between channels as a measure of the style, then what you can do is measure the degree to which, in your generated image, this first channel is correlated or uncorrelated with the second channel. That will tell you, in the generated image, how often this type of vertical texture occurs or doesn't occur with this orange-ish tint, and this gives you a measure of how similar the style of the generated image is to the style of the input style image.

So, let's now formalize this intuition. What you're going to do is, given an image, compute something called a style matrix, which will measure all those correlations we talked about on the last slide. More formally, let a superscript [l] subscript i,j,k denote the activation at position i, j, k in hidden layer l, where i indexes into the height, j indexes into the width, and k indexes across the different channels. In the previous slide we had five channels, and k would index across those five channels. What the style matrix does is the following: you're going to compute a matrix called G superscript [l]. This is going to be an nC by nC dimensional matrix, so it will be a square matrix. Remember, you have nC channels, and so you need an nC by nC dimensional matrix in order to measure how correlated each pair of them is. In particular, G superscript [l] subscript k,k' will measure how correlated the activations in channel k are with the activations in channel k', where k and k' range from 1 through nC, the number of channels in that layer.

More formally, here is how you compute G superscript [l]; I'm just going to write down the formula for computing one element. The k,k' element is the sum over i and the sum over j of the activation at position i, j, k in that layer times the activation at position i, j, k'. Here, remember, i and j index across the different positions in the block, over the height and width: i is summed from 1 to nH, and j is summed from 1 to nW, while k and k' index over the channels and range from 1 to the total number of channels in that layer of the neural network. So, all this is doing is summing over the different positions of the image, over the height and width, and multiplying together the activations of channels k and k'; that's the definition of G subscript k,k'. And you do this for every value of k and k' to compute this matrix G, also called the style matrix.

Notice that if both of these activations tend to be large together, then G subscript k,k' will be large, whereas if they are uncorrelated, then G subscript k,k' might be small. Technically, I've been using the term correlation to convey intuition, but this is actually the unnormalized cross-covariance, because we're not subtracting out the mean; we're just multiplying these elements directly.

So, this is how you compute the style of an image. And you'd actually do this for both the style image S and for the generated image G. Just to distinguish that this is the style image, maybe let me add a round bracket (S) there, to denote that these are the activations on the image S. And then you compute the same thing for the generated image. It's really the same formula: the sum over i and the sum over j of a superscript [l](G) subscript i,j,k times a superscript [l](G) subscript i,j,k', with the same summation indices as before.
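As a sketch of this definition, the whole style matrix can be computed at once by unrolling the spatial dimensions and taking a matrix product. This is a minimal NumPy version, assuming the chosen layer's activations are available as an nH by nW by nC array; the function name `gram_matrix` is just a convenient label, not something prescribed by the lecture.

```python
import numpy as np

def gram_matrix(a):
    """Style ("Gram") matrix of one layer's activations.

    a: activation block of shape (n_H, n_W, n_C) for the chosen layer.
    Returns an (n_C, n_C) matrix whose (k, k') entry is the sum over all
    positions i, j of a[i, j, k] * a[i, j, k'].
    """
    n_H, n_W, n_C = a.shape
    # Unroll the spatial dimensions so each row is one channel's activations.
    a_unrolled = a.reshape(n_H * n_W, n_C).T   # shape (n_C, n_H * n_W)
    return a_unrolled @ a_unrolled.T           # shape (n_C, n_C)
```

You would call this once on the style image's activations and once on the generated image's activations at the same layer, giving the two matrices compared below.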
And if you want to denote that this is for the generated image, I'll just put the round bracket (G) there. So, now you have two matrices that capture the style of the image S and the style of the image G. By the way, we've been using the letter capital G to denote these matrices; in linear algebra, these are also called Gram matrices. In this video, I'm just going to use the term style matrix, but it's this term Gram matrix that motivates using capital G to denote them.

Finally, the style cost function for layer l, between S and G, you can now define to be just the difference between these two matrices, G superscript [l](S) and G superscript [l](G). These are matrices, so you take the Frobenius norm: the sum of squares of the element-wise differences between the two matrices. Just to write this out, this is going to be the sum over k and the sum over k' of the square of G superscript [l](S) subscript k,k' minus G superscript [l](G) subscript k,k'. The authors actually use a normalization constant, one over the square of two times nH of that layer times nW of that layer times nC of that layer; you can also fold this normalization into the definition up here. But the normalization constant doesn't matter that much, because this cost gets multiplied by some hyperparameter beta anyway.

So, just to finish up, this is the style cost function defined using layer l, and as you saw on the previous slide, it's basically the squared Frobenius norm of the difference between the two style matrices computed on the image S and on the image G, with an additional normalization constant that isn't that important. Finally, it turns out that you get more visually pleasing results if you use the style cost function from multiple different layers. So, the overall style cost function you can define as the sum over all the different layers of the style cost function for that layer, which we've defined above, weighted by some set of additional hyperparameters, which we'll denote as lambda superscript [l] here. What this does is allow you to use different layers in the neural network: both the early ones, which measure relatively simple low-level features like edges, and some later layers, which measure high-level features. This causes the neural network to take both low-level and high-level correlations into account when computing style. In the programming exercise, you'll also gain more intuition about what might be reasonable choices for this hyperparameter lambda.

And so, just to wrap this up, you can now define the overall cost function as alpha times the content cost between C and G, plus beta times the style cost between S and G, and then use gradient descent, or a more sophisticated optimization algorithm if you want, to try to find an image G that minimizes this cost function J of G. If you do that, you'll be able to generate some pretty nice novel artwork.

So, that's it for neural style transfer. I hope you have fun implementing it in this week's programming exercise. Before wrapping up this week, there's just one last thing I want to share with you, which is how to do convolutions over 1D or 3D data, rather than over only 2D images. Let's go on to the last video.
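To tie the pieces together, here is a short NumPy sketch of the per-layer style cost and of the overall cost J(G) = alpha * J_content + beta * J_style. The function names, the list-of-layers interface, and the alpha and beta values are illustrative assumptions, not values fixed by the lecture; the Gram-matrix step is repeated inline so the snippet stands on its own.

```python
import numpy as np

def layer_style_cost(a_S, a_G):
    """Style cost for one layer, given activation blocks of shape (n_H, n_W, n_C)
    for the style image S and the generated image G."""
    n_H, n_W, n_C = a_S.shape
    # Style (Gram) matrices for both images: shape (n_C, n_C).
    G_S = a_S.reshape(-1, n_C).T @ a_S.reshape(-1, n_C)
    G_G = a_G.reshape(-1, n_C).T @ a_G.reshape(-1, n_C)
    # Squared Frobenius norm of the difference, with the 1 / (2 n_H n_W n_C)^2
    # normalization mentioned above.
    return np.sum((G_S - G_G) ** 2) / (2 * n_H * n_W * n_C) ** 2

def total_cost(J_content, style_activations, lambdas, alpha=10.0, beta=40.0):
    """Overall cost J(G) = alpha * J_content + beta * J_style.

    style_activations: list of (a_S, a_G) activation pairs, one per chosen layer.
    lambdas: matching list of per-layer weights lambda^[l].
    alpha, beta: illustrative hyperparameter values only.
    """
    J_style = sum(lam * layer_style_cost(a_S, a_G)
                  for lam, (a_S, a_G) in zip(lambdas, style_activations))
    return alpha * J_content + beta * J_style
```

In practice you would minimize this cost with respect to the pixels of the generated image using gradient descent or a similar optimizer, which is what the programming exercise walks through.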