Generalized Backpropagation for Neural Networks (e.g. DeepDream)

Sebastian mentioned in his answer that DeepDream is possible using NetDerivative. Here is my attempt following his outline.

Instead of the Inception model, I'm using VGG-16 here for simplicity. The Inception model allows arbitrary image sizes, but may need some extra normalization steps.

The MXNet version of the VGG model can be downloaded from MXNet's GitHub page: https://github.com/dmlc/mxnet-model-gallery

The VGG model can be loaded as

Needs["NeuralNetworks`"]
VGG = ImportMXNetModel["vgg16-symbol.json", "vgg16-0000.params"]

[image: the imported VGG-16 network]

VGG is trained on 224×224 images, so we load and crop our test image to this size:

img = RemoveAlphaChannel@
  ImageCrop[
   Import["http://hplussummit.com/images/wolfram.jpg"], {224, 224}, 
   Padding -> None]

[image: the cropped 224×224 test image]

We then take the network up to the layer whose activation we want to maximize, and attach a SummationLayer:

net = NetChain[{Take[VGG, {"conv1_1", "pool1"}], SummationLayer[]}, 
  "Input" -> NetEncoder[{"Image", {224, 224}}]]

We then want to backpropagate to the input, i.e. find the gradient of the value at the summation layer with respect to the input. This can be done with NetDerivative. This is what the gradient looks like:

NetDecoder["Image"][
 First[(NetDerivative[net]@<|"Input" -> img|>)[
   NetPort["Inputs", "Input"]]]]

[image: the gradient with respect to the input image]

We can now add the gradient to the input image, calculate the gradient with respect to this new image, and so on. This process can be understood as gradient ascent. Here is a function that performs one step of gradient ascent:

applyGradient[net_, img_, stepsize_] :=
 Module[{gdimg, gddata, max},
  (* gradient of the summed activation with respect to the input image *)
  gdimg =
   NetDecoder["Image"][
    First[(NetDerivative[net]@<|"Input" -> img|>)[
      NetPort["Inputs", "Input"]]]];
  gddata = ImageData[gdimg];
  (* normalize the gradient and take one gradient-ascent step *)
  max = Max@Abs@gddata;
  Image[ImageData[img] + stepsize*gddata/max]
  ]

We can then apply this repeatedly to our input image and get a DeepDream-like image:

Nest[applyGradient[net, #, 0.1] &, img, 10]

[image: the dreamed image after 10 gradient-ascent steps]

Here are the dreamed images at different pooling layers. When we dream at an early pooling layer, localized simple features show up; as we move to later layers of the network, more complex features emerge.

dream[img_, layer_, stepsize_, steps_] := Module[{net},
  net = NetChain[{Take[VGG, {"conv1_1", layer}], SummationLayer[]}, 
    "Input" -> NetEncoder[{"Image", {224, 224}}]];
  Nest[applyGradient[net, #, stepsize] &, img, steps]
  ]

Show[ImagePad[dream[img, #1, #2, #3], 10, White], 
    ImageSize -> {224, 224}] & @@@ {{"pool2", 0.1, 10}, {"pool3", 0.1,
     10}, {"pool4", 0.2, 20}, {"pool5", 0.5, 20}} // Row

[image: dreamed images at pool2, pool3, pool4, and pool5]

The same procedure used to generate the DeepDream images can also be used to visualize the convolution layers in the model. To visualize one convolution kernel, we can attach a fully connected layer right after the convolution layer that keeps only one of the filter channels and zeros out all the others.

(* 1×100352 weight matrix that keeps channel n of conv5_3 (512×14×14, flattened) and zeros all the others *)
weights[n_] := {Flatten@Table[If[c == n, 1., 0.], {c, 512}, {14}, {14}]}

maxAtLayer[img_, n_] := Module[{net},
  net = NetChain[{Take[VGG, {"conv1_1", "conv5_3"}], FlattenLayer[],
     DotPlusLayer[1, "Weights" -> weights[n], "Biases" -> {0}],
     SummationLayer[]},
    "Input" -> NetEncoder[{"Image", {224, 224}}]];
  Nest[applyGradient[net, #, 0.01] &, img, 30]
  ]

img = Image[(20*RandomReal[{0, 1}, {224, 224, 3}] + 128)/255.];

maxAtLayer[img, RandomInteger[{1,512}]]

[image: the image that maximizes a randomly chosen conv5_3 channel]

The same can be applied to the last layer of the network.
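
Here is a minimal sketch of how that might look, assuming the imported network exposes its final fully connected layer (with 1000 class outputs) under the name "fc8"; the helper names classWeights and maxAtClass are hypothetical, and applyGradient is reused from above.

(* select one of the 1000 outputs of the assumed final layer "fc8" *)
classWeights[n_] := List@PadRight[Table[0., n - 1]~Join~{1.}, 1000]

maxAtClass[img_, n_] := Module[{net},
  net = NetChain[{Take[VGG, {"conv1_1", "fc8"}],
     DotPlusLayer[1, "Weights" -> classWeights[n], "Biases" -> {0}],
     SummationLayer[]},
    "Input" -> NetEncoder[{"Image", {224, 224}}]];
  Nest[applyGradient[net, #, 0.01] &, img, 30]
  ]

maxAtClass[img, RandomInteger[{1, 1000}]]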

[image: images that maximize individual outputs of the last layer]


(What follows is the outline from Sebastian's answer that the above is based on.)

If you have an Inception model, it's mostly possible using hidden functionality (but without GPU training). The steps would look like this:

  1. Cut the Inception model at some level using Take. Then add a Ramp (or other positive) activation and a SummationLayer: DeepDream simply wants to maximize the total amount of neuronal firing at the last layer (the original DeepDream implementation squares the output of the last layer, which isn't possible right now, but will be in 11.1). A SummationLayer produces a single output, so it can be used as a loss that can be differentiated.

  2. NetDerivative allows you to differentiate networks (but it's hidden functionality and might be modified/replaced). You need this to obtain the derivative of the output with respect to the input:

    <<NeuralNetworks`
    chain = NetChain[{ElementwiseLayer[LogisticSigmoid]}]
    NetDerivative[chain]@<|"Input" -> {2, 3, 4}|>
    
  3. Optimize + Jitter: do gradient ascent on the input image to find the image that maximizes the total activation of the final layer. But this alone doesn't produce any interesting results (this is a standard problem when generating images from discriminative models). The key is to add some regularization, which in the case of DeepDream is random pixel jittering per iteration (see here for more detail, plus details of the image upscaling and downscaling used to improve results). A sketch of one such jittered step is shown after this list.
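
A minimal sketch of one jittered gradient-ascent step, reusing the applyGradient helper defined in the answer above; the name jitterStep and the cyclic-shift jitter are illustrative assumptions, not the original DeepDream code, and the octave-style rescaling is omitted:

(* shift the image randomly, take one gradient-ascent step, shift back *)
jitterStep[net_, img_, stepsize_, maxJitter_] :=
 Module[{dx, dy, shifted, updated},
  {dx, dy} = RandomInteger[{-maxJitter, maxJitter}, 2];
  shifted = Image[RotateLeft[ImageData[img], {dy, dx, 0}]];
  updated = applyGradient[net, shifted, stepsize];
  (* undo the shift so features do not drift across iterations *)
  Image[RotateRight[ImageData[updated], {dy, dx, 0}]]
  ]

Nest[jitterStep[net, #, 0.1, 8] &, img, 10]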

I plan to add a DeepDream example to the Mathematica docs once NetDerivative is a system function. And obviously these are just the steps you might follow, not a solution. But I thought it would be interesting for people to know how they might implement this.