Neural Networks: Does Mathematica (v11) experimental code support state-of-the-art models?

Mathematica's neural network functionality is built on MXNet, so you can use pre-trained MXNet models or create and train state-of-the-art models with NetGraph.

For example, pre-trained Inception-V3:

https://github.com/dmlc/mxnet-model-gallery/blob/master/imagenet-1k-inception-v3.md

archive = FileNameJoin[{$UserDocumentsDirectory, "inception-v3.tar.gz"}];
URLDownload[
  "http://data.dmlc.ml/mxnet/models/imagenet/inception-v3.tar.gz",
  archive
  ];

(* work in the download folder so the relative "model/..." paths below resolve *)
SetDirectory[$UserDocumentsDirectory];
ExtractArchive[archive];

Needs["NeuralNetworks`"]

net = NeuralNetworks`ImportMXNetModel[
  "model/Inception-7-symbol.json",
  "model/Inception-7-0001.params"
  ]


The newest Xception model cannot be replicated right now. MXNet has no SeparableConv2D layer; even in Keras, SeparableConv2D is available only with the TensorFlow backend. Global (average|max) pooling, the counterpart of Keras's GlobalAveragePooling2D, exists in MXNet but is not exposed in Mathematica.

UPDATE

Since V11.1, AggregationLayer can be used for global pooling.
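A minimal sketch of global average pooling with AggregationLayer, assuming an arbitrary {32, 9, 9} input: the layer reduces each channel of a {channels, height, width} array to a single mean, which can be checked against a plain Mean over each channel.

```mathematica
(* global average pooling: one mean per channel of a {channels, height, width} input *)
gap = AggregationLayer[Mean, "Input" -> {32, 9, 9}];

x = RandomReal[1, {32, 9, 9}];
pooled = gap[x];                     (* vector of length 32 *)

(* the same quantity computed directly, channel by channel *)
direct = Mean[Flatten[#]] & /@ x;
Max@Abs[pooled - direct]             (* small; the net computes in single precision *)
```

Replacing Mean with Max gives global max pooling.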

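SeparableConv2D can be approximated with other layers. Note that NetMapOperator applies a single ConvolutionLayer, with one shared set of weights, to every input channel, whereas Keras's SeparableConv2D learns a separate depthwise filter for each channel.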

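n = 128; h = 3; w = 3; depth = 2;

NetChain[
 {
  ReplicateLayer[1],                               (* {32, 9, 9} -> {1, 32, 9, 9} *)
  TransposeLayer[],                                (* -> {32, 1, 9, 9}: one single-channel image per channel *)
  NetMapOperator[ConvolutionLayer[depth, {h, w}]], (* depthwise step -> {32, depth, 7, 7} *)
  FlattenLayer[1],                                 (* -> {32*depth, 7, 7} *)
  ConvolutionLayer[n, {1, 1}]                      (* pointwise 1x1 step -> {n, 7, 7} *)
  },
 "Input" -> {32, 9, 9}
 ]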

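The construction above can be wrapped in a small reusable function. In this sketch `separableConv` is a hypothetical name, and the "same" padding {Floor[kh/2], Floor[kw/2]} is an assumption chosen to match the "PaddingSize" -> 1 used with 3×3 kernels in the Xception code below. Like the original chain, it shares one depthwise filter across all input channels.

```mathematica
(* hypothetical helper: depthwise step (shared filter) followed by a 1x1 pointwise step;
   extra options such as "Input" are passed on to NetChain *)
separableConv[nOut_Integer, {kh_Integer, kw_Integer}, dm_Integer: 1, opts___] :=
 NetChain[{
   ReplicateLayer[1],
   TransposeLayer[],
   NetMapOperator[
    ConvolutionLayer[dm, {kh, kw}, "PaddingSize" -> {Floor[kh/2], Floor[kw/2]}]],
   FlattenLayer[1],
   ConvolutionLayer[nOut, {1, 1}]
   }, opts]

sep = NetInitialize@separableConv[128, {3, 3}, 2, "Input" -> {32, 9, 9}];
Dimensions@sep[RandomReal[1, {32, 9, 9}]]   (* padding keeps the 9 x 9 spatial size *)
```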


Bringing in pre-trained models is sometimes very useful. Alexey's answer is somewhat brief, so here I add some examples that will hopefully be helpful.

We can load the trained network by

net = NeuralNetworks`ImportMXNetModel[
  "model/Inception-7-symbol.json",
  "model/Inception-7-0001.params"
  ]

and attach the final softmax layer to compute the probability of each class:

net2 = NetGraph[{net, SoftmaxLayer[]}, {1 -> 2}, 
  "Input" -> NetEncoder[{"Image", {299, 299}, ColorSpace -> "RGB"}], 
  "Output" -> NetDecoder[{"Class", Range[1008]}]]

The prediction label/text mapping is in the file synset.txt:

labels = Import["model/synset.txt", "Table"]
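Instead of indexing into `labels` after each prediction, the readable names could be placed directly in the "Class" decoder. The sketch below uses a hypothetical two-line stand-in for synset.txt (named `demoLabels` so it does not overwrite `labels`); using it with the real file assumes synset.txt has exactly one line per output class of the net.

```mathematica
(* two-line stand-in for synset.txt, in its "wnid word word ..." layout *)
demoLabels = ImportString[
   "n02099601 golden retriever\nn02127052 lynx, catamount", "Table"];

(* drop the WordNet id and join the remaining words into one class name *)
classNames = StringRiffle[ToString /@ Rest[#]] & /@ demoLabels;

dec = NetDecoder[{"Class", classNames}];
dec[{0.2, 0.8}]   (* the second class has the higher score *)
```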

We can then use the Inception network to classify images, for example:

imgs = EntityValue[#, "Image"] & /@ {Entity["Species", 
    "Infraspecies:CanisLupusFamiliaris"], 
   Entity["Species", "Species:FelisCatus"], 
   Entity["Species", "Species:PantheraTigris"], 
   Entity["Species", "Genus:Macropus"]}


and the images are labeled fairly accurately:

labels[[net2[ImageResize[#, {299, 299}]]]] & /@ imgs
(* {{"n02099601", "golden", "retriever"}, 
    {"n02127052", "lynx,", "catamount"},
    {"n02129604", "tiger,", "Panthera", "tigris"},
    {"n01877812", "wallaby,", "brush", "kangaroo"}} *)
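Besides the single best class, a "Class" decoder can return the full probability association, from which the top-k classes follow by sorting. A minimal sketch with a toy three-class net (the class names "a", "b", "c" are made up); with the Inception net the same pattern would be `Take[Reverse@Sort[net2[img, "Probabilities"]], 5]`.

```mathematica
(* toy classifier: softmax over three scores, decoded into made-up classes *)
toy = NetChain[{SoftmaxLayer[]}, "Input" -> 3,
   "Output" -> NetDecoder[{"Class", {"a", "b", "c"}}]];

probs = toy[{1., 3., 2.}, "Probabilities"];  (* association class -> probability *)
top2 = Take[Reverse@Sort[probs], 2]         (* two most likely classes, best first *)
```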

We can also try to visualize the weights in its layers. For example, here are the 32 filters of the first convolution layer, each shown as an RGB image:

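(* weights are stored as {filters, channels, height, width} *)
weight = Normal@NetExtract[net, {1, "Weights"}];
ImageCollage[
 Table[ImageAdjust[
   (* each filter is channel-first, hence Interleaving -> False *)
   Image[weight[[n]], Interleaving -> False, ColorSpace -> "RGB"]], {n, 1, 32}],
 ImagePadding -> 1]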

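A convolution filter is stored channel-first ({channels, height, width}), while Image expects interleaved {height, width, channels} data by default; the option Interleaving -> False tells Image the data is channel-first. A small check of this, where `filterImage` is a hypothetical helper and a random array stands in for one filter (in the code above it would be applied as `filterImage[weight[[n]]]`):

```mathematica
(* hypothetical helper: display one channel-first {3, h, w} filter as an RGB image *)
filterImage[flt_] := ImageAdjust@Image[flt, Interleaving -> False, ColorSpace -> "RGB"]

f = RandomReal[{-1, 1}, {3, 5, 7}];  (* stand-in filter: 3 channels, height 5, width 7 *)
img = filterImage[f];
{ImageDimensions[img], ImageChannels[img]}  (* width 7, height 5, 3 channels *)

(* Interleaving -> False is equivalent to moving the channel level last *)
Max@Abs[ImageData@Image[f, Interleaving -> False] - Transpose[f, {3, 1, 2}]]
```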

And we can see what these convolution filters do to the input image:

conv = NetChain[{NetExtract[net, 1]}, 
  "Input" -> NetEncoder[{"Image", {299, 299}, ColorSpace -> "RGB"}]];
data = conv@ImageResize[#, {299, 299}] &@imgs[[1]];
ImageCollage[ImageAdjust@Image[#] & /@ data]

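The feature maps are smaller than the 299×299 input. For a convolution or pooling window the spatial size follows out = Floor[(in + 2p - k)/s] + 1 (kernel k, stride s, padding p). The sketch below encodes this in a hypothetical helper `convOutSize` and cross-checks it against a 3×3, stride-2 ConvolutionLayer with 32 filters, an assumed configuration for the first layer.

```mathematica
(* spatial output size: in = input size, k = kernel, s = stride, p = padding per side *)
convOutSize[in_, k_, s_: 1, p_: 0] := Floor[(in + 2 p - k)/s] + 1

convOutSize[299, 3, 2]

(* cross-check against an actual layer with those settings *)
layer = NetInitialize@ConvolutionLayer[32, {3, 3}, "Stride" -> 2, "Input" -> {3, 299, 299}];
Dimensions@layer[RandomReal[1, {3, 299, 299}]]
```

With p = 1, the same formula gives the 147 → 74 step of the padded stride-2 pooling layers in the Xception entry flow below.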

We can also use Take to cut the Inception model at a given layer and visualize the input image as propagated to that layer:

layers = Take[net, {"conv_conv2d", "mixed_2_tower_1_conv_1_conv2d"}] 


net3 = NetChain[{layers}, 
  "Input" -> NetEncoder[{"Image", {299, 299}, ColorSpace -> "RGB"}]];
data2 = net3@ImageResize[#, {299, 299}] &@imgs[[1]];
ImageCollage[ImageAdjust@Image[#] & /@ data2, ImagePadding -> 1]

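The same slicing works on any net whose layers are named. A self-contained sketch on a small hypothetical chain (layer names "conv", "relu", "pool" are made up), keeping everything from "conv" up to and including "relu":

```mathematica
(* a small named chain standing in for the Inception graph *)
chain = NetInitialize@NetChain[
    <|"conv" -> ConvolutionLayer[8, {3, 3}], "relu" -> Ramp,
      "pool" -> PoolingLayer[{2, 2}, "Stride" -> 2]|>,
    "Input" -> {1, 10, 10}];

(* cut the chain after "relu": the pooling layer is dropped *)
sub = Take[chain, {"conv", "relu"}];
Dimensions@sub[RandomReal[1, {1, 10, 10}]]  (* 8 channels of 8 x 8 *)
```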


XCEPTION

https://arxiv.org/abs/1610.02357

entry = NetGraph[
  <|
   "conv_1" -> ConvolutionLayer[32, {3, 3}, "Stride" -> 2],
   "relu_1" -> Ramp,

   "conv_2" -> ConvolutionLayer[64, {3, 3}],
   "relu_2" -> Ramp,

   "resid_1" -> ConvolutionLayer[128, {1, 1}, "Stride" -> 2],

   "sep_conv_1" ->
    NetGraph[
     {
      ReplicateLayer[1],
      TransposeLayer[],
      NetMapOperator[ConvolutionLayer[1, {3, 3}, "PaddingSize" -> 1]],
      FlattenLayer[1],
      ConvolutionLayer[128, {1, 1}]
      },
     {1 -> 2 -> 3 -> 4 -> 5}
     ],

   "relu_3" -> Ramp,
   "sep_conv_2" ->
    NetGraph[
     {
      ReplicateLayer[1],
      TransposeLayer[],
      NetMapOperator[ConvolutionLayer[1, {3, 3}, "PaddingSize" -> 1]],
      FlattenLayer[1],
      ConvolutionLayer[128, {1, 1}]
      },
     {1 -> 2 -> 3 -> 4 -> 5}
     ],

   "max_pool_1" -> 
    PoolingLayer[{3, 3}, "Stride" -> 2, "PaddingSize" -> 1],

   "add_1" -> ThreadingLayer[Plus],

   "resid_2" -> ConvolutionLayer[256, {1, 1}, "Stride" -> 2],

   "relu_4" -> Ramp,
   "sep_conv_3" ->
    NetGraph[
     {
      ReplicateLayer[1],
      TransposeLayer[],
      NetMapOperator[ConvolutionLayer[1, {3, 3}, "PaddingSize" -> 1]],
      FlattenLayer[1],
      ConvolutionLayer[256, {1, 1}]
      },
     {1 -> 2 -> 3 -> 4 -> 5}
     ],

   "relu_5" -> Ramp,
   "sep_conv_4" ->
    NetGraph[
     {
      ReplicateLayer[1],
      TransposeLayer[],
      NetMapOperator[ConvolutionLayer[1, {3, 3}, "PaddingSize" -> 1]],
      FlattenLayer[1],
      ConvolutionLayer[256, {1, 1}]
      },
     {1 -> 2 -> 3 -> 4 -> 5}
     ],

   "max_pool_2" -> 
    PoolingLayer[{3, 3}, "Stride" -> 2, "PaddingSize" -> 1],

   "add_2" -> ThreadingLayer[Plus],

   "resid_3" -> ConvolutionLayer[728, {1, 1}, "Stride" -> 2],

   "relu_6" -> Ramp,
   "sep_conv_5" ->
    NetGraph[
     {
      ReplicateLayer[1],
      TransposeLayer[],
      NetMapOperator[ConvolutionLayer[1, {3, 3}, "PaddingSize" -> 1]],
      FlattenLayer[1],
      ConvolutionLayer[728, {1, 1}]
      },
     {1 -> 2 -> 3 -> 4 -> 5}
     ],

   "relu_7" -> Ramp,
   "sep_conv_6" ->
    NetGraph[
     {
      ReplicateLayer[1],
      TransposeLayer[],
      NetMapOperator[ConvolutionLayer[1, {3, 3}, "PaddingSize" -> 1]],
      FlattenLayer[1],
      ConvolutionLayer[728, {1, 1}]
      },
     {1 -> 2 -> 3 -> 4 -> 5}
     ],

   "max_pool_3" -> 
    PoolingLayer[{3, 3}, "Stride" -> 2, "PaddingSize" -> 1],

   "add_3" -> ThreadingLayer[Plus]
   |>
  ,
  {
   NetPort["Input"] -> 
    "conv_1" -> 
     "relu_1" -> "conv_2" -> "relu_2" -> "resid_1" -> "add_1",
   "relu_2" -> 
    "sep_conv_1" -> 
     "relu_3" -> "sep_conv_2" -> "max_pool_1" -> "add_1",
   "add_1" -> "resid_2" -> "add_2",
   "add_1" -> 
    "relu_4" -> 
     "sep_conv_3" -> 
      "relu_5" -> "sep_conv_4" -> "max_pool_2" -> "add_2",
   "add_2" -> "resid_3" -> "add_3",
   "add_2" -> 
    "relu_6" -> 
     "sep_conv_5" -> 
      "relu_7" -> "sep_conv_6" -> "max_pool_3" -> "add_3"
   }
  ,
  "Input" -> NetEncoder[{"Image", {299, 299}, ColorSpace -> "RGB"}]
  ]

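The padding and strides in the entry flow are chosen so that, at each add_* layer, the 1×1 stride-2 residual branch and the padded stride-2 pooling branch have the same spatial size (the 3×3 separable convolutions use padding 1 and keep the size). Applying out = ⌊(in + 2p − k)/s⌋ + 1 to the 299×299 input:

$$
\begin{aligned}
\text{conv\_1: } & \lfloor (299 - 3)/2 \rfloor + 1 = 149 \\
\text{conv\_2: } & 149 - 3 + 1 = 147 \\
\text{add\_1: } & \lfloor (147 - 1)/2 \rfloor + 1 = \lfloor (147 + 2 - 3)/2 \rfloor + 1 = 74 \\
\text{add\_2: } & \lfloor (74 - 1)/2 \rfloor + 1 = \lfloor (74 + 2 - 3)/2 \rfloor + 1 = 37 \\
\text{add\_3: } & \lfloor (37 - 1)/2 \rfloor + 1 = \lfloor (37 + 2 - 3)/2 \rfloor + 1 = 19
\end{aligned}
$$

so the entry flow ends in a {728, 19, 19} array, which is the "Input" size declared for the middle and exit flows.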

middle = NetGraph[
  <|
   "relu_1" -> Ramp,
   "sep_conv_1" ->
    NetGraph[
     {
      ReplicateLayer[1],
      TransposeLayer[],
      NetMapOperator[ConvolutionLayer[1, {3, 3}, "PaddingSize" -> 1]],
      FlattenLayer[1],
      ConvolutionLayer[728, {1, 1}]
      },
     {1 -> 2 -> 3 -> 4 -> 5}
     ],

   "relu_2" -> Ramp,
   "sep_conv_2" ->
    NetGraph[
     {
      ReplicateLayer[1],
      TransposeLayer[],
      NetMapOperator[ConvolutionLayer[1, {3, 3}, "PaddingSize" -> 1]],
      FlattenLayer[1],
      ConvolutionLayer[728, {1, 1}]
      },
     {1 -> 2 -> 3 -> 4 -> 5}
     ],

   "relu_3" -> Ramp,
   "sep_conv_3" ->
    NetGraph[
     {
      ReplicateLayer[1],
      TransposeLayer[],
      NetMapOperator[ConvolutionLayer[1, {3, 3}, "PaddingSize" -> 1]],
      FlattenLayer[1],
      ConvolutionLayer[728, {1, 1}]
      },
     {1 -> 2 -> 3 -> 4 -> 5}
     ],

   "add" -> ThreadingLayer[Plus]
   |>
  ,
  {
   NetPort["Input"] -> 
    "relu_1" -> 
     "sep_conv_1" -> 
      "relu_2" -> "sep_conv_2" -> "relu_3" -> "sep_conv_3" -> "add",
   NetPort["Input"] -> "add"
   }
  ,
  "Input" -> {728, 19, 19}
  ]

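The middle flow is a residual block: its "add" layer sums the block's input with the output of the three ReLU/separable-conv stages, so it computes x + f(x) and keeps the {728, 19, 19} shape. A toy version using the same wiring, with a single Ramp as an illustrative stand-in for f (no weights, so no initialization is needed):

```mathematica
(* toy residual block: output = input + Ramp[input] *)
res = NetGraph[
   <|"f" -> ElementwiseLayer[Ramp], "add" -> ThreadingLayer[Plus]|>,
   {NetPort["Input"] -> "f" -> "add", NetPort["Input"] -> "add"},
   "Input" -> 3];

res[{-1., 2., 0.5}]  (* negative entries pass through unchanged, others are doubled *)
```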

exit = NetGraph[
  <|
   "resid" -> ConvolutionLayer[1024, {1, 1}, "Stride" -> 2],

   "relu_1" -> Ramp,
   "sep_conv_1" ->
    NetGraph[
     {
      ReplicateLayer[1],
      TransposeLayer[],
      NetMapOperator[ConvolutionLayer[1, {3, 3}, "PaddingSize" -> 1]],
      FlattenLayer[1],
      ConvolutionLayer[728, {1, 1}]
      },
     {1 -> 2 -> 3 -> 4 -> 5}
     ],

   "relu_2" -> Ramp,
   "sep_conv_2" ->
    NetGraph[
     {
      ReplicateLayer[1],
      TransposeLayer[],
      NetMapOperator[ConvolutionLayer[1, {3, 3}, "PaddingSize" -> 1]],
      FlattenLayer[1],
      ConvolutionLayer[1024, {1, 1}]
      },
     {1 -> 2 -> 3 -> 4 -> 5}
     ],

   "max_pool" -> 
    PoolingLayer[{3, 3}, "Stride" -> 2, "PaddingSize" -> 1],

   "add" -> ThreadingLayer[Plus],

   "sep_conv_3" ->
    NetGraph[
     {
      ReplicateLayer[1],
      TransposeLayer[],
      NetMapOperator[ConvolutionLayer[1, {3, 3}, "PaddingSize" -> 1]],
      FlattenLayer[1],
      ConvolutionLayer[1536, {1, 1}]
      },
     {1 -> 2 -> 3 -> 4 -> 5}
     ],
   "relu_3" -> Ramp,

   "sep_conv_4" ->
    NetGraph[
     {
      ReplicateLayer[1],
      TransposeLayer[],
      NetMapOperator[ConvolutionLayer[1, {3, 3}, "PaddingSize" -> 1]],
      FlattenLayer[1],
      ConvolutionLayer[2048, {1, 1}]
      },
     {1 -> 2 -> 3 -> 4 -> 5}
     ],
   "relu_4" -> Ramp,

   "global_pool" -> AggregationLayer[Mean],

   "softmax" -> {2048, SoftmaxLayer[]} 
   |>
  ,
  {
   NetPort["Input"] -> "resid" -> "add",
   NetPort["Input"] -> 
    "relu_1" -> 
     "sep_conv_1" -> "relu_2" -> "sep_conv_2" -> "max_pool" -> "add",
   "add" -> 
    "sep_conv_3" -> 
     "relu_3" -> "sep_conv_4" -> "relu_4" -> "global_pool" -> "softmax"
   }
  ,
  "Input" -> {728, 19, 19}, 
  "Output" -> NetDecoder[{"Class", Range[2048]}]
  ]


xception = NetChain[
  <|
   "entry_flow" -> entry,
   "middle_flow" -> NetNestOperator[middle, 8],
   "exit_flow" -> exit
   |>
  ]

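NetNestOperator[middle, 8] applies the one `middle` net, with one set of weights, eight times, whereas the Xception paper stacks eight middle-flow blocks with separate weights. A sketch of the latter, using a LinearLayer as an assumed stand-in for `middle` (any net whose output shape equals its input shape works):

```mathematica
(* stand-in for the middle block *)
block = LinearLayer[4, "Input" -> 4];

(* eight copies in a chain; NetInitialize gives each copy its own weights *)
stacked = NetInitialize@NetChain[Table[block, 8]];

(* different copies hold different (randomly initialized) weights *)
Normal@NetExtract[stacked, {1, "Weights"}] =!= Normal@NetExtract[stacked, {2, "Weights"}]
```

For Xception this would read "middle_flow" -> NetChain[Table[middle, 8]], at the cost of eight times as many middle-flow parameters.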