how to take ImageFeatures?

Some introduction

Feature extraction is a very important idea in machine learning. It allows us to build a neural network based on some of the pretrained high-performance networks. This is also the idea behind transfer learning. I believe this is also the reason why Classify performs so well even on a very small dataset.

A typical convolutional neural network usually consists of two parts: a feature extraction backbone and a classification (or regression) head. The feature extraction part usually consists of many convolution and pooling layers, while the classification head consists of fully-connected layers.


The feature extraction part works on an input image and produces a vector representation of this image. This vector representation usually captures the semantic meaning of the image and is largely independent of the detailed pixel values of the input image. Usually, this feature extraction process is more or less independent of the classification problem, and can therefore be shared among different problems. For example, we may use the feature extraction part of a neural network trained to recognize desks and chairs to help us solve the problem of recognizing cars and people.

An Example

Here is a simple example using the MNIST dataset. In this example, we use a network trained on the digits {0,1,2,3,4} as a feature extractor for the digits {5,6,7,8,9}.

We divide the data into two parts:

resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];

trainingData1 = Select[trainingData, Values[#] <= 4 &];
testData1 = Select[testData, Values[#] <= 4 &];
trainingData2 = Select[trainingData, Values[#] >= 5 &];
testData2 = Select[testData, Values[#] >= 5 &];

We use the LeNet structure

lenet = NetChain[{
   ConvolutionLayer[20, 5], Ramp, PoolingLayer[2, 2],
   ConvolutionLayer[50, 5], Ramp, PoolingLayer[2, 2],
   FlattenLayer[], 500, Ramp, 10, SoftmaxLayer[]},
  "Output" -> NetDecoder[{"Class", Range[0, 9]}],
  "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]
  ]


In this network structure, we can think of the first 7 layers as the feature extraction part, and the last 4 layers as the classification head.
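To make the split concrete, here is a minimal sketch that slices the same architecture with Take and checks the size of the vector the first 7 layers produce. The names demoNet, backbone, img and featureVector are introduced only for this illustration; the net is randomly initialized with NetInitialize rather than trained, and a random image stands in for an MNIST digit, so only the shapes are meaningful.

```wolfram
(* Same LeNet layout as above, randomly initialized so that shapes can be
   inspected without any training *)
demoNet = NetInitialize@NetChain[{
     ConvolutionLayer[20, 5], Ramp, PoolingLayer[2, 2],
     ConvolutionLayer[50, 5], Ramp, PoolingLayer[2, 2],
     FlattenLayer[], 500, Ramp, 10, SoftmaxLayer[]},
    "Output" -> NetDecoder[{"Class", Range[0, 9]}],
    "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]];

(* Layers 1 to 7: convolution, pooling and flattening *)
backbone = Take[demoNet, {1, 7}];

(* A random 28x28 grayscale image stands in for an MNIST digit *)
img = RandomImage[1, {28, 28}];

(* The backbone maps the image to a flat feature vector *)
featureVector = backbone[img];
Dimensions[featureVector]
(* {800}, i.e. 50 channels of size 4 x 4 after the second pooling layer *)
```

The 800 comes from the layer arithmetic: 28 → 24 after the first 5×5 convolution, 12 after pooling, 8 after the second convolution, 4 after pooling, with 50 channels at the end.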

We now train the network on the data that contains only the digits {0,1,2,3,4}:

trained1 = 
 NetTrain[lenet, trainingData1, ValidationSet -> testData1, 
  MaxTrainingRounds -> 3]

After three training rounds, we get an accuracy of 0.99747 on the test data.

ClassifierMeasurements[trained1, testData1, "Accuracy"]
(* 0.99747 *)

Now we can use the feature extraction part (layers 1 to 7) of this trained network as a feature extractor.

fe = Take[trained1, {1, 7}][#] &;

Visualize extracted features

First, let's see what the feature extractor does. Applied to a single input image, it returns a vector of 800 numbers.

Short[fe[trainingData[[1, 1]]]]
(*{0.32668,0.318409,0.606069,0.789559,0.293705,0.,<<788>>,0.887117,1.25466,0.,0.656648,1.28119,0.203971}*)

However, it is difficult to see what this vector means. To see more clearly, we can apply the feature extractor to many images. We expect vectors that come from the same class to look similar, and vectors from different classes to look different. For each digit, we draw a random sample of 2000 training examples, keep those that belong to that digit (roughly 200), and plot their extracted vectors. Each vector is plotted vertically as an 800×1 column.

Table[ArrayPlot[
   Transpose[
    fe[GroupBy[RandomSample[trainingData, 2000], Values][n][[All, 
      1]]]], ImageSize -> Medium, PlotLabel -> n], {n, 0, 9}] // Row

We can see clearly that different classes (different digits) have distinct patterns in their vector values. These distinct patterns are the starting point for the classification at the end of the neural network.


Use feature extractor

Once we have the feature extractor, we can use it to classify new data. We can either do this explicitly, by running our new dataset through it and then training a classifier on the feature vectors, or implicitly in a neural network, by fixing the weights of these layers while retraining the fully-connected layers on the new data (the digits {5,6,7,8,9}).

Here is the neural network approach. We train the fully-connected layers while keeping the weights in the convolution layers fixed:

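(* Start from trained1 so that the frozen convolution layers keep the
   weights learned on the digits 0-4; only layers 8 to 10 are updated *)
trained2 = 
 NetTrain[trained1, trainingData2, 
  LearningRateMultipliers -> {8 ;; 10 -> 1, _ -> None}, 
  ValidationSet -> testData2, MaxTrainingRounds -> 1]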

This training is faster because we only need to backpropagate through the few layers at the end of the network. After just one training round, we already get a fairly good accuracy.
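As a small check of what LearningRateMultipliers -> None does, the following sketch trains a toy network on synthetic data and compares the weights of a frozen layer before and after training. Everything here (the three-layer net, the sum-of-two-numbers data, and the names net, data, trained and frozenUnchanged) is made up for the illustration and is not part of the MNIST example.

```wolfram
(* Tiny regression net; "Input" -> 2 fixes the input size so that
   NetInitialize can create the weight arrays *)
net = NetInitialize@
   NetChain[{LinearLayer[4], Ramp, LinearLayer[1]}, "Input" -> 2];

(* Synthetic data: learn the sum of two random numbers *)
data = Table[With[{x = RandomReal[1, 2]}, x -> {Total[x]}], {200}];

(* Freeze layer 1; the remaining layers keep the default multiplier *)
trained = NetTrain[net, data,
   LearningRateMultipliers -> {1 -> None},
   MaxTrainingRounds -> 5, TrainingProgressReporting -> None];

(* Compare the weights of the frozen layer before and after training *)
frozenUnchanged =
 Normal[NetExtract[net, {1, "Weights"}]] ==
  Normal[NetExtract[trained, {1, "Weights"}]]
```

If the layer is really frozen, the comparison should come out as True, while the same comparison for layer 3 would normally show that its weights have moved.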

ClassifierMeasurements[trained2, testData2, "Accuracy"]
(* 0.947542 *)

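In the explicit approach, we can simply pass the feature extractor to Classify, which then trains a conventional classifier (logistic regression, for example) on the extracted features. Unless the Method option is given, Classify chooses the method automatically.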

cl = Classify[RandomSample[trainingData2, 1000], 
  FeatureExtractor -> (Take[trained1, {1, 7}][#] &)]

This also gives us a fairly good accuracy.

ClassifierMeasurements[cl, RandomSample[testData2, 100], "Accuracy"]
(* 0.96 *)

The exact workings of the feature extractors are known only to their authors. One of the authors did, however, write about the extractor that you mention. In a blog post he said that

This extractor is a byproduct of our effort to develop ImageIdentify. In a nutshell, we took the network trained for ImageIdentify and removed its last layers. The resulting network transforms images into feature vectors encoding high-level concepts. Thanks to the large and diverse dataset (about 10 million images and 10,000 classes) used to train the network, this simple strategy gives a pretty good extractor even for objects that were not in the dataset (such as griffins, centaurs and unicorns).


You can also use the image classification networks in NetModel, and then drop the SoftmaxLayer and LinearLayer using Take or Drop. Flattening the output tensor of this modified net gives you an image-to-vector feature extractor. This is very similar to what FeatureExtract does, but you have more control: you can choose how many layers to drop, and you can choose any image network in NetModel.
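A minimal sketch of this approach could look as follows. The choice of "ResNet-50 Trained on ImageNet Competition Data" and of the test image is only an assumption for illustration; any image classification model from NetModel could be used instead. Other models may end in a different sequence of layers, so it is worth inspecting the net before deciding how many layers to drop. The model is downloaded on first use.

```wolfram
(* Pretrained ImageNet classifier from the Wolfram Neural Net Repository;
   the model name is an illustrative choice *)
net = NetModel["ResNet-50 Trained on ImageNet Competition Data"];

(* Drop the final LinearLayer and SoftmaxLayer, keeping everything before them *)
extractor = Take[net, {1, -3}];

(* Apply the truncated net to an image *)
img = ExampleData[{"TestImage", "House"}];
features = extractor[img];

(* If the output is still a multi-dimensional array, Flatten turns it
   into a plain vector *)
featureVector = Flatten[features];
Length[featureVector]
```

The truncated net can then be passed to Classify in the same way as before, for example with FeatureExtractor -> (extractor[#] &).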