How to use .predict_generator() on new Images - Keras

I had some trouble with predict_generator(), and some posts here helped a lot. I am posting my solution as well in the hope that it helps others. What I do:

  • Make predictions on new images using predict_generator()
  • Get filename for each prediction
  • Store results in a data frame

I make binary predictions à la "cats and dogs", as documented here. However, the logic generalises to multiclass cases; in that case, the prediction output has one column per class.
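To make the binary/multiclass distinction concrete, here is a minimal sketch. It assumes the model returns a NumPy array of shape (n_images, n_columns); the helper name `probabilities_to_classes` and the sample probabilities are hypothetical, not real model output.

```python
import numpy as np

def probabilities_to_classes(pred, threshold=0.5):
    """Convert model output probabilities to integer class indices.

    Assumes `pred` has shape (n_images, n_columns): a single sigmoid column
    for a binary model, or one softmax column per class for a multiclass model.
    """
    pred = np.asarray(pred)
    if pred.shape[1] == 1:
        # Binary case: one probability per image, so apply a threshold.
        return (pred[:, 0] > threshold).astype(int)
    # Multiclass case: pick the column with the highest probability.
    return np.argmax(pred, axis=1)

# Made-up model outputs, for illustration only
binary_pred = np.array([[0.10], [0.80], [0.45]])
multi_pred = np.array([[0.1, 0.7, 0.1, 0.1],
                       [0.6, 0.2, 0.1, 0.1]])

print(probabilities_to_classes(binary_pred))  # [0 1 0]
print(probabilities_to_classes(multi_pred))   # [1 0]
```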

First, I load my stored model and set up the data generator:

import numpy as np
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model

# Load model
model = load_model('my_model_01.hdf5')

test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
        "C:/kerasimages/pred/",
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary',
        shuffle=False)

Note: it is important to specify shuffle=False in order to preserve the order of filenames and predictions.
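A toy simulation (pure Python, no Keras) of why shuffle=False matters: the list of filenames has a fixed order, while predictions come back in whatever order the generator yields the images. `batch_order` is a hypothetical helper, not a Keras function.

```python
import random

def batch_order(n_files, batch_size, shuffle, seed=0):
    """Return file indices in the order a batch generator would yield them.

    Hypothetical stand-in for a Keras-style directory generator.
    """
    indices = list(range(n_files))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # Split into consecutive batches and flatten again: batching alone does
    # not change the order, only shuffling does.
    batches = [indices[i:i + batch_size] for i in range(0, n_files, batch_size)]
    return [idx for batch in batches for idx in batch]

filenames = ["images/a.jpg", "images/b.jpg", "images/c.jpg",
             "images/d.jpg", "images/e.jpg"]

for shuffle in (False, True):
    order = batch_order(len(filenames), batch_size=2, shuffle=shuffle)
    # Fake "prediction": each output just records which file produced it.
    predictions = [filenames[i] for i in order]
    print(f"shuffle={shuffle}: aligned with filenames -> {predictions == filenames}")
```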

Images are stored in C:/kerasimages/pred/images/. The data generator only looks for images in subfolders of C:/kerasimages/pred/ (the directory passed to test_generator), so the subfolder /images/ is required. Each subfolder of C:/kerasimages/pred/ is interpreted as one class, which is why the generator reports Found x images belonging to 1 classes here (there is only one subfolder). When making predictions, the classes detected by the generator are irrelevant.
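If the new images sit loose in a single folder, a small script can create the one-subfolder layout described above. This is a minimal sketch using only the standard library; `prepare_prediction_dir` is a hypothetical helper, and the set of accepted image suffixes is an assumption.

```python
import shutil
from pathlib import Path

# Assumed set of image file extensions to copy
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}

def prepare_prediction_dir(source_dir, pred_root, subfolder="images"):
    """Copy loose images into <pred_root>/<subfolder>/ so that
    flow_from_directory(pred_root, ...) sees them as a single "class".

    Returns the number of images copied.
    """
    target = Path(pred_root) / subfolder
    target.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(source_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            shutil.copy2(path, target / path.name)
            count += 1
    return count
```

For example, prepare_prediction_dir("C:/kerasimages/new/", "C:/kerasimages/pred/") would copy the images from a hypothetical C:/kerasimages/new/ folder into C:/kerasimages/pred/images/.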

Now, I can make predictions using the generator:

# Predict from generator (returns probabilities)
pred=model.predict_generator(test_generator, steps=len(test_generator), verbose=1)

Resetting the generator is not required in this case, but if the generator has already been used (for example, iterated over once), it may be necessary to reset it with test_generator.reset().
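The reason a reset can matter is that a generator keeps an internal batch pointer. The class below is a hypothetical, heavily simplified stand-in (not the real Keras DirectoryIterator) that shows how predictions shift when that pointer is not at the start.

```python
class ToyDirectoryIterator:
    """Hypothetical, simplified stand-in for a Keras directory iterator.

    It keeps a batch pointer that advances on every next() call.
    """

    def __init__(self, filenames, batch_size):
        self.filenames = list(filenames)
        self.batch_size = batch_size
        self.batch_index = 0

    def __len__(self):
        # Number of batches needed to cover every file once (ceiling division).
        return (len(self.filenames) + self.batch_size - 1) // self.batch_size

    def reset(self):
        self.batch_index = 0

    def __next__(self):
        start = self.batch_index * self.batch_size
        batch = self.filenames[start:start + self.batch_size]
        # Wrap around at the end, like an endless training generator.
        self.batch_index = (self.batch_index + 1) % len(self)
        return batch


def predict_all(gen):
    """Collect one 'prediction' (here just the filename) per image over len(gen) steps."""
    out = []
    for _ in range(len(gen)):
        out.extend(next(gen))
    return out


gen = ToyDirectoryIterator(["a", "b", "c", "d", "e"], batch_size=2)
next(gen)                # a batch consumed earlier, e.g. for inspection
print(predict_all(gen))  # ['c', 'd', 'e', 'a', 'b']  (shifted)
gen.reset()
print(predict_all(gen))  # ['a', 'b', 'c', 'd', 'e']
```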

Next, I round the probabilities to get classes and retrieve the filenames:

# Get classes by np.round
cl = np.round(pred)
# Get filenames (setting shuffle=False in the generator is important)
filenames=test_generator.filenames
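One detail worth knowing about np.round: NumPy rounds exact halves to the nearest even value, so a probability of exactly 0.5 becomes 0. The sketch below compares it with an explicit threshold; the probabilities are made up, and the >= 0.5 cut-off is just one possible choice.

```python
import numpy as np

# Made-up sigmoid outputs with shape (n_images, 1), as for a binary model
pred = np.array([[0.2], [0.5], [0.51], [0.9]])

rounded = np.round(pred)                 # halves round to the nearest even value
thresholded = (pred >= 0.5).astype(int)  # explicit decision threshold

print(rounded[:, 0])      # [0. 0. 1. 1.]
print(thresholded[:, 0])  # [0 1 1 1]
```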

Finally, results can be stored in a data frame:

# Data frame
results=pd.DataFrame({"file":filenames,"pr":pred[:,0], "class":cl[:,0]})
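The entries in test_generator.filenames are paths relative to the directory passed to flow_from_directory, so each one starts with the subfolder (images/ here). If only the bare file name is wanted in the data frame, a small helper like the hypothetical `bare_filename` below can strip it; PureWindowsPath accepts both / and \ as separators, so it works regardless of the OS the generator ran on.

```python
from pathlib import PureWindowsPath

def bare_filename(generator_filename):
    """Drop the subfolder from a flow_from_directory filename."""
    return PureWindowsPath(generator_filename).name

print(bare_filename("images/cat.101.jpg"))  # cat.101.jpg
print(bare_filename("images\\dog.7.jpg"))   # dog.7.jpg
```

It could then be applied as [bare_filename(f) for f in filenames] before building the data frame.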

First, place the test images in a separate folder inside the test folder. In my case, I created a folder named all_classes inside the test folder and then ran the following code:

test_generator = test_datagen.flow_from_directory(
    directory=pred_dir,  # the test folder that contains all_classes/
    target_size=(28, 28),
    color_mode="rgb",
    batch_size=32,
    class_mode=None,
    shuffle=False
)

This prints:

Found 306 images belonging to 1 class

Most importantly, you have to reset the generator:

test_generator.reset()

Otherwise, the predictions can come out in a shifted order that no longer matches test_generator.filenames. Then call .predict_generator():

# steps must be a whole number of batches: len(test_generator) == ceil(306 / 32) == 10
pred = cnn.predict_generator(test_generator, verbose=1, steps=len(test_generator))
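The steps argument counts batches, so it should be an integer that covers every image. A quick check of the arithmetic for 306 images and a batch size of 32:

```python
import math

n_images = 306   # from "Found 306 images belonging to 1 class"
batch_size = 32

print(n_images / batch_size)   # 9.5625 -> not a valid number of steps
print(n_images // batch_size)  # 9 -> would skip the last 18 images
steps = math.ceil(n_images / batch_size)
print(steps)                   # 10
# Size of the final, partial batch
print(n_images - (steps - 1) * batch_size)  # 18
```

Rounding up matches the number of batches a directory iterator reports through len(test_generator).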

This returns probabilities, so I first need to convert them to class indices. In my case there were 4 classes, so the class indices were 0, 1, 2 and 3.

The conversion takes the index of the highest probability in each row:

predicted_class_indices=np.argmax(pred,axis=1)

Next, I want the names of the classes:

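# train_generator is the generator used for training; class_indices maps folder name -> index
labels = train_generator.class_indices
labels = dict((v, k) for k, v in labels.items())  # invert to index -> folder name
predictions = [labels[k] for k in predicted_class_indices]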
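The mapping step can be tried without a model. In this sketch the class names in `class_indices` are hypothetical (same shape as what train_generator.class_indices returns: folder name to index, with folders indexed in alphanumeric order), and the predicted indices are made up. The training generator's mapping is used because the test generator only sees the single all_classes folder.

```python
# Hypothetical class_indices: folder name -> integer index
class_indices = {"cat": 0, "dog": 1, "horse": 2, "rabbit": 3}

# Invert to index -> folder name
labels = {index: name for name, index in class_indices.items()}

# Made-up output of np.argmax(pred, axis=1)
predicted_class_indices = [2, 0, 3, 3, 1]
predictions = [labels[int(k)] for k in predicted_class_indices]
print(predictions)  # ['horse', 'cat', 'rabbit', 'rabbit', 'dog']
```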

This replaces the class indices with the class names. As a final step, if you want to save the results to a CSV file, arrange them in a data frame that pairs each image name with its predicted class:

filenames = test_generator.filenames
results = pd.DataFrame({"Filename": filenames,
                        "Predictions": predictions})
results.to_csv("results.csv", index=False)  # example output path

Display the data frame: it now contains the predicted class for each of your images.