Confusion matrix on images in CNN keras

Why would the scikit-learn function not do the job? Run a forward pass over all your samples (images) in the train/test set, convert the network output from per-class scores to integer labels (for example with np.argmax), and pass the result to sklearn.metrics.confusion_matrix as y_pred. Do the same for y_true if it is one-hot encoded (one-hot to label).

Sample code:

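import numpy as np
import sklearn.metrics as metrics

# model is your trained Keras classifier, X holds the images
y_pred_ohe = model.predict(X)  # shape=(n_samples, 12)
y_pred_labels = np.argmax(y_pred_ohe, axis=1)  # per-class scores to labels, shape=(n_samples,)
# y_true_ohe (assumed name) is the one-hot ground truth; skip this line if your labels are already integers
y_true_labels = np.argmax(y_true_ohe, axis=1)

confusion_matrix = metrics.confusion_matrix(y_true=y_true_labels, y_pred=y_pred_labels)  # shape=(12, 12)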
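
To make the one-hot to label step concrete, here is a minimal sketch on made-up data. It assumes 3 classes instead of 12, and the array y_pred_proba is a hypothetical stand-in for what model.predict(X) would return, so no Keras model is needed to run it:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical toy data: 5 samples, 3 classes, one-hot ground truth.
y_true_ohe = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
])

# Stand-in for model.predict(X): one row of class scores per sample.
y_pred_proba = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
])

# argmax along the class axis turns both arrays into integer labels.
y_true_labels = np.argmax(y_true_ohe, axis=1)
y_pred_labels = np.argmax(y_pred_proba, axis=1)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_labels, y_pred_labels, labels=[0, 1, 2])
print(cm)
```

Passing labels explicitly keeps the matrix at a fixed size even if some class never shows up in a small test set.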

Here's how to get the confusion matrix (or other statistics, using scikit-learn) for all classes:

1. Predict classes

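import math
import numpy
from keras.preprocessing.image import ImageDataGenerator  # or tensorflow.keras.preprocessing.image

test_generator = ImageDataGenerator()
test_data_generator = test_generator.flow_from_directory(
    test_data_path,  # Put your path here
    target_size=(img_height, img_width),  # Keras expects (height, width)
    batch_size=32,
    shuffle=False)  # keep file order so predictions line up with .classes
# numpy.math is gone in recent NumPy releases, so use the standard math module
test_steps_per_epoch = math.ceil(test_data_generator.samples / test_data_generator.batch_size)

# On TensorFlow 2.1+, predict_generator is deprecated; model.predict accepts the generator directly
predictions = model.predict_generator(test_data_generator, steps=test_steps_per_epoch)
# Get most likely class
predicted_classes = numpy.argmax(predictions, axis=1)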

2. Get ground-truth classes and class labels

true_classes = test_data_generator.classes
class_labels = list(test_data_generator.class_indices.keys())   

3. Use scikit-learn to get statistics

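import sklearn.metrics as metrics
report = metrics.classification_report(true_classes, predicted_classes, target_names=class_labels)
print(report)
print(metrics.confusion_matrix(true_classes, predicted_classes))  # rows: true classes, columns: predicted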
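
If you want to try steps 2 and 3 without a trained model, here is a small sketch. Everything in it is made up for illustration: class_indices and true_classes stand in for the generator attributes of the same name, and predictions stands in for the model output, assuming two classes called cats and dogs:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical stand-ins for test_data_generator.class_indices / .classes.
class_indices = {"cats": 0, "dogs": 1}
true_classes = np.array([0, 0, 0, 1, 1, 1])

# Hypothetical stand-in for the model output (one row of scores per image).
predictions = np.array([
    [0.9, 0.1],
    [0.6, 0.4],
    [0.3, 0.7],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.7, 0.3],
])
predicted_classes = np.argmax(predictions, axis=1)

# Dict keys come back in insertion order, which matches the class indices here.
class_labels = list(class_indices.keys())

cm = confusion_matrix(true_classes, predicted_classes)
print(cm)
print(classification_report(true_classes, predicted_classes, target_names=class_labels))
```

The names in target_names have to be in the same order as the integer class indices, otherwise the report rows get the wrong labels.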

You can read more here

EDIT: If the above does not work, have a look at the video "Create confusion matrix for predictions from Keras model", and look through its comments if you run into an issue. Another option is "Make predictions with a Keras CNN Image Classifier".


Here cats and dogs are the class labels:

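# Confusion Matrix and Classification Report
import math
import numpy as np
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

# validation_generator must be created with shuffle=False so that
# predictions line up with validation_generator.classes
# ceil avoids the extra batch that "// batch_size + 1" requests when the division is exact
steps = math.ceil(nb_validation_samples / batch_size)
Y_pred = model.predict_generator(validation_generator, steps)
y_pred = np.argmax(Y_pred, axis=1)

print('Confusion Matrix')
print(confusion_matrix(validation_generator.classes, y_pred))

print('Classification Report')
target_names = ['Cats', 'Dogs']
print(classification_report(validation_generator.classes, y_pred, target_names=target_names))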
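
One thing to watch is the number of steps passed to the prediction call. It should cover every sample exactly once, i.e. ceil(samples / batch_size). The form samples // batch_size + 1 asks for one batch too many whenever the sample count is an exact multiple of the batch size. Depending on the Keras version, the generator may then wrap around and return more predictions than there are labels. A quick sketch of the arithmetic (the helper names and the sample counts are made up for illustration):

```python
import math

def steps_floor_plus_one(n_samples, batch_size):
    # The "// batch_size + 1" form: over-counts on exact multiples.
    return n_samples // batch_size + 1

def steps_ceil(n_samples, batch_size):
    # Smallest number of batches that covers every sample once.
    return math.ceil(n_samples / batch_size)

for n_samples in (800, 810):
    print(n_samples, steps_floor_plus_one(n_samples, 32), steps_ceil(n_samples, 32))
```

With 800 samples and a batch size of 32 the two formulas disagree (26 vs. 25 steps), while with 810 samples both give 26.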