Keras flow_from_directory: oversample or undersample a class

With the current version of Keras, it's not possible to balance your dataset using only Keras's built-in methods. flow_from_directory simply builds a list of all files and their classes, shuffles it (if requested), and then iterates over it.
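To make that concrete, here is a simplified, hypothetical model of what flow_from_directory collects and iterates over. list_files_and_classes and iterate_batches are illustrative names, not Keras functions. The real DirectoryIterator also filters by file extension, loads and transforms the images, and keeps looping over epochs; this sketch covers a single epoch. The point is that nothing in this pipeline looks at class frequencies.

```python
import os
import random

def list_files_and_classes(directory):
    """Hypothetical helper: collect (file path, class index) pairs like flow_from_directory does."""
    # Each subdirectory is one class; indices follow the alphanumeric order of the names.
    class_names = sorted(
        d for d in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, d)))
    filepaths, classes = [], []
    for index, name in enumerate(class_names):
        class_dir = os.path.join(directory, name)
        for fname in sorted(os.listdir(class_dir)):
            filepaths.append(os.path.join(class_dir, fname))
            classes.append(index)
    return filepaths, classes

def iterate_batches(filepaths, classes, batch_size, shuffle=True, seed=None):
    """Hypothetical helper: yield (paths, labels) batches for one epoch.

    Note that nothing here rebalances the classes: a rare class stays rare."""
    order = list(range(len(filepaths)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [filepaths[i] for i in idx], [classes[i] for i in idx]
```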

But you can use a different trick: write your own generator that does the balancing in Python:

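def balanced_flow_from_directory(directory_iterator, options):
    # directory_iterator: the iterator returned by ImageDataGenerator.flow_from_directory()
    for x, y in directory_iterator:
        # custom_balance is a function you write yourself (described below)
        yield custom_balance(x, y, options)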

Here custom_balance is a function you write yourself: given a batch (x, y), it returns a balanced batch (x', y'). For most applications the batch size doesn't need to stay constant, but in some cases (e.g. stateful RNNs) every batch must have the same fixed size.
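A minimal sketch of one possible custom_balance, assuming class_mode='categorical' (one-hot labels) and plain undersampling within each batch. The names undersample_batch and balanced_batches are hypothetical. Every class present in the batch is cut down to the count of the rarest class present, so the batch size varies from batch to batch, which is fine except in the fixed-size cases mentioned above. A class that does not appear in a given batch simply stays absent, so this only evens out the classes that are present.

```python
import numpy as np

def undersample_batch(x, y, rng=None):
    """Hypothetical custom_balance: undersample each class in the batch
    down to the size of the rarest class present in that batch.

    Assumes y is one-hot encoded (class_mode='categorical')."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.argmax(y, axis=1)
    present = np.unique(labels)
    # Number of samples to keep per class: the count of the rarest present class.
    n_keep = min(np.sum(labels == c) for c in present)
    keep = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n_keep, replace=False)
        for c in present])
    rng.shuffle(keep)  # avoid emitting the classes in sorted blocks
    return x[keep], y[keep]

def balanced_batches(directory_iterator, rng=None):
    """Hypothetical wrapper: balance every batch coming from the iterator."""
    for x, y in directory_iterator:
        yield undersample_batch(x, y, rng)
```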


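One thing you can do is set the class_weight parameter when calling model.fit() or model.fit_generator(). This doesn't resample the data; instead, each sample's loss is scaled by the weight of its class, so errors on a rare class cost more.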
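As a rough, simplified model of what class_weight does (not Keras's actual code): Keras turns the dictionary into per-sample weights and scales each sample's loss by them. The helper weighted_loss and the numbers below are hypothetical, and the exact averaging Keras applies can differ between versions.

```python
import numpy as np

def weighted_loss(per_sample_loss, labels, class_weight):
    """Hypothetical helper: scale each sample's loss by its class weight, then average."""
    # Look up the weight of every sample's class (class_weight is a dict: index -> weight).
    weights = np.array([class_weight[c] for c in labels], dtype=float)
    return float(np.mean(weights * np.asarray(per_sample_loss, dtype=float)))

# Three samples of class 0, one of class 1; the class-1 sample counts three times as much.
print(weighted_loss([1.0, 1.0, 1.0, 1.0], [0, 0, 0, 1], {0: 1.0, 1: 3.0}))  # 1.5
```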

You can compute the class weights easily with scikit-learn and NumPy:

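from sklearn.utils import class_weight
import numpy as np

# train_generator is the iterator returned by flow_from_directory();
# its .classes attribute holds the integer label of every file it found.
classes = np.unique(train_generator.classes)
weights = class_weight.compute_class_weight(
    class_weight='balanced',  # keyword arguments are required in scikit-learn >= 1.0
    classes=classes,
    y=train_generator.classes)

# Keras expects a dict mapping class index -> weight, not an array.
class_weights = dict(zip(classes, weights))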

Then pass class_weights as the class_weight argument:

model.fit_generator(..., class_weight=class_weights)  # in TensorFlow 2.1+, fit_generator is deprecated; model.fit() accepts generators directly
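For reference, scikit-learn documents the 'balanced' heuristic as n_samples / (n_classes * np.bincount(y)). The NumPy sketch below reproduces it; balanced_weights is a hypothetical helper, not a library function. It makes it easy to see that a class with three times fewer samples gets a three times larger weight.

```python
import numpy as np

def balanced_weights(y):
    """Hypothetical helper reproducing scikit-learn's 'balanced' rule:
    weight = n_samples / (n_classes * count_of_that_class)."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    # Return the dict form Keras expects: class index -> weight.
    return dict(zip(classes.tolist(), weights.tolist()))

print(balanced_weights([0, 0, 0, 1]))  # {0: 0.6666666666666666, 1: 2.0}
```

With these weights, each class contributes the same total weight (count times weight equals n_samples / n_classes), which is what "balanced" means here.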