Getting some form of keras multi-processing/threading to work on Windows

In combination with a Sequence, using use_multiprocessing=False together with workers=4 (for example) does work.

I just realized that in the example code in the question I was not seeing the speed-up because the data was being generated too fast. Inserting a time.sleep(2) makes it evident.

import time

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import Sequence, to_categorical

class DummySequence(Sequence):
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        time.sleep(2)  # simulate slow data generation so the speed-up shows
        return np.array(batch_x), np.array(batch_y)

x = np.random.random((100, 3))
y = to_categorical(np.random.random(100) > .5).astype(int)

seq = DummySequence(x, y, 10)

model = Sequential()
model.add(Dense(32, input_dim=3))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

print('single worker')
model.fit_generator(generator=seq, 
                    steps_per_epoch = 10,
                    epochs = 2, 
                    verbose=2,
                    workers=1)

print('achieves speed-up!')
model.fit_generator(generator=seq, 
                    steps_per_epoch = 10,
                    epochs = 2, 
                    verbose=2,
                    workers=4,
                    use_multiprocessing=False)

On my laptop this produced the following:

single worker
>>> model.fit_generator(generator=seq,
...                     steps_per_epoch = 10,
...                     epochs = 2,
...                     verbose=2,
...                     workers=1)
Epoch 1/2
 - 20s - loss: 0.6984 - acc: 0.5000
Epoch 2/2
 - 20s - loss: 0.6955 - acc: 0.5100

and

achieves speed-up!
>>> model.fit_generator(generator=seq,
...                     steps_per_epoch = 10,
...                     epochs = 2,
...                     verbose=2,
...                     workers=4,
...                     use_multiprocessing=False)
Epoch 1/2
 - 6s - loss: 0.6904 - acc: 0.5200
Epoch 2/2
 - 6s - loss: 0.6900 - acc: 0.5000

Important notes: You will probably want self.lock = threading.Lock() in __init__ and then with self.lock: in __getitem__. Do the absolute bare minimum inside the with self.lock: block; as far as I understand it, that would be any reference to shared state like self.xxxx (other threads are blocked while the with self.lock: block is running).
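To illustrate the note above, here is a minimal sketch of that locking pattern. LockedSequence is a hypothetical name, and for brevity it is a plain class; in real code it would subclass keras.utils.Sequence as in the example above:

```python
import threading

import numpy as np

class LockedSequence:
    """Sketch of a batch provider whose __getitem__ is thread-safe.
    In practice this would subclass keras.utils.Sequence."""

    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size
        self.lock = threading.Lock()  # guards access to self.* state

    def __getitem__(self, idx):
        # Hold the lock only for the bare minimum: slicing shared state.
        with self.lock:
            batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
            batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        # Any expensive per-batch work (augmentation, decoding, ...)
        # happens outside the lock, so other workers are not blocked.
        return np.array(batch_x), np.array(batch_y)
```

Keeping the locked region small matters because every worker thread serializes on it; only the slicing of self.x and self.y needs protection.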

Additionally, if you want multithreading to speed up computation (i.e. CPU operations are the bottleneck), do not expect any speed-up: the global interpreter lock (GIL) prevents it. Multithreading only helps when the bottleneck lies in I/O operations. To speed up CPU-bound computation we apparently need true multiprocessing, which Keras currently does not support on Windows 10. Perhaps it is possible to hand-craft a multi-processing generator (I have no idea).
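As a rough illustration of why multiprocessing (unlike multithreading) can help with CPU-bound batch generation, here is a minimal, Keras-free sketch using the standard library's multiprocessing.Pool; cpu_bound_batch and load_batches_parallel are hypothetical names standing in for whatever expensive per-batch work you actually do:

```python
import multiprocessing as mp

def cpu_bound_batch(idx, batch_size=10):
    # Stand-in for expensive pure-Python work on one batch; a thread
    # doing this would be serialized by the GIL, a process is not.
    return sum(i * i for i in range(idx * batch_size, (idx + 1) * batch_size))

def load_batches_parallel(n_batches, processes=4):
    # Each worker is a separate Python process with its own interpreter
    # and its own GIL, so the batches are computed truly in parallel.
    with mp.Pool(processes=processes) as pool:
        return pool.map(cpu_bound_batch, range(n_batches))

if __name__ == "__main__":
    # The __main__ guard is mandatory on Windows, where child
    # processes are spawned by re-importing this module.
    print(load_batches_parallel(4))
```

Whether this can be wired cleanly into a Keras Sequence on Windows is exactly the open question above; the sketch only shows the GIL-free mechanism itself.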