Quantize a Keras neural network model

Since your network looks quite simple, TensorFlow Lite may well be enough for your needs.
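
For example, post-training quantization of a model you have already trained and saved only takes a few lines. A rough sketch, assuming a recent TF 1.x release where from_keras_model_file is available (TF 2.x uses from_keras_model instead); the file names are placeholders:

import tensorflow as tf

# Convert the saved Keras model to TensorFlow Lite with post-training
# weight quantization (file names are placeholders)
converter = tf.lite.TFLiteConverter.from_keras_model_file('model.h5')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open('model_quant.tflite', 'wb') as f:
    f.write(tflite_model)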


As mentioned in other answers, TensorFlow Lite can help you with network quantization.

TensorFlow Lite provides several levels of support for quantization.

TensorFlow Lite's post-training quantization quantizes weights and activations after training and requires no changes to the training code. Quantization-aware training trains networks so that they can be quantized afterwards with minimal accuracy drop, but it is only available for a subset of convolutional neural network architectures.

So first, you need to decide whether you need post-training quantization or quantization-aware training. For example, if you have already saved the model as *.h5 files, you would probably want to follow @Mitiku's instructions and do post-training quantization.
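
Whichever way you convert it, it is worth sanity-checking the resulting .tflite file with the TFLite interpreter before deploying it. A minimal sketch (model_quant.tflite is just a placeholder name):

import numpy as np
import tensorflow as tf

# Load the converted model and run one dummy inference to confirm it works
interpreter = tf.lite.Interpreter(model_path='model_quant.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

dummy_input = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy_input)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']))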

If you prefer to get better accuracy by simulating the effect of quantization during training (the method you quoted in the question), and your model is in the subset of CNN architectures supported by quantization-aware training, this example may help you with the interaction between Keras and TensorFlow. Basically, you only need to add this code between the model definition and the call to model.fit():

import tensorflow as tf  # TF 1.x; tf.contrib is gone in TF 2.x
# Insert fake-quantization ops into the Keras session's graph,
# then initialize variables before calling model.fit()
sess = tf.keras.backend.get_session()
tf.contrib.quantize.create_training_graph(sess.graph)
sess.run(tf.global_variables_initializer())
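
Once training has finished, the graph still has to be rewritten for inference and frozen before it can be handed to the TFLite converter. The sketch below shows one way to do that under the same TF 1.x assumptions; build_model() and the checkpoint path are placeholders you would replace with your own model definition and paths:

# 1) After model.fit(...) has completed, save the trained variables
saver = tf.train.Saver()
saver.save(sess, 'checkpoints/quant_aware.ckpt')  # placeholder path

# 2) Rebuild the same model in a fresh inference graph and insert the
#    evaluation-time fake-quantization ops
tf.keras.backend.clear_session()
tf.keras.backend.set_learning_phase(0)
eval_model = build_model()                  # placeholder: your model definition
eval_sess = tf.keras.backend.get_session()
tf.contrib.quantize.create_eval_graph(eval_sess.graph)

# 3) Restore the trained weights and freeze the graph for conversion,
#    e.g. with tf.lite.TFLiteConverter.from_frozen_graph
tf.train.Saver().restore(eval_sess, 'checkpoints/quant_aware.ckpt')
frozen_graph_def = tf.graph_util.convert_variables_to_constants(
    eval_sess, eval_sess.graph_def, [eval_model.output.op.name])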