Confusion about keras Model: __call__ vs. call vs. predict methods

Adding to @Dmitry Kabanov's answer: the two are similar, but they aren't exactly the same thing. If you care about performance, you need to look at the critical differences between them.

| model.predict | model(x) |
| --- | --- |
| Loops over the data in batches, which means predict() calls can scale to very large arrays | Happens in-memory and doesn't scale |
| Not differentiable | Differentiable |
| Use this if you just need the output value | Use this when you need to retrieve the gradients |
| Output is a NumPy array | Output is a Tensor |
| Use this if you have batches of data to be predicted | Use this for a small dataset |
| Relatively slower for small data | Relatively faster for small data |
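To make the "differentiable" and "output type" rows concrete, here is a minimal sketch, assuming TensorFlow 2.x with its bundled Keras. The tiny Dense model and the input values are made up for illustration; the point is only that model(x) can be recorded by tf.GradientTape and returns a Tensor, while predict returns a NumPy array.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Toy model and data, chosen only for illustration.
model = keras.Sequential([keras.Input(shape=(3,)), keras.layers.Dense(1)])
x_np = np.ones((4, 3), dtype="float32")
x = tf.constant(x_np)

# model(x) returns a Tensor and is recorded by the tape,
# so gradients can flow back through it.
with tf.GradientTape() as tape:
    tape.watch(x)  # x is a constant, so it has to be watched explicitly
    y_tensor = model(x, training=False)
    loss = tf.reduce_sum(y_tensor)
grad_x = tape.gradient(loss, x)  # gradient of the loss w.r.t. the input

# predict() returns a plain NumPy array, which carries no gradient information.
y_numpy = model.predict(x_np, verbose=0)

print(type(y_tensor).__name__, type(y_numpy).__name__)
print(grad_x.shape)
```

Both calls should produce the same values here, up to floating-point noise; what differs is the return type and whether a gradient can be taken.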

For a more detailed explanation, see the Keras FAQ.

Just to complement the answer, as I was also searching for this: when you call the model directly for inference and it contains a layer that behaves differently in training and inference, a batch normalization layer for example, you need to specify the training flag yourself, as in model(X_new, training=False). Both predict and predict_on_batch already do that when they are executed.

So, model(X_new, training=False) and model.predict_on_batch(X_new) are equivalent in what they compute; only the type of the returned value differs.
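The equivalence can be checked with a small sketch, assuming TensorFlow 2.x with its bundled Keras. The model here is a single, untrained BatchNormalization layer and X_new is made-up data, both chosen only so that training mode and inference mode give visibly different results.

```python
import numpy as np
from tensorflow import keras

# One untrained batch normalization layer; illustrative only.
model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.BatchNormalization(),
])
X_new = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype="float32")

# Inference mode, requested explicitly on the direct call.
y_call = model(X_new, training=False).numpy()

# predict_on_batch and predict run in inference mode on their own.
y_batch = model.predict_on_batch(X_new)
y_predict = model.predict(X_new, verbose=0)

# Training mode normalizes with the statistics of the current batch instead
# of the stored moving averages (and also updates those averages, which is
# why this call comes last).
y_train_mode = model(X_new, training=True).numpy()

print(np.allclose(y_call, y_batch, atol=1e-5))
print(np.allclose(y_call, y_train_mode, atol=1e-3))
```

The first comparison should agree and the second should not, since in training mode the layer centers each feature on the batch mean.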

The difference between predict and predict_on_batch is that the latter runs over a single batch, while the former runs over a dataset that is split into batches, with the per-batch results merged to produce the final NumPy array of predictions.
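Conceptually, the split-and-merge behaviour looks like the sketch below. This is not Keras's actual implementation: predict_in_batches and fake_model are hypothetical names, and the stand-in function only takes the place of something like model.predict_on_batch so the example runs with NumPy alone.

```python
import numpy as np

def predict_in_batches(predict_one_batch, data, batch_size=32):
    """Hypothetical helper: run predict_one_batch over slices of data and merge."""
    outputs = []
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Each per-batch result is converted to a NumPy array before merging.
        outputs.append(np.asarray(predict_one_batch(batch)))
    # Merge the per-batch results along the sample axis.
    return np.concatenate(outputs, axis=0)

# Stand-in for a single-batch prediction function (assumption for the demo).
def fake_model(batch):
    return batch * 2.0

data = np.arange(10, dtype="float32").reshape(10, 1)
predictions = predict_in_batches(fake_model, data, batch_size=4)
print(predictions.shape)
```

With 10 samples and a batch size of 4, the loop sees batches of 4, 4, and 2 rows, and the merged result has one row per input sample.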

Beyond the difference mentioned by @Dmitry Kabanov, the functions return different types of output: __call__ returns a Tensor, while predict and predict_on_batch return a numpy.ndarray. According to the documentation, __call__ is also faster than predict for small inputs, i.e., inputs that fit in one batch.