# Confusion about Keras Model: `__call__` vs. `call` vs. `predict` methods

Adding to @Dmitry Kabanov's answer: the two are similar, but they aren't exactly the same thing. If you care about performance, you need to look into the critical differences between them.

`model.predict` | `model(x)` |
---|---|
Loops over the data in batches, which means that `predict()` calls can scale to very large arrays. | Happens in memory and doesn't scale. |
Not differentiable. | Differentiable. |
Use this if you just need the output value. | Use this when you need to retrieve the gradients. |
Output is a NumPy array. | Output is a Tensor. |
Use this if you have batches of data to be predicted. | Use this for a small dataset. |
Relatively slower for small data. | Relatively faster for small data. |

For a more detailed explanation, see the Keras FAQ.
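
To make the table concrete, here is a minimal sketch, assuming TensorFlow 2.x with `tf.keras`. The toy model and the random input are made up for illustration. It shows the different output types and that only the direct call can be differentiated.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy model: 4 input features -> 1 output.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

x = np.random.rand(16, 4).astype("float32")

# predict() runs a batched loop and returns a NumPy array.
y_predict = model.predict(x, batch_size=8, verbose=0)

# Calling the model directly returns a Tensor.
y_call = model(tf.constant(x), training=False)

# The direct call can be differentiated, here with respect to the input.
x_t = tf.constant(x)
with tf.GradientTape() as tape:
    tape.watch(x_t)  # x_t is a constant, so it must be watched explicitly
    y = model(x_t, training=False)
grads = tape.gradient(y, x_t)

print(type(y_predict).__name__, y_predict.shape)
print(tf.is_tensor(y_call), y_call.shape)
print(grads.shape)
```

The array returned by `predict` is detached from the computation graph, so there is nothing to pass to a `GradientTape`. That is why the direct call is the one to use when gradients are needed.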

Just to complement the answer, as I was also searching for this: when you call the model directly for inference, you need to set the training flag yourself, as in `model(X_new, training=False)`. This matters when the model has, for example, a batch normalization layer. Both `predict` and `predict_on_batch` already run in inference mode when they are executed.

So `model(X_new, training=False)` and `model.predict_on_batch(X_new)` are equivalent in this respect.
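
A small check of this, as a sketch assuming TensorFlow 2.x. The model with a `BatchNormalization` layer and the random `X_new` are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical model with a layer whose behavior depends on the training flag.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1),
])

X_new = np.random.rand(32, 4).astype("float32")

# Inference mode, set explicitly on the direct call.
out_call = model(tf.constant(X_new), training=False).numpy()

# predict_on_batch runs in inference mode on its own.
out_batch = model.predict_on_batch(X_new)

# Training mode: batch normalization uses the statistics of the current batch.
out_train = model(tf.constant(X_new), training=True).numpy()

same_in_inference = np.allclose(out_call, out_batch, atol=1e-5)
print("call(training=False) matches predict_on_batch:", same_in_inference)
print("max |training - inference|:", np.abs(out_train - out_call).max())
```

The tolerance in `np.allclose` allows for small floating-point differences between the eager call and the compiled prediction function.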

The difference between `predict` and `predict_on_batch` is that the latter runs over a single batch, while the former runs over a dataset that is split into batches, with the results merged to produce the final NumPy array of predictions.
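
Conceptually, the relationship can be sketched as a manual loop. This is an illustration assuming TensorFlow 2.x, not how Keras implements `predict` internally. The model, data, and batch size are made up.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy model and dataset.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
X = np.random.rand(100, 4).astype("float32")
batch_size = 32

# predict() splits X into batches and merges the results itself.
full = model.predict(X, batch_size=batch_size, verbose=0)

# The same idea done by hand: one predict_on_batch call per slice.
chunks = [
    model.predict_on_batch(X[start:start + batch_size])
    for start in range(0, len(X), batch_size)
]
manual = np.concatenate(chunks, axis=0)

print(full.shape, manual.shape)
print(np.allclose(full, manual, atol=1e-5))
```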

Beyond the difference mentioned by @Dmitry Kabanov, the functions generate different types of output: `__call__` generates a Tensor, while `predict` and `predict_on_batch` generate a `numpy.ndarray`. Also, according to the documentation, `__call__` is faster than `predict` for small inputs, i.e., those that fit in one batch.