How are metrics computed in Keras?

There is a difference between the metric on the training dataset and on the validation dataset. For the validation set, the metric is calculated at the end of the epoch over your whole validation dataset. For the training set, the metric is calculated at the end of each batch, and a running average keeps getting updated until the end of the epoch.

As you can see, the metric for the training set is evaluated on the fly, with each batch evaluated using different weights. That's why the training metric sometimes shows strange behaviour.
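
To make the effect concrete, here is a minimal sketch in plain Python, not Keras code. It assumes a toy one-weight linear model trained by SGD on made-up data, and the names `mse` and `run_epoch` are hypothetical. It only mimics the running average described above and compares it with a full pass using the end-of-epoch weights.

```python
# Toy illustration (not Keras internals): running average of per-batch
# metrics during an epoch vs. the metric recomputed with the final weights.

def mse(w, xs, ys):
    """Mean squared error of the model y = w * x on the given samples."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def run_epoch(w, xs, ys, batch_size, lr):
    """Train for one epoch; return the final weight and the running-average metric."""
    batch_metrics = []
    for start in range(0, len(xs), batch_size):
        bx = xs[start:start + batch_size]
        by = ys[start:start + batch_size]
        # Metric for this batch, measured with the weights in use for this batch.
        batch_metrics.append(mse(w, bx, by))
        # Gradient of the batch MSE with respect to w, then one SGD step.
        grad = sum(2 * (w * x - y) * x for x, y in zip(bx, by)) / len(bx)
        w -= lr * grad
    running_avg = sum(batch_metrics) / len(batch_metrics)
    return w, running_avg

# Made-up data following y = 3x.
xs = [1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 0.5]
ys = [3.0 * x for x in xs]

w_final, reported = run_epoch(w=0.0, xs=xs, ys=ys, batch_size=2, lr=0.05)
end_of_epoch = mse(w_final, xs, ys)

print("running average over batches:", reported)
print("full pass with final weights:", end_of_epoch)
```

In this toy run the running average is dominated by the early batches, which were scored with poor weights, so it comes out much larger than the value obtained from a single pass with the final weights.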


Dennis has already explained this clearly.

One more thing to point out: if you want to compute the metric over the whole training dataset, or your custom metric has to be computed in a single pass without averaging, you can use a Keras callback and define its on_epoch_end method. In on_epoch_end you can compute the metric on the whole training data.

Like this:

 def on_epoch_end(self, epoch, logs=None):
     # Predict on the full training and validation sets with the end-of-epoch weights.
     # max_error is assumed here to be sklearn.metrics.max_error.
     y_pred = self.model.predict(self.X_train, verbose=0)
     score = max_error(self.y_train, y_pred)
     y_val_pred = self.model.predict(self.X_val, verbose=0)
     val_score = max_error(self.y_val, y_val_pred)
     print("\n max_error - epoch: %d - train score: %.6f \n - val score: %.6f" % (epoch+1, score, val_score))

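You also need to make the training and validation data available to the callback (for example through its constructor, so that self.X_train, self.y_train, self.X_val and self.y_val are set) and pass the callback to model.fit through its callbacks parameter.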


Something additional to know with respect to the metric for the VALIDATION set:

Contrary to what is suggested in another answer, I just saw that the metric on the validation set is also calculated in batches and then averaged. (Of course, the model as trained at the end of the epoch is used for every batch, in contrast to how the metric is calculated for the training set.)

If you want to compute it on the whole validation data at once, you have to use a callback as described in the accepted answer of guangshengzuo (see https://keras.io/guides/writing_your_own_callbacks/ for more details).

Sure, for the usual metrics that are plain per-sample averages, it makes no difference whether you first calculate in batches and then average, or do it all in one big batch. But for custom metrics there very well can be a difference: I just had a case where the metric would tune a parameter based on the data.
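
A small sketch of that difference, in plain Python with made-up numbers. The helper names are hypothetical and the batching is only a stand-in for what happens during evaluation, not Keras's own code. A per-sample average such as the mean absolute error survives batching (with equal batch sizes), while a metric like the maximum error does not.

```python
# Compare "metric per batch, then average" with "metric on all data at once".

def mean_abs_error(y_true, y_pred):
    """Average absolute difference over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def max_abs_error(y_true, y_pred):
    """Largest absolute difference over all samples."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))

def batch_averaged(metric, y_true, y_pred, batch_size):
    """Evaluate `metric` on consecutive batches and average the batch results."""
    scores = [
        metric(y_true[i:i + batch_size], y_pred[i:i + batch_size])
        for i in range(0, len(y_true), batch_size)
    ]
    return sum(scores) / len(scores)

# Made-up targets and predictions.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_pred = [1.5, 2.0, 2.0, 4.0, 8.0, 6.5]

results = {}
for metric in (mean_abs_error, max_abs_error):
    whole = metric(y_true, y_pred)
    batched = batch_averaged(metric, y_true, y_pred, batch_size=2)
    results[metric.__name__] = (whole, batched)
    print(metric.__name__, "whole:", whole, "batched:", batched)
```

Here the mean absolute error agrees either way, whereas the batch-averaged maximum error is smaller than the true maximum over the whole data. A metric of that kind is the case where a callback over the full dataset is needed.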

Edit: added link on callbacks, in response to comment