What is the purpose of the add_loss function in Keras?

I was also wondering about the same query and some related stuff like how to add loss function within the intermediate layers. Here I'm sharing some of the observed information, hope it may help others. It's true that standard keras loss functions only take two arguments, y_true and y_pred. But during the experiment, there can some cases where we need some external parameter or coefficient while computing with these two values (y_true, y_pred). This can be needed at the last layer as usual or somewhere in the middle of the model's layer.

model.add_loss()

The accepted answer correctly said about the model.add_loss() functions. It potentially depends on the layer inputs (tensor). According to the official doc, when writing the call method of a custom layer or a subclassed model, we may want to compute scalar quantities that we want to minimize during training (e.g. regularization losses). We can use the add_loss() layer method to keep track of such loss terms. For instance, activity regularization losses dependent on the inputs passed when calling a layer. Here's an example of a layer that adds a sparsity regularization loss based on the L2 norm of the inputs:

from tensorflow.keras.layers import Layer

class MyActivityRegularizer(Layer):
  """Layer that creates an activity sparsity regularization loss."""

  def __init__(self, rate=1e-2):
    super(MyActivityRegularizer, self).__init__()
    self.rate = rate

  def call(self, inputs):
    # We use `add_loss` to create a regularization loss
    # that depends on the inputs.
    self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
    return inputs

Loss values added via add_loss can be retrieved in the .losses list property of any Layer or Model (they are recursively retrieved from every underlying layer):

from tensorflow.keras import layers

class SparseMLP(Layer):
  """Stack of Linear layers with a sparsity regularization loss."""

  def __init__(self, output_dim):
      super(SparseMLP, self).__init__()
      self.dense_1 = layers.Dense(32, activation=tf.nn.relu)
      self.regularization = MyActivityRegularizer(1e-2)
      self.dense_2 = layers.Dense(output_dim)

  def call(self, inputs):
      x = self.dense_1(inputs)
      x = self.regularization(x)
      return self.dense_2(x)


mlp = SparseMLP(1)
y = mlp(tf.ones((10, 10)))

print(mlp.losses)  # List containing one float32 scalar

Also note, when using model.fit(), such loss terms are handled automatically. When writing a custom training loop, we should retrieve these terms by hand from model.losses, like this:

loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# Iterate over the batches of a dataset.
for x, y in dataset:
    with tf.GradientTape() as tape:
        # Forward pass.
        logits = model(x)
        # Loss value for this batch.
        loss_value = loss_fn(y, logits)
        # Add extra loss terms to the loss value.
        loss_value += sum(model.losses) # < ------------- HERE ---------

    # Update the weights of the model to minimize the loss value.
    gradients = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))

Custom losses

With model.add_loss(), (AFAIK), we can use it somewhere in the middle of the network. Here we no longer bound with only two parameters i.e. y_true, y_pred. But what if we also want to impute external parameter or coefficient to the last layer loss functions of the network. Nric answer is correct. But it can also be implemented by subclassing the tf.keras.losses.Loss class by implementing the following two methods:

  • __init__(self): accept parameters to pass during the call of your loss function
  • call(self, y_true, y_pred): use the targets (y_true) and the model predictions (y_pred) to compute the model's loss

Here is an example of a custom MSE by subclassing the tf.keras.losses.Loss class. And here we also no longer bound only two parameters i.e. y_ture, y_pred.

class CustomMSE(keras.losses.Loss):
    def __init__(self, regularization_factor=0.1, name="custom_mse"):
        super().__init__(name=name)
        self.regularization_factor = regularization_factor

    def call(self, y_true, y_pred):
        mse = tf.math.reduce_mean(tf.square(y_true - y_pred))
        reg = tf.math.reduce_mean(tf.square(0.5 - y_pred))
        return mse + reg * self.regularization_factor

model.compile(optimizer=..., loss=CustomMSE())

I'll try to answer the original question of why model.add_loss() is being used instead of specifying a custom loss function to model.compile(loss=...).

All loss functions in Keras always take two parameters y_true and y_pred. Have a look at the definition of the various standard loss functions available in Keras, they all have these two parameters. They are the 'targets' (the Y variable in many textbooks) and the actual output of the model. Most standard loss functions can be written as an expression of these two tensors. But some more complex losses cannot be written in that way. For your VAE example this is the case because the loss function also depends on additional tensors, namely z_log_var and z_mean, which are not available to the loss functions. Using model.add_loss() has no such restriction and allows you to write much more complex losses that depend on many other tensors, but it has the inconvenience of being more dependent on the model, whereas the standard loss functions work with just any model.

(Note: The code proposed in other answers here are somewhat cheating in as much as they just use global variables to sneak in the additional required dependencies. This makes the loss function not a true function in the mathematical sense. I consider this to be much less clean code and I expect it to be more error-prone.)


JIH's answer is right of course but maybe it is useful to add:

model.add_loss() has no restrictions, but it also removes the comfort of using for example targets in the model.fit().

If you have a loss that depends on additional parameters of the model, of other models or external variables, you can still use a Keras type encapsulated loss function by having an encapsulating function where you pass all the additional parameters:

def loss_carrier(extra_param1, extra_param2):
    def loss(y_true, y_pred):
        #x = complicated math involving extra_param1, extraparam2, y_true, y_pred
        #remember to use tensor objects, so for example keras.sum, keras.square, keras.mean
        #also remember that if extra_param1, extra_maram2 are variable tensors instead of simple floats,
        #you need to have them defined as inputs=(main,extra_param1, extraparam2) in your keras.model instantiation.
        #and have them defind as keras.Input or tf.placeholder with the right shape.
        return x
    return loss

model.compile(optimizer='adam', loss=loss_carrier)

The trick is the last row where you return a function as Keras expects them with just two parameters y_true and y_pred.

Possibly looks more complicated than the model.add_loss version, but the loss stays modular.