What would be a good loss function to penalize the magnitude and sign difference?

From what I understand, your current loss function is something like:

loss = mean_squared_error(y, y_pred)

What you could do is add another component to your loss: one that penalizes negative numbers and does nothing to positive numbers, with a coefficient controlling how strongly it is penalized. For that, we can use something like a negated ReLU:

(Plot of Neg_ReLU: zero for positive inputs, increasing linearly as the input becomes more negative.)

Let's call this component "Neg_ReLU". Then, your loss function will be:

loss = mean_squared_error(y, y_pred) + Neg_ReLU(y_pred)

So, for example, if the true value is 1 and your prediction is -1, the total error would be:

mean_squared_error(1, -1) + 1

And if your prediction is 3, the total error would be:

mean_squared_error(1, 3) + 0

(Note in the plot above that Neg_ReLU(3) = 0 and Neg_ReLU(-1) = 1.)
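For concreteness, here is a small self-contained check of those two cases in plain Python; the mse and neg_relu helpers are just illustrative stand-ins for whatever implementations you actually use:

def mse(y_true, y_pred):
    # Squared error for a single scalar example.
    return (y_true - y_pred) ** 2

def neg_relu(y_pred):
    # Zero for positive predictions, |y_pred| for negative ones.
    return max(0.0, -y_pred)

y_true = 1.0
for y_pred in (-1.0, 3.0):
    total = mse(y_true, y_pred) + neg_relu(y_pred)
    print(y_pred, total)  # -1.0 -> 4 + 1 = 5,  3.0 -> 4 + 0 = 4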

If you want to penalize the negative values more, you can add a coefficient:

coeff_negative_value = 2

loss = mean_squared_error(y, y_pred) + coeff_negative_value * Neg_ReLU(y_pred)


Now negative values are penalized more heavily.

We can build this negated ReLU in TensorFlow like this:

tf.nn.relu(tf.math.negative(value))
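As a quick sanity check (assuming TensorFlow 2.x eager execution), this building block reproduces exactly the values used in the example above:

import tensorflow as tf

preds = tf.constant([-1.0, 0.0, 3.0])
neg_relu = tf.nn.relu(tf.math.negative(preds))
print(neg_relu.numpy())  # [1. 0. 0.]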


So, to summarize, your total loss will be:

coeff = 1

Neg_ReLU = tf.nn.relu(tf.math.negative(y_pred))

total_loss = mean_squared_error(y, y_pred) + coeff * Neg_ReLU
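If it helps, here is a minimal sketch of how this could be wired up as a custom Keras loss. The model architecture is made up purely for illustration, and tf.reduce_mean is used so the penalty term is reduced to a scalar the same way the MSE term is:

import tensorflow as tf

coeff = 1.0

def mse_with_negative_penalty(y_true, y_pred):
    # Standard MSE plus a penalty proportional to how negative the predictions are.
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    neg_penalty = tf.reduce_mean(tf.nn.relu(tf.math.negative(y_pred)))
    return mse + coeff * neg_penalty

# Hypothetical regression model, just to show where the loss plugs in.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss=mse_with_negative_penalty)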

This turned out to be a really interesting question - thanks for asking it! First, remember that you want your loss function to be composed entirely of differentiable operations, so that you can back-propagate through it. This means that any old arbitrary logic won't necessarily do. To restate your problem: you want a differentiable function of two variables that increases sharply when the two variables have different signs, and more slowly when they share the same sign. Additionally, you want some control over how sharply these values increase relative to one another, so we want something with two configurable constants. I started constructing a function that met these needs, but then remembered one you can find in any high school geometry textbook: the elliptic paraboloid!

A rotated elliptic paraboloid.

The standard, axis-aligned formulation doesn't distinguish between sign agreement and disagreement, so I had to introduce a 45-degree rotation. The plot above is the result. Note that it increases more sharply when the signs don't agree, and less sharply when they do, and that the constants controlling this behaviour are configurable. The code below is all that was needed to define and plot the loss function. I don't think I've ever used a geometric form as a loss function before - really neat.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm


def elliptic_paraboloid_loss(x, y, c_diff_sign, c_same_sign):
    # Compute a rotated elliptic paraboloid.
    t = np.pi / 4  # 45-degree rotation aligns the axes with the sign-agreement diagonals
    x_rot = (x * np.cos(t)) + (y * np.sin(t))
    y_rot = (x * -np.sin(t)) + (y * np.cos(t))
    z = ((x_rot**2) / c_diff_sign) + ((y_rot**2) / c_same_sign)
    return z


c_diff_sign = 4
c_same_sign = 2

a = np.arange(-5, 5, 0.1)
b = np.arange(-5, 5, 0.1)

loss_map = np.zeros((len(a), len(b)))
for i, a_i in enumerate(a):
    for j, b_j in enumerate(b):
        loss_map[i, j] = elliptic_paraboloid_loss(a_i, b_j, c_diff_sign, c_same_sign)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') is deprecated in recent matplotlib
# indexing='ij' keeps X and Y aligned with loss_map's (i, j) ordering
X, Y = np.meshgrid(a, b, indexing='ij')
surf = ax.plot_surface(X, Y, loss_map, cmap=cm.coolwarm,
                       linewidth=0, antialiased=False)

plt.show()
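To actually train with this, the same function can be expressed with TensorFlow ops so it stays differentiable end to end. A minimal sketch, assuming the predictions and targets are float tensors and that averaging over the batch is the desired reduction (the function name and default constants are mine):

import math
import tensorflow as tf

def elliptic_paraboloid_loss_tf(y_true, y_pred, c_diff_sign=4.0, c_same_sign=2.0):
    # Same rotated elliptic paraboloid as above, built from differentiable TF ops.
    t = math.pi / 4  # 45-degree rotation, as in the numpy version
    x_rot = (y_true * math.cos(t)) + (y_pred * math.sin(t))
    y_rot = (y_true * -math.sin(t)) + (y_pred * math.cos(t))
    z = (tf.square(x_rot) / c_diff_sign) + (tf.square(y_rot) / c_same_sign)
    return tf.reduce_mean(z)  # average over the batch to get a scalar loss

This can then be passed to model.compile(loss=elliptic_paraboloid_loss_tf) like any other custom Keras loss (with the default constants, since Keras calls the loss with only y_true and y_pred).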