Is it ok to define your own cost function for logistic regression?

The logistic loss, hinge loss, smoothed hinge loss, etc. are used because they are upper bounds on the zero-one binary classification loss.

These functions generally also penalize examples that are correctly classified but are still near the decision boundary, thus creating a "margin."
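To see the "upper bound" and "margin" points concretely, here is a minimal sketch in plain NumPy comparing the losses as functions of the margin m = y·f(x). (The logistic loss is taken in base 2 here; with the natural log it dips below 1 near m = 0 and is no longer a strict upper bound.)

```python
import numpy as np

# Margin m = y * f(x): positive when the prediction is on the correct side.
margins = np.linspace(-2.0, 2.0, 9)

zero_one = (margins <= 0).astype(float)        # the loss we actually care about
hinge    = np.maximum(0.0, 1.0 - margins)      # SVM surrogate, upper bound on zero-one
logistic = np.log2(1.0 + np.exp(-margins))     # base-2 log so it upper-bounds zero-one

for m, z, h, lg in zip(margins, zero_one, hinge, logistic):
    print(f"m = {m:+.1f}:  zero-one = {z:.0f}   hinge = {h:.3f}   logistic = {lg:.3f}")
```

Note that both surrogates are still positive for margins in (0, 1): correctly classified points near the boundary are penalized, which is what pushes the margin out.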

So, if you are doing binary classification, then you should certainly choose a standard loss function.

If you are trying to solve a different problem, then a different loss function will likely perform better.


Yes, you can define your own loss function, but if you're a novice, you're probably better off using one from the literature. There are conditions that loss functions should meet:

  1. They should approximate the actual loss you're trying to minimize. As was said in the other answer, the standard loss function for classification is the zero-one loss (misclassification rate), and the ones used for training classifiers are approximations of it.

    The squared-error loss from linear regression isn't used because it doesn't approximate the zero-one loss well: when your model predicts +50 for some sample while the intended answer was +1 (positive class), the prediction is on the correct side of the decision boundary, so the zero-one loss is zero, but the squared-error loss is still 49² = 2401. Some training algorithms will waste a lot of time getting predictions very close to {-1, +1} instead of focusing on getting just the sign/class label right; see the sketch after this list.(*)

  2. The loss function should work with your intended optimization algorithm. That's why the zero-one loss is not used directly: it doesn't work with gradient-based optimization methods, since its gradient is zero almost everywhere and undefined at the decision boundary (it doesn't even have a subgradient there, unlike the hinge loss used in SVMs).

    The main algorithm that optimizes the zero-one loss directly is the old perceptron algorithm.
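Here is a tiny NumPy sketch of both points, using the +50 example from point 1 (all numbers are just the toy values from the text):

```python
import numpy as np

y_true, y_pred = 1.0, 50.0     # positive class; very confident, correct prediction

zero_one = float(np.sign(y_pred) != y_true)   # 0.0 -- right side of the boundary
squared  = (y_pred - y_true) ** 2             # 2401.0 -- penalized heavily anyway
print(f"zero-one: {zero_one}, squared error: {squared}")

# Point 2: the zero-one loss is a step function of the raw prediction --
# flat (gradient 0) everywhere except the jump at 0, where the gradient is
# undefined. A gradient-based optimizer gets no signal from it.
preds = np.array([-2.0, -0.5, 0.5, 2.0, 50.0])
print((np.sign(preds) != y_true).astype(float))   # [1. 1. 0. 0. 0.] -- piecewise constant
```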

Also, when you plug in a custom loss function, you're no longer building a logistic regression model but some other kind of linear classifier.
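To illustrate, here is a minimal sketch of exactly that: a linear classifier trained by plain gradient descent on a custom loss. The squared hinge is used here just as an example of a smooth surrogate, and the data are made-up toy values. The result is a perfectly usable linear classifier; it just isn't logistic regression anymore.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)   # toy labels in {-1, +1}

def loss_and_grad(w):
    """Custom surrogate: squared hinge, mean(max(0, 1 - y * Xw)^2)."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w))
    loss = np.mean(slack ** 2)
    grad = X.T @ (-2.0 * slack * y) / len(y)
    return loss, grad

w = np.zeros(2)
for _ in range(500):           # plain gradient descent
    loss, grad = loss_and_grad(w)
    w -= 0.1 * grad

accuracy = np.mean(np.sign(X @ w) == y)
print(f"final loss {loss:.4f}, training accuracy {accuracy:.1%}")
```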

(*) Squared error is used with linear discriminant analysis, but that model is usually fit in closed form instead of iteratively.