roc_auc_score - Only one class present in y_true

As the error notes, the ROC AUC score is not defined when only one class is present in the ground truth of a batch.

I'm against both throwing an exception (an exception about what? This is the expected behaviour) and silently returning another metric (e.g. accuracy). The metric is not broken per se.

I don't feel like solving a data imbalance "issue" with a metric "fix". It would probably be better to use a different sampling, if possible, or simply to join multiple batches until the class population requirement is satisfied (see the sketch below).
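
For example, one pragmatic workaround along those lines (my own sketch, not a scikit-learn API) is to accumulate labels and scores across batches and only compute the metric once both classes have been seen:

import numpy as np
from sklearn.metrics import roc_auc_score

def batched_roc_auc(batches):
    """Join batches, then score once y_true contains both classes.

    `batches` is an iterable of (y_true, y_score) pairs; the name and
    structure are illustrative only.
    """
    all_true, all_score = [], []
    for y_true, y_score in batches:
        all_true.append(np.asarray(y_true))
        all_score.append(np.asarray(y_score))
    y_true = np.concatenate(all_true)
    y_score = np.concatenate(all_score)
    if np.unique(y_true).size < 2:
        return None  # still undefined: only one class seen overall
    return roc_auc_score(y_true, y_score)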


I am facing the same problem now, and wrapping the call in try/except does not solve my issue. I developed the code below to deal with it.

import pandas as pd
import numpy as np


class KFold(object):
    """Stratified K-fold splitter for binary targets: every fold keeps both classes."""

    def __init__(self, folds, random_state=None):
        self.folds = folds
        self.random_state = random_state

    def split(self, x, y):
        assert len(x) == len(y), 'x and y should have the same length'

        x_, y_ = pd.DataFrame(x), pd.DataFrame(y)

        # Shuffle the target and align the features with it.
        y_ = y_.sample(frac=1, random_state=self.random_state)
        x_ = x_.loc[y_.index]

        # Collect the (shuffled) indices of each class separately.
        labels = y_.iloc[:, 0]
        event_index = list(y_[labels == 1].index)
        non_event_index = list(y_[labels == 0].index)

        assert len(event_index) >= self.folds, \
            'each fold needs at least one event (y == 1) row'
        assert len(non_event_index) >= self.folds, \
            'each fold needs at least one non-event (y == 0) row'

        # Cut each class into `folds` chunks. Chunk i of each class is the
        # validation set of fold i; everything else is the training set, so
        # both classes are present on both sides of every split.
        event_chunks = np.array_split(event_index, self.folds)
        non_event_chunks = np.array_split(non_event_index, self.folds)

        all_index = set(event_index) | set(non_event_index)

        indexes = []
        for ev, nev in zip(event_chunks, non_event_chunks):
            valid_fold = set(ev) | set(nev)
            train_fold = all_index - valid_fold
            indexes.append((list(train_fold), list(valid_fold)))

        return indexes

I just wrote this code and have not tested it exhaustively; it has only been tested with binary classes. I hope it is useful nonetheless.
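
For illustration, here is a minimal usage sketch (the synthetic data and stand-in scores below are my own, not from the post above):

import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical, highly imbalanced data: 10 events out of 100 rows.
x = np.random.randn(100, 3)
y = np.array([1] * 10 + [0] * 90)

kf = KFold(folds=5, random_state=42)
for train_idx, valid_idx in kf.split(x, y):
    # Every validation fold contains both classes, so the score is defined.
    fake_scores = np.random.rand(len(valid_idx))  # stand-in for model output
    print(roc_auc_score(y[valid_idx], fake_scores))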


You could use try-except to prevent the error:

import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 0, 0])
y_scores = np.array([1, 0, 0, 0])
try:
    roc_auc_score(y_true, y_scores)
except ValueError:
    # Raised when y_true contains only one class; the score is undefined.
    pass

In the except branch you could also set the roc_auc_score to zero when only one class is present. However, I wouldn't do this: your test data is presumably highly imbalanced, and I would suggest using stratified K-fold instead, so that both classes are present in every fold.
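
A minimal sketch of that suggestion using scikit-learn's StratifiedKFold (the toy data and classifier here are placeholders for your own):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Imbalanced toy data (~90% negatives); stands in for your real dataset.
X, y = make_classification(n_samples=200, weights=[0.9], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    # Each stratified fold preserves the class ratio, so both classes
    # appear in y[test_idx] and the score is always defined.
    scores = clf.predict_proba(X[test_idx])[:, 1]
    print(roc_auc_score(y[test_idx], scores))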