roc_auc_score - Only one class present in y_true
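As the error says, the ROC AUC score is not defined when a class is missing from the ground truth of a batch. The metric compares the scores of positive samples against those of negative samples, so with only one class there is nothing to compare.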
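To make the "not defined" part concrete, here is a rough sketch that computes AUC by hand as the fraction of (positive, negative) pairs that are ranked correctly. `pairwise_auc` is a hypothetical helper written for illustration, not part of scikit-learn, and it assumes binary labels coded as 0 and 1. With a single class there are zero pairs, so there is no sensible value to return.

```python
import numpy as np


def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Hypothetical helper for illustration; ties count as half a win.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    n_pairs = len(pos) * len(neg)
    if n_pairs == 0:
        # One class is missing, so there are no pairs to compare.
        return float("nan")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / n_pairs


both = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
one_class = pairwise_auc([0, 0, 0, 0], [1, 0, 0, 0])
print(both, one_class)
```

Returning NaN here is only a choice made for this sketch; the point is that no number would be meaningful for the single-class input.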
I'm against either throwing an exception (about what? This is the expected behaviour) or returning another metric (e.g. accuracy). The metric is not broken per se.
I don't feel like solving a data imbalance "issue" with a metric "fix". It would probably be better to use a different sampling scheme, if possible, or to join multiple batches until the combined set contains both classes.
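Joining batches could look roughly like the sketch below: collect labels and scores from every batch and compute the score once on the pooled arrays. The `batches` list is made-up data for illustration; in practice it would come from your own evaluation loop.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up (y_true, y_scores) pairs; the first batch contains only class 0.
batches = [
    (np.array([0, 0, 0]), np.array([0.1, 0.2, 0.3])),
    (np.array([1, 0, 1]), np.array([0.9, 0.4, 0.25])),
]

all_true, all_scores = [], []
for y_true, y_scores in batches:
    all_true.append(y_true)
    all_scores.append(y_scores)

# Score the pooled predictions once instead of scoring each batch.
pooled_true = np.concatenate(all_true)
pooled_scores = np.concatenate(all_scores)
pooled_auc = roc_auc_score(pooled_true, pooled_scores)
print(pooled_auc)
```

Note that a pooled AUC is not the same thing as an average of per-batch AUCs, since it also ranks samples from different batches against each other.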
I am facing the same problem now, and using try-except does not solve my issue. I wrote the code below to deal with it.
    import numpy as np
    import pandas as pd


    class KFold(object):
        def __init__(self, folds, random_state=None):
            self.folds = folds
            self.random_state = random_state

        def split(self, x, y):
            assert len(x) == len(y), 'x and y should have the same length'
            # Shuffle the labels once; the index keeps the original row positions.
            y_ = pd.Series(np.asarray(y).ravel()).sample(frac=1, random_state=self.random_state)
            event_index = list(y_[y_ == 1].index)
            non_event_index = list(y_[y_ == 0].index)
            assert len(event_index) >= self.folds, 'number of folds should not exceed the number of events'
            assert len(non_event_index) >= self.folds, 'number of folds should not exceed the number of non-events'

            # Split each class into `folds` chunks so every fold contains both classes.
            event_chunks = np.array_split(event_index, self.folds)
            non_event_chunks = np.array_split(non_event_index, self.folds)

            indexes = []
            for event_chunk, non_event_chunk in zip(event_chunks, non_event_chunks):
                # The held-out chunk is the validation fold; the rest is used for training.
                valid_fold = set(event_chunk.tolist()) | set(non_event_chunk.tolist())
                train_fold = [k for k in y_.index if k not in valid_fold]
                indexes.append((train_fold, sorted(valid_fold)))
            return indexes
I just wrote that code and have not tested it exhaustively. It was tested only for binary labels. I hope it is useful anyway.
You could use try-except to prevent the error:
    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true = np.array([0, 0, 0, 0])
    y_scores = np.array([1, 0, 0, 0])
    try:
        roc_auc_score(y_true, y_scores)
    except ValueError:
        pass
You could also set the roc_auc_score to zero when only one class is present. However, I wouldn't do this. I guess your test data is highly imbalanced. I would suggest using stratified K-fold instead, so that every fold has at least both classes present.
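A minimal sketch of the stratified K-fold suggestion, using scikit-learn's `StratifiedKFold`. The toy `X` and `y` below are invented for illustration (16 negatives, 4 positives), and the loop only records which classes end up in each test fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced data: 16 samples of class 0 and 4 samples of class 1.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 16 + [1] * 4)

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
classes_per_fold = []
for train_idx, test_idx in skf.split(X, y):
    # Each test fold keeps roughly the same class ratio as y.
    classes_per_fold.append(set(y[test_idx].tolist()))

print(classes_per_fold)
```

This only works if the rarest class has at least `n_splits` samples; with fewer, some folds will still end up with a single class.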