predict_proba for a cross-validated model

This has been implemented since scikit-learn version 0.18. You can pass a method string parameter, such as 'predict_proba', to the cross_val_predict function. See the cross_val_predict documentation for details.

Example:

proba = cross_val_predict(logreg, X, y, cv=cv, method='predict_proba')

Also note that cross_val_predict is part of the sklearn.model_selection module (new in 0.18), so you will need this import:

from sklearn.model_selection import cross_val_predict
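For reference, here is a minimal end-to-end sketch of the call above. The synthetic data from make_classification and the StratifiedKFold settings are assumptions made only for illustration; substitute your own X, y, and cv.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic binary classification data, used here only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

logreg = LogisticRegression()
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One row per sample, one column per class. Each row is predicted by the
# model that was fitted on the folds not containing that sample.
proba = cross_val_predict(logreg, X, y, cv=cv, method="predict_proba")

print(proba.shape)
```

The rows of proba are aligned with the rows of X, even when the splitter shuffles.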

For versions before 0.18, an easy workaround is to create a wrapper class whose predict method returns the probabilities. For your case that would be

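from sklearn.linear_model import LogisticRegression

class proba_logreg(LogisticRegression):
    # cross_val_predict calls predict(), so make it return probabilities.
    def predict(self, X):
        return super().predict_proba(X)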

and then pass an instance of it as the classifier to cross_val_predict:

# cross validation probabilities
probas = cross_val_predict(proba_logreg(), X, y, cv=cv)
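As a sanity check, the wrapper can be compared against the method='predict_proba' option on a scikit-learn version that has both. This is only a sketch: the synthetic data and the unshuffled KFold are assumptions for illustration, and the comparison assumes every training fold contains all classes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict


class proba_logreg(LogisticRegression):
    # predict() is overridden to return class probabilities.
    def predict(self, X):
        return super().predict_proba(X)


# Synthetic data, for illustration only.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
cv = KFold(n_splits=5)

# Workaround: the default method is 'predict', which now yields probabilities.
probas_wrapper = cross_val_predict(proba_logreg(), X, y, cv=cv)

# Built-in option available from version 0.18 onwards.
probas_builtin = cross_val_predict(
    LogisticRegression(), X, y, cv=cv, method="predict_proba"
)

print(np.allclose(probas_wrapper, probas_builtin))
```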

The cross_val_predict function gives you the predicted values, but at the time of writing (before version 0.18) there was no such function for "predict_proba". Maybe we could make that an option.


This is easy to implement:

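import numpy as np
from sklearn.model_selection import KFold

def my_cross_val_predict(
        m, X, y, cv=KFold(),
        predict=lambda m, x: m.predict_proba(x),
        combine=np.vstack):
    # X and y are expected to be NumPy arrays so that index arrays work.
    preds = []
    for train, test in cv.split(X, y):
        # Fit on the training fold, then predict on the held-out fold.
        m.fit(X[train, :], y[train])
        pred = predict(m, X[test, :])
        preds.append(pred)
    # Rows come back in fold order, which matches the original row order
    # only when cv does not shuffle.
    return combine(preds)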

This version returns the output of predict_proba. If you need both predict and predict_proba, just change the predict and combine arguments:

def stack(arrs):
    if arrs[0].ndim == 1:
        return np.hstack(arrs)
    else:
        return np.vstack(arrs)

def my_cross_val_predict(
        m, X, y, cv=KFold(),
        predict=lambda m, x: [m.predict(x), m.predict_proba(x)],
        combine=lambda preds: list(map(stack, zip(*preds)))):
    preds = []
    for train, test in cv.split(X, y):
        m.fit(X[train, :], y[train])
        # Each entry is a [labels, probabilities] pair for one fold.
        pred = predict(m, X[test, :])
        preds.append(pred)
    # Returns [all_labels, all_probabilities].
    return combine(preds)
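One caveat with stacking the per-fold results: the rows come back in fold order, so with a shuffling splitter they no longer line up with the rows of X. A possible variant, sketched below, writes each fold's predictions back to the positions given by the test indices. The helper name ordered_cross_val_proba is hypothetical, the data is synthetic, and the sketch assumes every training fold contains all classes.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict


def ordered_cross_val_proba(m, X, y, cv):
    """Hypothetical helper: out-of-fold predict_proba in original row order."""
    n_classes = len(np.unique(y))
    out = np.zeros((X.shape[0], n_classes))
    for train, test in cv.split(X, y):
        # Fit a fresh copy so that folds do not share fitted state.
        fitted = clone(m).fit(X[train], y[train])
        # Put each fold's rows back where they came from.
        out[test] = fitted.predict_proba(X[test])
    return out


# Synthetic data, for illustration only.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

manual = ordered_cross_val_proba(LogisticRegression(), X, y, cv)
reference = cross_val_predict(
    LogisticRegression(), X, y, cv=cv, method="predict_proba"
)

print(np.allclose(manual, reference))
```

With an integer random_state the splitter produces the same folds on every call, which is what makes the comparison with cross_val_predict meaningful here.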