Implement K-fold cross validation for MLPClassifier in Python

Kudos to @COLDSPEED's answer.

If you'd like the predictions from n-fold cross-validation, cross_val_predict() is the way to go.

from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier

# Shuffle, then split the data frame into train + validation (80%) and test (20%)
df = df.sample(frac=1).reset_index(drop=True)
train_frac = 0.8
df_train = df[: int(len(df) * train_frac)]

# Convert the DataFrame to ndarrays, since the CV splitters index with integer arrays
feature = df_train.iloc[:, :-1].values
target = df_train.iloc[:, -1].values

clf = MLPClassifier(activation='relu', solver='adam', alpha=1e-5,
                    hidden_layer_sizes=(5, 2), random_state=1, verbose=True)
y_pred = cross_val_predict(clf, feature, target, cv=10)

Basically, the cv option sets how many cross-validation folds to use during training. y_pred has the same length as target: each element is the prediction made for that sample while it sat in the held-out fold.
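Since y_pred lines up element-wise with target, you can turn it into an out-of-sample accuracy estimate directly. A minimal sketch, using accuracy_score from sklearn.metrics:

from sklearn.metrics import accuracy_score

# Every entry of y_pred was produced while its sample was held out,
# so this is an out-of-sample accuracy estimate, not a training score.
print(accuracy_score(target, y_pred))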


Do not split your data into train and test. This is automatically handled by the KFold cross-validation.

from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

# X is the feature matrix and y the label vector, both as NumPy arrays
kf = KFold(n_splits=10)
clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)

# Train on the 9 training folds, then score on the held-out fold
for train_indices, test_indices in kf.split(X):
    clf.fit(X[train_indices], y[train_indices])
    print(clf.score(X[test_indices], y[test_indices]))

KFold partitions your dataset into n (roughly) equal folds. On each iteration, one fold is held out as the test set and the model is trained on the remaining n-1 folds. This gives you a fairly reliable measure of your model's accuracy, since every sample is tested exactly once, by a model that never saw it during training.
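If you just want the per-fold scores without writing the loop yourself, here is a sketch using cross_val_score, assuming the same X, y, clf, and kf as above:

from sklearn.model_selection import cross_val_score

# One accuracy score per fold; mean and std summarize the estimate
scores = cross_val_score(clf, X, y, cv=kf)
print(scores.mean(), scores.std())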