Sklearn pass fit() parameters to xgboost in pipeline

For the early stopping rounds, you must always specify the validation set given by the argument eval_set. Here is how the error in your code can be fixed.

pipeline.fit(X_train, y_train, clf__early_stopping_rounds=20, clf__eval_set=[(test_X, test_y)])

This is the solution: https://www.kaggle.com/c/otto-group-product-classification-challenge/forums/t/13755/xgboost-early-stopping-and-other-issues both early_stooping_rounds and the watchlist / eval_set need to be passed. Unfortunately, this does not work for me, as the variables on the watchlist would require a preprocessing step which is only applied in the pipeline / I would need to apply this step manually.