Error "Expected 2D array, got 1D array instead" Using OneHotEncoder

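To encode the categorical column as your code stands, you need to add another pair of brackets so that the column selection stays 2D (this assumes X is a NumPy array):

encoded = onehotencoder1.fit_transform(X[:, [0]]).toarray()

Note that the result usually has one column per category, so it cannot be assigned back into the single column X[:, 0].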

This behavior of sklearn's OneHotEncoder was raised in https://github.com/scikit-learn/scikit-learn/issues/3662. Most scikit-learn estimators expect a 2D array rather than a 1D array.
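To see why the extra brackets matter, here is a small sketch of how NumPy indexing changes the shape. The array X below is made-up data for illustration only, not the asker's dataset.

```python
import numpy as np

# Hypothetical data: 4 samples, 2 features
X = np.array([[0, 10], [1, 20], [2, 30], [0, 40]])

col_1d = X[:, 0]        # integer index drops the column axis -> 1D
col_2d = X[:, [0]]      # list index keeps the column axis    -> 2D
col_slice = X[:, 0:1]   # a slice also keeps the column axis  -> 2D

print(col_1d.shape, col_2d.shape, col_slice.shape)
```

Either of the 2D forms can be passed to fit_transform; the 1D form is what triggers the "Expected 2D array" error.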

The standard practice is to pass a 2D array. Since you have already specified which column to treat as categorical with categorical_features = [0], you can rewrite the next line as follows and pass the whole dataset, or a part of it. Only the first column is converted to dummy variables, and the encoder still gets a 2D array to work with.

onehotencoder1 = OneHotEncoder(categorical_features = [0])
X = onehotencoder1.fit_transform(X).toarray()

(I hope your dataset doesn't have any more categorical values. I'd advise you to label-encode everything first, then one-hot encode.)
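One caveat: the categorical_features parameter was deprecated in scikit-learn 0.20 and removed in 0.22, so the snippet above only runs on older versions. On newer versions, the same idea can be sketched with ColumnTransformer. The data and the transformer name "onehot" below are made up for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: column 0 is a label-encoded category, column 1 is numeric
X = np.array([[0, 1.5], [1, 2.5], [2, 3.5], [0, 4.5]])

ct = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(), [0])],  # encode column 0 only
    remainder="passthrough",                          # keep the other columns
    sparse_threshold=0,                               # always return a dense array
)
X_encoded = ct.fit_transform(X)  # the whole 2D array goes in

print(X_encoded.shape)
```

The one-hot columns come first in the output, followed by the passed-through columns.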


I got the same error, and the error message is followed by this suggestion:

"Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."

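Since my data was an array, I used X.values.reshape(-1, 1) and it worked. (There was another suggestion to use X.values.reshape instead of X.reshape. The .values form applies when X is a pandas Series or DataFrame; a plain NumPy array has no .values attribute, so X.reshape is the one to use there.)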
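Here is a quick sketch of what the two reshape calls from the error message do. The array x is made-up 1D data for illustration.

```python
import numpy as np

x = np.array([3, 1, 2, 1])  # hypothetical 1D data, shape (4,)

one_feature = x.reshape(-1, 1)  # 4 samples, 1 feature each -> shape (4, 1)
one_sample = x.reshape(1, -1)   # 1 sample with 4 features  -> shape (1, 4)

print(one_feature.shape, one_sample.shape)
```

For a single categorical column, reshape(-1, 1) is the form that matches "a single feature".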


I came across a fix: adding

X = X.reshape(-1, 1)

The error appears to be gone now, but I'm not sure whether this is the right way to fix it.