sklearn classifier get ValueError: bad input shape

I had the same issue.

If you are facing the same problem, check the shapes of the arguments you pass to clf.fit(X, y):

X : Training vector {array-like, sparse matrix}, shape (n_samples, n_features).

y : Target vector relative to X, array-like, shape (n_samples,).

As you can see, y must be one-dimensional: one target value per sample, not several columns. To make sure your target vector is shaped correctly, check

y.shape

It should be (n_samples,).
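The check above can be sketched with NumPy. This is only an illustration: as_1d_target is a hypothetical helper name, not part of sklearn, and the sample values are made up. It flattens a single-column target to (n_samples,) and refuses a target that has more than one column, since flattening that would silently change the number of samples.

```python
import numpy as np


def as_1d_target(y):
    """Return y with shape (n_samples,), or raise if it has several columns."""
    y = np.asarray(y)
    if y.ndim == 1:
        return y
    if y.ndim == 2 and y.shape[1] == 1:
        # A single column can safely be flattened to one dimension.
        return y.ravel()
    raise ValueError(f"expected one target column, got shape {y.shape}")


# Made-up target that was accidentally built as a column, shape (4, 1).
y_column = np.array([[0], [1], [1], [0]])
y_fixed = as_1d_target(y_column)
print(y_column.shape, "->", y_fixed.shape)
```

If the helper raises, the target has more than one column and you need to work out where the extra columns came from rather than reshape it.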

In my case, I was building my training vector by concatenating 3 separate vectors from 3 different vectorizers. The problem was that each of them contained a ['Label'] column, so the final training vector had 3 ['Label'] columns. When I then used final_trainingVect['Label'] as my target vector, its shape was (n_samples, 3) instead of (n_samples,).
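Here is a small sketch of how that happens, assuming the three pieces are pandas DataFrames. The frame names, feature columns and values are invented for illustration; only the 'Label' column name comes from my case.

```python
import pandas as pd

labels = [0, 1, 0]

# Three made-up feature blocks, each one carrying its own copy of 'Label'.
part_a = pd.DataFrame({"f1": [0.1, 0.2, 0.3], "Label": labels})
part_b = pd.DataFrame({"f2": [1.0, 2.0, 3.0], "Label": labels})
part_c = pd.DataFrame({"f3": [5.0, 6.0, 7.0], "Label": labels})

# Concatenating column-wise keeps all three 'Label' columns.
combined = pd.concat([part_a, part_b, part_c], axis=1)
bad_y = combined["Label"]  # a DataFrame with 3 columns, not a Series
print(bad_y.shape)

# Fix: take the label once, and drop it from every block before concatenating.
y = part_a["Label"]
X = pd.concat(
    [part.drop(columns="Label") for part in (part_a, part_b, part_c)],
    axis=1,
)
print(X.shape, y.shape)
```

Selecting a column name that appears several times returns every matching column, which is why the target ends up two-dimensional.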


Thanks to @meelo, I solved this problem. As he said: in my code, data is the feature vector and target is the target value. I had mixed up the two.

I learned that TfidfVectorizer turns the data into a matrix of shape (n_samples, n_features), and each sample (row) should map to just one target value.

If I want to predict two kinds of target, I need two distinct target vectors:

  1. target_C1 with all the C1 values
  2. target_C2 with all the C2 values.

Then I use the original data with each of the two targets to train two classifiers, one per target.
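A minimal sketch of the two-classifier setup, under some assumptions: the documents and label values are invented, and LogisticRegression is just a stand-in for whatever classifier you actually use. Only TfidfVectorizer and the target_C1 / target_C2 names come from the description above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up documents with two separate targets, one value per document each.
docs = ["cheap pills online", "meeting at noon", "win money now", "lunch with team"]
target_C1 = ["spam", "ham", "spam", "ham"]
target_C2 = ["ad", "work", "ad", "social"]

# One shared feature matrix of shape (n_samples, n_features).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# One classifier per target; each y has shape (n_samples,).
clf_C1 = LogisticRegression().fit(X, target_C1)
clf_C2 = LogisticRegression().fit(X, target_C2)

# New text must go through the same fitted vectorizer before predicting.
X_new = vectorizer.transform(["win cheap money"])
pred_C1 = clf_C1.predict(X_new)
pred_C2 = clf_C2.predict(X_new)
print(pred_C1, pred_C2)
```

Both classifiers share the same X; only the target changes, so neither call to fit ever sees a two-column y.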