AttributeError: lower not found; using a Pipeline with a CountVectorizer in scikit-learn

This happens because your dataset is in the wrong format. CountVectorizer's fit function (or the pipeline's, it makes no difference) expects "An iterable which yields either str, unicode or file objects", not an iterable over other iterables containing text, as in your code. In your case the iterable is a list, so you should pass a flat list whose members are strings, not other lists. By default the vectorizer calls lower() on every item it receives, which is why a non-string item raises this AttributeError.

That is, your dataset should look like this:

X_train = ['this is a dummy example',
      'in reality this line is very long',
      # ...
      'here is a last text in the training set'
    ]
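
If your documents currently arrive wrapped in inner lists, one way to fix the shape is to join each inner list into a single string before fitting. The following is only a minimal sketch: the names nested_docs and y_train, the labels, and the MultinomialNB classifier are assumptions made for illustration and are not taken from your code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical data in the wrong shape: every document is wrapped in its own list.
nested_docs = [
    ["this is a dummy example"],
    ["in reality this line is very long"],
    ["here is a last text in the training set"],
]
y_train = [0, 1, 0]  # made-up labels, one per document

# Flatten: join each inner list so that every element is a plain string.
X_train = [" ".join(doc) for doc in nested_docs]

# Assumed pipeline: bag-of-words counts followed by a naive Bayes classifier.
pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("clf", MultinomialNB()),
])

# Fitting works because X_train is a flat list of strings.
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(["a dummy example"])
```

Passing nested_docs directly to fit would fail, because each item would be a list, and a list has no lower method.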

For a complete worked example, see "Sample pipeline for text feature extraction and evaluation" in the scikit-learn documentation.


