Train Model fails because 'list' object has no attribute 'lower'

Apply .astype(str) to the text column:

X = df.text.astype(str)

I had a similar problem, but instead of extracting values with .loc[] or .iloc[], I simply used

X = df.text
y = df.target

This gives a Series in which each row is a list of tokens rather than a single string. The Series looked similar to what Alex had:

print(X)

[Screenshot: the directly converted Series]

So, only .astype(str) worked for me.

Result:

[Screenshot: the Series after applying .astype(str)]


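TfidfVectorizer expects an iterable of strings, one string per document. If you pass it an array of token lists instead, it crashes, because by default it calls .lower() on each document and a list has no such method.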
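A minimal sketch that reproduces the error and one way around it. The sample documents and variable names here are made up for illustration, not taken from the question:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample data: each document is already a list of tokens.
token_docs = [["Hello", "world"], ["hello", "again"]]

vectorizer = TfidfVectorizer()

# Fitting on token lists fails, because the default preprocessing
# lowercases each document and a list has no .lower() method.
try:
    vectorizer.fit(token_docs)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Joining each token list back into one string gives the vectorizer
# the input it expects.
string_docs = [" ".join(tokens) for tokens in token_docs]
vectorizer.fit(string_docs)
vocabulary = sorted(vectorizer.vocabulary_)
```

After the join, the vectorizer tokenizes and lowercases the strings itself, so "Hello" and "hello" end up as the same feature.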


Append .apply(lambda x: ' '.join(x)) to the tokenized text data (X_train, and any other split that holds token lists) so that each row becomes a single string again, and it should work.
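To see what the join does compared with .astype(str), here is a small sketch on a made-up Series (the data is hypothetical, only the two pandas calls come from the answers above):

```python
import pandas as pd

# Hypothetical tokenized column: every row is a list of tokens.
X = pd.Series([["hello", "world"], ["again", "hello"]])

# Join the tokens of each row into one space-separated string.
joined = X.apply(lambda tokens: " ".join(tokens))

# .astype(str) keeps the list's printed form, brackets and quotes included.
stringified = X.astype(str)

print(joined.tolist())
print(stringified.tolist())
```

Both results are strings, so neither triggers the 'lower' error. The joined version is the cleaner input, since the .astype(str) version relies on the vectorizer's tokenizer to discard the brackets, quotes and commas.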


Answer from http://www.davidsbatista.net/blog/2018/02/28/TfidfVectorizer/

from sklearn.feature_extraction.text import CountVectorizer

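# Identity function: the documents are already lists of tokens.
def dummy(doc):
    return doc

tfidf = CountVectorizer(
    tokenizer=dummy,     # use the token lists as they are
    preprocessor=dummy,  # skip lowercasing and other string preprocessing
    token_pattern=None,  # unused when a tokenizer is given; None avoids a warning
)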

docs = [
    ['hello', 'world', '.'],
    ['hello', 'world'],
    ['again', 'hello', 'world']
]

tfidf.fit(docs)
tfidf.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
# array(['.', 'again', 'hello', 'world'], dtype=object)
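The snippet above uses CountVectorizer. Since the question is about TF-IDF, here is a sketch of the same pass-through idea with TfidfVectorizer, which accepts the same tokenizer and preprocessor arguments. The name identity is my own, and this is an illustration rather than code from the linked post:

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def identity(tokens):
    # The documents are already tokenized, so return them unchanged.
    return tokens


vectorizer = TfidfVectorizer(
    tokenizer=identity,
    preprocessor=identity,
    token_pattern=None,  # not used when a custom tokenizer is supplied
)

docs = [
    ["hello", "world", "."],
    ["hello", "world"],
    ["again", "hello", "world"],
]

# One row per document, one column per distinct token.
matrix = vectorizer.fit_transform(docs)
features = sorted(vectorizer.vocabulary_)
```

This keeps tokens such as '.' that the default tokenizer would drop, so it is the option to pick when your own tokenization has to be preserved exactly.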