In spacy, how to use your own word2vec model created in gensim?

As explained here, you can import custom word vectors that trained using Gensim, Fast Text, or Tomas Mikolov's original word2vec implementation, by creating a model using:

wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz
python -m spacy init-model en your_model --vectors-loc cc.la.300.vec.gz

then you can load you model, nlp = spacy.load('your_model') and use it!

Also see the similar question that answered here.

Train and save your model in plain-text format:

from gensim.test.utils import common_texts, get_tmpfile
from gensim.models import Word2Vec

path = get_tmpfile("./data/word2vec.model")

model = Word2Vec(common_texts, size=100, window=5, min_count=1, workers=4)
model.wv.save_word2vec_format("./data/word2vec.txt")

Gzip the text file:

gzip word2vec.txt

Which produces a word2vec.txt.gz file.

Run the following command:

python -m spacy init-model en ./data/spacy.word2vec.model --vectors-loc word2vec.txt.gz

Load the vectors using:

nlp = spacy.load('./data/spacy.word2vec.model/')

All of these answers are for an older version of spacy. In the latest version the command is changed to:

python -m spacy init vectors [OPTIONS] LANG VECTORS_LOC OUTPUT_DIR

you can learn more about options by typing python -m spacy init --help in your command prompt

In spacy, how to use your own word2vec model created in gensim?

Tags:

Model

Gensim

Word2Vec

Spacy

Related

Recent Posts