spacy Can't find model 'en_core_web_sm' on windows 10 and Python 3.5.3 :: Anaconda custom (64-bit)

Initially I downloaded two en packages using following statements in anaconda prompt.

python -m spacy download en_core_web_lg
python -m spacy download en_core_web_sm

But, I kept on getting linkage error and finally running below command helped me to establish link and solved error.

python -m spacy download en

The answer to your misunderstanding is a Unix concept, softlinks which we could say that in Windows are similar to shortcuts. Let's explain this.

When you spacy download en, spaCy tries to find the best small model that matches your spaCy distribution. The small model that I am talking about defaults to en_core_web_sm which can be found in different variations which correspond to the different spaCy versions (for example spacy, spacy-nightly have en_core_web_sm of different sizes).

When spaCy finds the best model for you, it downloads it and then links the name en to the package it downloaded, e.g. en_core_web_sm. That basically means that whenever you refer to en you will be referring to en_core_web_sm. In other words, en after linking is not a "real" package, is just a name for en_core_web_sm.

However, it doesn't work the other way. You can't refer directly to en_core_web_sm because your system doesn't know you have it installed. When you did spacy download en you basically did a pip install. So pip knows that you have a package named en installed for your python distribution, but knows nothing about the package en_core_web_sm. This package is just replacing package en when you import it, which means that package en is just a softlink to en_core_web_sm.

Of course, you can directly download en_core_web_sm, using the command: python -m spacy download en_core_web_sm, or you can even link the name en to other models as well. For example, you could do python -m spacy download en_core_web_lg and then python -m spacy link en_core_web_lg en. That would make en a name for en_core_web_lg, which is a large spaCy model for the English language.

Hope it is clear now :)


The below worked for me :

import en_core_web_sm

nlp = en_core_web_sm.load()