Named entity recognition in spaCy

In spaCy version 3, the Transformers from Hugging Face are fine-tuned for the tasks that spaCy supported in previous versions, with better results.

Transformers are currently (2020) the state of the art in Natural Language Processing. Roughly, the progression was word representations (one-hot encoding -> word2vec -> GloVe | fastText), then sequence models (recurrent neural network, recursive neural network, gated recurrent unit, long short-term memory, bi-directional long short-term memory, etc.), and now Transformers + Attention (BERT, RoBERTa, XLNet, XLM, CTRL, ALBERT, T5, BART, GPT, GPT-2, GPT-3). This is just to give context for why you should consider Transformers; I know there is a lot that I didn't mention, like Fuzz, Knowledge Graph and so on.

Install the dependencies:

sudo apt install libncurses5
pip install spacy-transformers --pre -f https://download.pytorch.org/whl/torch_stable.html
pip install spacy-nightly # I'm using 3.0.0rc2

Download a model:

python -m spacy download en_core_web_trf # English Transformer pipeline, RoBERTa base

Here's a list of available models.

And then use it as you would normally do:

import spacy


text = 'Type something here which can be related to something, e.g Stack Over Flow organization'

nlp = spacy.load('en_core_web_trf')

document = nlp(text)

print(document.ents)
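
Printing `document.ents` shows only the entity spans. Each span also carries a label and character offsets, which is usually what you want downstream. Below is a minimal sketch, assuming spaCy 3; `entity_rows` is a hypothetical helper name, and the hand-built document is only a stand-in so the snippet runs without downloading a model. With the real pipeline you would pass `nlp(text)` instead.

```python
import spacy
from spacy.tokens import Span


def entity_rows(doc):
    """Return (text, label, start_char, end_char) for each entity in doc."""
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]


# Stand-in document: a blank pipeline only tokenizes, so the entities are set by hand.
# This is an assumption for illustration, not output from en_core_web_trf.
nlp = spacy.blank("en")
doc = nlp("Apple is based in California")
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 4, 5, label="GPE")]

for row in entity_rows(doc):
    print(row)
```

The same helper should work unchanged on a document produced by `spacy.load('en_core_web_trf')`, since it only reads `doc.ents`.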

References:

Learn about Transformers and Attention.

Read a summary of the different Transformer architectures.

Learn about the Transformer fine-tuning done by spaCy.


As per the spaCy documentation for Named Entity Recognition, here is how to extract named entities:

import spacy
nlp = spacy.load('en') # install the 'en' model first (python3 -m spacy download en); spaCy 3 removed this shortcut, so load 'en_core_web_sm' there
doc = nlp("Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))

Result
Name Entity: (China,)

To make the model treat "Alphabet" as a noun, prepend "The" to it.

doc = nlp("The Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))

Name Entity: (Alphabet, China)
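
If rewording the sentence is not an option, a rule-based matcher is another way to make sure a known name is picked up. This is a sketch of a different technique from the one above, assuming spaCy 3 and its built-in `entity_ruler` component; the two patterns are illustrative and not part of the original answer.

```python
import spacy

# Blank English pipeline: tokenizer only, so no model download is needed.
nlp = spacy.blank("en")

# Rule-based entity component; the patterns are illustrative assumptions.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Alphabet"},
    {"label": "GPE", "pattern": "China"},
])

doc = nlp("Alphabet is a new startup in China")
for ent in doc.ents:
    print(ent.text, ent.label_)
```

With a pretrained pipeline, the ruler can be added with `nlp.add_pipe("entity_ruler", before="ner")` so that its matches are set before the statistical recognizer runs.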