Named Entity Recognition with spaCy and Large Language Models

Jan 31, 2024

With Azure OpenAI

spaCy is the go-to library for named entity recognition (NER). The spacy-llm package integrates Large Language Models (LLMs) into spaCy pipelines, so you can quickly prototype prompts that turn unstructured text into structured output, without training a model.

Environment variables

export OPENAI_API_KEY=<api key>
export AZURE_OPENAI_KEY=<api key>
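
A missing key usually only shows up as an error once the pipeline is assembled or first called, so it can help to check the environment up front. The following is a minimal sketch using only the standard library; the helper name missing_env_vars is made up for illustration, and the variable names are simply the two exported above.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("OPENAI_API_KEY", "AZURE_OPENAI_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the environment.

    `environ` defaults to os.environ; passing a plain dict makes the
    helper easy to try out without touching the real environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]
```

Calling missing_env_vars() before building the pipeline returns an empty list when both variables are set, and the names of the missing ones otherwise.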

llm_ner.cfg

[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = ["PERSON", "ORGANISATION", "LOCATION"] 

[components.llm.model]
@llm_models = "spacy.Azure.v1"
name = 
deployment_name = 
base_url = 
model_type = "chat"
api_version = 
config = {"temperature": 0.0 }
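
The model section is a template: name, deployment_name, base_url and api_version have to be filled in with the values of your own Azure deployment. As a rough pre-check, the sketch below lists settings that are still blank or still hold an angle-bracket placeholder. It reads the file with Python's standard configparser, on the assumption that the file is INI-like enough for that purpose; it is not how spaCy itself loads or validates the config, and the helper name unfilled_settings is hypothetical.

```python
import configparser


def unfilled_settings(cfg_text):
    """Return 'section.key' names whose value is blank or a <placeholder>."""
    # Disable interpolation so values are returned exactly as written.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep key names as written instead of lowercasing them.
    parser.optionxform = str
    parser.read_string(cfg_text)

    unfilled = []
    for section in parser.sections():
        for key, value in parser.items(section):
            stripped = value.strip().strip('"')
            is_placeholder = stripped.startswith("<") and stripped.endswith(">")
            if not stripped or is_placeholder:
                unfilled.append(f"{section}.{key}")
    return unfilled
```

Passing the text of llm_ner.cfg to unfilled_settings gives the list of settings that still need a value before the pipeline is assembled.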

Test

from spacy_llm.util import assemble

nlp = assemble("llm_ner.cfg")
content = "Jack and Jill rode up the hill in Les Deux Alpes"
doc = nlp(content)
print([(ent.text, ent.label_) for ent in doc.ents])

Result

[('Jack', 'PERSON'), ('Jill', 'PERSON'), ('Les Deux Alpes', 'LOCATION')]
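
For downstream use, a flat list of (text, label) pairs is often easier to handle once it is grouped by label. A small sketch in plain Python, operating on the list printed above; the function name group_by_label is only illustrative:

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs into a {label: [texts]} dictionary."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    # Return a plain dict so the result prints and compares cleanly.
    return dict(grouped)


entities = [("Jack", "PERSON"), ("Jill", "PERSON"), ("Les Deux Alpes", "LOCATION")]
print(group_by_label(entities))
# {'PERSON': ['Jack', 'Jill'], 'LOCATION': ['Les Deux Alpes']}
```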

LangChain integration

For models that spacy-llm does not support directly, the suggested route is its LangChain integration. However, as of January 30, 2024, spacy-llm only supports the wrappers in langchain.llms (for example, the OpenAI completion API), not chat models (such as the OpenAI chat completion API).

langchain_community.llms.__all__
['AI21', 'AlephAlpha', 'AmazonAPIGateway', 'Anthropic', 'Anyscale', 'Aphrodite', 'Arcee', 'Aviary', 'AzureMLOnlineEndpoint', 'AzureOpenAI', 'Banana', 'Baseten', 'Beam', 'Bedrock', 'CTransformers', 'CTranslate2', 'CerebriumAI', 'ChatGLM', 'Clarifai', 'Cohere', 'Databricks', 'DeepInfra', 'DeepSparse', 'EdenAI', 'FakeListLLM', 'Fireworks', 'ForefrontAI', 'GigaChat', 'GPT4All', 'GooglePalm', 'GooseAI', 'GradientLLM', 'HuggingFaceEndpoint', 'HuggingFaceHub', 'HuggingFacePipeline', 'HuggingFaceTextGenInference', 'HumanInputLLM', 'KoboldApiLLM', 'LlamaCpp', 'TextGen', 'ManifestWrapper', 'Minimax', 'MlflowAIGateway', 'Modal', 'MosaicML', 'Nebula', 'NIBittensorLLM', 'NLPCloud', 'OCIModelDeploymentTGI', 'OCIModelDeploymentVLLM', 'Ollama', 'OpenAI', 'OpenAIChat', 'OpenLLM', 'OpenLM', 'PaiEasEndpoint', 'Petals', 'PipelineAI', 'Predibase', 'PredictionGuard', 'PromptLayerOpenAI', 'PromptLayerOpenAIChat', 'OpaquePrompts', 'RWKV', 'Replicate', 'SagemakerEndpoint', 'SelfHostedHuggingFaceLLM', 'SelfHostedPipeline', 'StochasticAI', 'TitanTakeoff', 'TitanTakeoffPro', 'Tongyi', 'VertexAI', 'VertexAIModelGarden', 'VLLM', 'VLLMOpenAI', 'WatsonxLLM', 'Writer', 'OctoAIEndpoint', 'Xinference', 'JavelinAIGateway', 'QianfanLLMEndpoint', 'YandexGPT', 'VolcEngineMaasLLM']
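
To my understanding, spacy-llm exposes each of these wrapper classes under a registry entry of the form langchain.<ClassName>.v1 (for example langchain.OpenAI.v1), which goes into @llm_models in the config. Treat that naming pattern as an assumption and check it against the spacy-llm documentation for your version. The sketch below only builds the name and rejects classes that are not in the list; the helper langchain_registry_name is hypothetical and does not call spacy-llm or LangChain.

```python
def langchain_registry_name(class_name, available):
    """Build the assumed spacy-llm registry name for a LangChain LLM class.

    `available` is the list of wrapper class names, e.g. the value of
    langchain_community.llms.__all__ shown above.
    """
    if class_name not in available:
        raise ValueError(f"{class_name!r} is not a listed langchain.llms wrapper")
    # Assumed naming pattern: one registry entry per wrapper class.
    return f"langchain.{class_name}.v1"
```

For example, langchain_registry_name("Ollama", langchain_community.llms.__all__) would produce the value to try in the @llm_models setting.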

Appendix

Proper example of the Azure API usage · explosion/spacy-llm · Discussion #350

Hi spacy-llm community, I tried recently to use the GPT model deployed in Azure OpenAI from the spacy-llm to do, e.g…

github.com

Large Language Models · spaCy Usage Documentation

Integrating LLMs into structured NLP pipelines

spacy.io

LLM for NER

Comprehensive Overview of Named Entity Recognition: Models, Domain-Specific Applications and…

In the domain of Natural Language Processing (NLP), Named Entity Recognition (NER) stands out as a pivotal mechanism…

arxiv.org

PromptNER: Prompting For Named Entity Recognition

In a surprising turn, Large Language Models (LLMs) together with a growing arsenal of prompt-based heuristics now offer…

arxiv.org

GPT-NER: Named Entity Recognition via Large Language Models

Despite the fact that large-scale Language Models (LLM) have achieved SOTA performances on a variety of NLP tasks, its…

arxiv.org

Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models

Exploring the application of powerful large language models (LLMs) on the fundamental named entity recognition (NER)…

arxiv.org

Written by Xin Cheng