Hello World to Conversational Chatbot with Memory

With LangChain and Azure OpenAI

Xin Cheng
12 min read · Mar 16, 2024

In my previous post, I explored a “question answering” solution. It has one issue: it does not have memory, so you must provide the full context in every turn. For example, if you mentioned “Uber” in one turn and then refer to it as “the company” in the next, the model may get confused about which company you mean. In this post, we will see how to add memory to enable conversational capability.

You can reuse the same .env file from the previous post to connect to Azure OpenAI.
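For reference, the variables read later in this post are the following (the values are placeholders; replace them with your own Azure OpenAI resource and deployment names):

AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
AZURE_OPENAI_API_KEY=<your-api-key>
AZURE_OPENAI_API_VERSION=<api-version>
AZURE_OPENAI_DEPLOYMENT_NAME=<your-chat-deployment>
AZURE_OPENAI_EMBEDDINGS_MODEL_NAME=<your-embeddings-deployment>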

No-memory test

By default, the LLM does not remember what you have said. At the 4th turn, it will simply answer that it does not know.

Create LLM

from langchain_openai import AzureChatOpenAI
import os
import openai

from dotenv import load_dotenv

load_dotenv()

azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
api_version = os.getenv("AZURE_OPENAI_API_VERSION")
api_key = os.getenv("AZURE_OPENAI_API_KEY")
openai.api_key = api_key

# create the Azure OpenAI chat model (the same settings are used throughout this post)
llm = AzureChatOpenAI(
    deployment_name=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"),
    openai_api_version=api_version
)

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# a plain prompt with a single input variable and no conversation history
prompt_template = "Answer question from human: {question}"
prompt = PromptTemplate(
    input_variables=["question"], template=prompt_template
)
llm_nobuff = LLMChain(
    llm=llm,
    prompt=prompt
)
# 1st turn
llm_nobuff("Good morning AI!")

Result

{'question': 'Good morning AI!',
'text': 'Good morning! How can I assist you today?'}

2nd turn

llm_nobuff("My interest here is to explore the potential of integrating Large Language Models with external knowledge")

Result

{'question': 'My interest here is to explore the potential of integrating Large Language Models with external knowledge',
'text': "Integrating Large Language Models (LLMs) with external knowledge can have several potential benefits. Here are a few ways in which this integration can be explored:\n\n1. Knowledge augmentation: LLMs like GPT-3 have vast amounts of pre-existing knowledge, but they may not always have the most up-to-date or domain-specific information. By integrating external knowledge sources such as databases, knowledge graphs, or structured data, LLMs can access the latest and more specialized information, enhancing their understanding and response generation capabilities.\n\n2. Fact-checking and verification: LLMs can be used to verify the accuracy of information by cross-referencing external knowledge sources. This can help identify and rectify any potential misinformation generated by the models.\n\n3. Contextual understanding: LLMs excel at understanding and generating human-like text. By integrating external knowledge, they can better understand the context of a conversation or query by leveraging domain-specific information. This can lead to more accurate and contextually appropriate responses.\n\n4. Question-answering systems: LLMs can be integrated with external knowledge bases to create powerful question-answering systems. By combining the language generation capabilities of LLMs with the structured information in external knowledge bases, we can build systems that can provide detailed and accurate answers to a wide range of questions.\n\n5. Personalized recommendations: LLMs can be enhanced with external knowledge about users' preferences, behaviors, or past interactions to provide personalized recommendations. By integrating external data sources such as user profiles, purchase history, or browsing behavior, LLMs can generate more tailored and relevant recommendations.\n\n6. Domain-specific applications: LLMs can be fine-tuned using domain-specific data and integrated with relevant knowledge sources to create specialized applications. For example, in healthcare, LLMs can be integrated with medical literature databases to assist in diagnosing diseases or suggesting treatment options.\n\nIt's worth noting that integrating LLMs with external knowledge also poses challenges, such as dealing with conflicting information, maintaining data quality, and ensuring ethical usage. However, with careful consideration and appropriate safeguards, the integration of LLMs with external knowledge has the potential to significantly enhance their capabilities and enable a wide range of practical applications."}

3rd turn

llm_nobuff("I just want to analyze the different possibilities. What can you think of?")

Result

{'question': 'My interest here is to explore the potential of integrating Large Language Models with external knowledge',
'text': "Integrating Large Language Models (LLMs) with external knowledge can have several potential benefits. Here are a few ways in which this integration can be explored:\n\n1. Knowledge augmentation: LLMs like GPT-3 have vast amounts of pre-existing knowledge, but they may not always have the most up-to-date or domain-specific information. By integrating external knowledge sources such as databases, knowledge graphs, or structured data, LLMs can access the latest and more specialized information, enhancing their understanding and response generation capabilities.\n\n2. Fact-checking and verification: LLMs can be used to verify the accuracy of information by cross-referencing external knowledge sources. This can help identify and rectify any potential misinformation generated by the models.\n\n3. Contextual understanding: LLMs excel at understanding and generating human-like text. By integrating external knowledge, they can better understand the context of a conversation or query by leveraging domain-specific information. This can lead to more accurate and contextually appropriate responses.\n\n4. Question-answering systems: LLMs can be integrated with external knowledge bases to create powerful question-answering systems. By combining the language generation capabilities of LLMs with the structured information in external knowledge bases, we can build systems that can provide detailed and accurate answers to a wide range of questions.\n\n5. Personalized recommendations: LLMs can be enhanced with external knowledge about users' preferences, behaviors, or past interactions to provide personalized recommendations. By integrating external data sources such as user profiles, purchase history, or browsing behavior, LLMs can generate more tailored and relevant recommendations.\n\n6. Domain-specific applications: LLMs can be fine-tuned using domain-specific data and integrated with relevant knowledge sources to create specialized applications. For example, in healthcare, LLMs can be integrated with medical literature databases to assist in diagnosing diseases or suggesting treatment options.\n\nIt's worth noting that integrating LLMs with external knowledge also poses challenges, such as dealing with conflicting information, maintaining data quality, and ensuring ethical usage. However, with careful consideration and appropriate safeguards, the integration of LLMs with external knowledge has the potential to significantly enhance their capabilities and enable a wide range of practical applications."}

4th turn

llm_nobuff("What is my aim again?")

Result

{'question': 'What is my aim again?',
'text': "I'm sorry, but as an AI language model, I don't have access to personal information or past conversations. Therefore, I don't know what your specific aim or goal might be. Could you please provide more context or clarify your question? I'm here to help with any information or assistance you need."}

Add memory

The most basic LangChain conversational capability is provided by the ConversationChain and ConversationBufferMemory classes. Let’s follow this example; at the 4th turn, you will see that it recalls the aim I mentioned earlier.

Create conversational chain with buffer memory

from langchain.chains import ConversationChain

# initialize the conversation chain (ConversationChain uses ConversationBufferMemory by default)
conversation = ConversationChain(llm=llm)

from langchain.chains.conversation.memory import ConversationBufferMemory

# explicitly attach a buffer memory that stores the raw conversation history
conversation_buf = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory()
)

Internals

The conversational chain has a prompt with two input variables: {history}, which holds the conversation so far, and {input}, which holds the current user input.

print(conversation.prompt.template)

Result

The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{history}
Human: {input}
AI:
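To see how the two variables are filled in, you can format the template yourself with a made-up history (the values below are purely illustrative):

print(conversation.prompt.format(
    history="Human: Good morning AI!\nAI: Good morning! How can I assist you today?",
    input="What did I just say?"
))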

Helper function

from langchain.callbacks import get_openai_callback

def count_tokens(chain, query):
    # run the chain while the callback records token usage for the call
    with get_openai_callback() as cb:
        result = chain.run(query)
        print(f'Spent a total of {cb.total_tokens} tokens')

    return result
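1st turn

As in the no-memory test, the conversation starts with a greeting (it is the first entry in the memory buffer dump shown further below):

count_tokens(
    conversation_buf,
    "Good morning AI!"
)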

2nd turn

count_tokens(
    conversation_buf,
    "My interest here is to explore the potential of integrating Large Language Models with external knowledge"
)

Result

Spent a total of 322 tokens
"That's an interesting topic! Integrating large language models with external knowledge can have a lot of potential benefits. By incorporating external knowledge sources, such as databases, encyclopedias, or even specific domain expertise, language models can become more robust and accurate in generating responses or providing information.\n\nOne approach to integrating external knowledge is through knowledge graphs. Knowledge graphs organize information in a structured way, representing entities and their relationships. By connecting a language model to a knowledge graph, the model can access relevant information during the conversation and incorporate it into its responses.\n\nAnother approach is to use pre-trained language models and fine-tune them on specific domains or tasks. This allows the model to learn from both the large-scale pre-training data and the domain-specific data, enabling it to provide more accurate and contextually relevant responses.\n\nThere are also ongoing research efforts in building hybrid models that combine the strengths of large language models with external knowledge sources. These models aim to leverage the advantages of both approaches to improve the performance and versatility of AI systems.\n\nWhat specific aspects of integrating large language models with external knowledge are you interested in exploring further?"

3rd turn

count_tokens(
    conversation_buf,
    "I just want to analyze the different possibilities. What can you think of?"
)

Result

Spent a total of 695 tokens
"Sure! Here are a few possibilities to consider when integrating large language models with external knowledge:\n\n1. Knowledge-based querying: You can use a language model to query external knowledge sources, such as databases or APIs, to retrieve specific information. The model can generate natural language queries and process the retrieved information to provide accurate and contextually relevant responses.\n\n2. Knowledge graph augmentation: Language models can help in automatically populating or expanding knowledge graphs. By analyzing textual data and extracting relevant information, the models can contribute to the enrichment of existing knowledge graphs or even help create new ones.\n\n3. Contextual understanding: Large language models can benefit from external knowledge by incorporating it into their contextual understanding. By accessing external knowledge during a conversation, the model can generate more informed and accurate responses based on the specific context or domain.\n\n4. Fact-checking and verification: Language models can be used to fact-check information by cross-referencing it with external knowledge sources. This can help in identifying and correcting inaccuracies or false information.\n\n5. Domain-specific fine-tuning: By fine-tuning a large language model on specific domains or tasks, you can enhance its performance and adapt it to specific contexts. This can be particularly useful in fields like medicine, law, or finance, where domain expertise is crucial.\n\n6. Multi-modal integration: Integrating external knowledge with large language models can also involve incorporating other modalities, such as images or videos. This allows the model to generate more comprehensive and contextually relevant responses by considering multiple sources of information.\n\nThese are just a few possibilities, and there may be many more depending on the specific use case or requirements. It's important to carefully analyze the available knowledge sources, the goals of integration, and the potential benefits and challenges associated with each approach."

4th turn

count_tokens(
    conversation_buf,
    "What is my aim again?"
)

Result

Spent a total of 741 tokens
'Your aim is to explore the potential of integrating Large Language Models with external knowledge. You are interested in analyzing the different possibilities and understanding how this integration can be beneficial in various contexts.'

Inspect memory

print(conversation_buf.memory.buffer)

Result

Human: Good morning AI!
AI: Good morning! How can I assist you today?
Human: My interest here is to explore the potential of integrating Large Language Models with external knowledge
AI: That's an interesting topic! Integrating large language models with external knowledge can have a lot of potential benefits. By incorporating external knowledge sources, such as databases, encyclopedias, or even specific domain expertise, language models can become more robust and accurate in generating responses or providing information.

One approach to integrating external knowledge is through knowledge graphs. Knowledge graphs organize information in a structured way, representing entities and their relationships. By connecting a language model to a knowledge graph, the model can access relevant information during the conversation and incorporate it into its responses.

Another approach is to use pre-trained language models and fine-tune them on specific domains or tasks. This allows the model to learn from both the large-scale pre-training data and the domain-specific data, enabling it to provide more accurate and contextually relevant responses.

There are also ongoing research efforts in building hybrid models that combine the strengths of large language models with external knowledge sources. These models aim to leverage the advantages of both approaches to improve the performance and versatility of AI systems.

What specific aspects of integrating large language models with external knowledge are you interested in exploring further?
Human: I just want to analyze the different possibilities. What can you think of?
AI: Sure! Here are a few possibilities to consider when integrating large language models with external knowledge:

1. Knowledge-based querying: You can use a language model to query external knowledge sources, such as databases or APIs, to retrieve specific information. The model can generate natural language queries and process the retrieved information to provide accurate and contextually relevant responses.

2. Knowledge graph augmentation: Language models can help in automatically populating or expanding knowledge graphs. By analyzing textual data and extracting relevant information, the models can contribute to the enrichment of existing knowledge graphs or even help create new ones.

3. Contextual understanding: Large language models can benefit from external knowledge by incorporating it into their contextual understanding. By accessing external knowledge during a conversation, the model can generate more informed and accurate responses based on the specific context or domain.

4. Fact-checking and verification: Language models can be used to fact-check information by cross-referencing it with external knowledge sources. This can help in identifying and correcting inaccuracies or false information.

5. Domain-specific fine-tuning: By fine-tuning a large language model on specific domains or tasks, you can enhance its performance and adapt it to specific contexts. This can be particularly useful in fields like medicine, law, or finance, where domain expertise is crucial.

6. Multi-modal integration: Integrating external knowledge with large language models can also involve incorporating other modalities, such as images or videos. This allows the model to generate more comprehensive and contextually relevant responses by considering multiple sources of information.

These are just a few possibilities, and there may be many more depending on the specific use case or requirements. It's important to carefully analyze the available knowledge sources, the goals of integration, and the potential benefits and challenges associated with each approach.
Human: What is my aim again?
AI: Your aim is to explore the potential of integrating Large Language Models with external knowledge. You are interested in analyzing the different possibilities and understanding how this integration can be beneficial in various contexts.

As the referenced article notes, with this approach tokens accumulate quickly and can easily exceed the LLM token limit after a few turns. Therefore, there are some other memory types available (a sketch of how to plug them in follows the list):

ConversationSummaryMemory: summarizes the conversation history before it is passed to the {history} parameter.

ConversationBufferWindowMemory: same as ConversationBufferMemory, but only keeps a given number of past interactions before “forgetting” them.

ConversationSummaryBufferMemory: a mix of ConversationSummaryMemory and ConversationBufferWindowMemory.
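
A minimal sketch of plugging these into ConversationChain (the k and max_token_limit values are arbitrary illustrations, not recommendations):

from langchain.chains.conversation.memory import (
    ConversationSummaryMemory,
    ConversationBufferWindowMemory,
    ConversationSummaryBufferMemory,
)

# summarize the full history with the LLM before injecting it into {history}
conversation_sum = ConversationChain(
    llm=llm, memory=ConversationSummaryMemory(llm=llm)
)

# keep only the last k interactions verbatim
conversation_bufw = ConversationChain(
    llm=llm, memory=ConversationBufferWindowMemory(k=3)
)

# keep recent turns verbatim and summarize older ones, within a token budget
conversation_sum_buf = ConversationChain(
    llm=llm, memory=ConversationSummaryBufferMemory(llm=llm, max_token_limit=650)
)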

RAG conversational chain

In order to add conversational capability to RAG, we need a conversational retrieval chain, as in this example.

Create LLM and embedding

from operator import itemgetter

from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

api_version = os.getenv("AZURE_OPENAI_API_VERSION")
llm = AzureChatOpenAI(
    deployment_name=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"),
    openai_api_version=api_version
)
embedding = AzureOpenAIEmbeddings(
    azure_deployment=os.getenv("AZURE_OPENAI_EMBEDDINGS_MODEL_NAME"),
    openai_api_version=api_version,
)

Create dummy documents and retriever

vectorstore = FAISS.from_texts(
    ["harrison worked at kensho"], embedding=embedding
)
retriever = vectorstore.as_retriever()

Create condense question prompt and answer prompt

The condense-question prompt uses the chat history and the new question to form a standalone question. The answer prompt is a typical RAG prompt that uses the retrieved context to answer the question.

from langchain_core.messages import AIMessage, HumanMessage, get_buffer_string
from langchain_core.prompts import format_document
from langchain_core.runnables import RunnableParallel

from langchain.prompts.prompt import PromptTemplate

_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(template)
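
To see what the condense step works with, you can format the prompt by hand; the history here reuses the simulated messages from the turns below:

print(CONDENSE_QUESTION_PROMPT.format(
    chat_history="Human: Who wrote this notebook?\nAI: Harrison",
    question="where did he work?"
))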

Helper function to combine multiple documents

DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")


def _combine_documents(
    docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"
):
    # format each retrieved document and join them into one context string
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)
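
For example, with two hypothetical documents the helper simply joins their page contents with a blank line:

from langchain_core.documents import Document

print(_combine_documents([
    Document(page_content="harrison worked at kensho"),
    Document(page_content="harrison is an engineer"),
]))
# harrison worked at kensho
#
# harrison is an engineer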

Create conversational rag chain

You can see:

  1. _inputs converts the chat_history messages to a string with get_buffer_string, then passes them with the question to the condense-question prompt and the LLM to generate a standalone question.
  2. The remainder is the usual RAG process: the standalone question is passed to the retriever, the retrieved documents are combined to populate the context variable, and the standalone question is also reused as the question.
  3. The context and question are sent through the answer prompt to the LLM.

_inputs = RunnableParallel(
    standalone_question=RunnablePassthrough.assign(
        chat_history=lambda x: get_buffer_string(x["chat_history"])
    )
    | CONDENSE_QUESTION_PROMPT
    | llm
    | StrOutputParser(),
)
_context = {
    "context": itemgetter("standalone_question") | retriever | _combine_documents,
    "question": lambda x: x["standalone_question"],
}
conversational_qa_chain = _inputs | _context | ANSWER_PROMPT | llm

1st turn with simulated chat history

conversational_qa_chain.invoke(
    {
        "question": "where did harrison work?",
        "chat_history": [],
    }
)

Result

AIMessage(content='Harrison worked at Kensho.')

2nd turn

conversational_qa_chain.invoke(
    {
        "question": "where did he work?",
        "chat_history": [
            HumanMessage(content="Who wrote this notebook?"),
            AIMessage(content="Harrison"),
        ],
    }
)

Result

AIMessage(content='Harrison worked at Kensho.')

3rd turn

conversational_qa_chain.invoke(
    {
        "question": "who is he?",
        "chat_history": [
            HumanMessage(content="Who wrote this notebook?"),
            AIMessage(content="Harrison"),
        ],
    }
)

Result

AIMessage(content='Based on the given context, "he" refers to Harrison.')

Adding real memory
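
The chain below reads the chat history from a memory object, which must be created first. A minimal setup consistent with the keys used below (history returned as message objects, inputs stored under "question", outputs under "answer"):

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    return_messages=True, output_key="answer", input_key="question"
)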

# First we add a step to load memory
# This adds a "chat_history" key to the input object
loaded_memory = RunnablePassthrough.assign(
    chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)
# Now we calculate the standalone question
standalone_question = {
    "standalone_question": {
        "question": itemgetter("question"),
        "chat_history": lambda x: get_buffer_string(x["chat_history"]),
    }
    | CONDENSE_QUESTION_PROMPT
    | llm
    | StrOutputParser(),
}
# Now we retrieve the documents
retrieved_documents = {
    "docs": itemgetter("standalone_question") | retriever,
    "question": itemgetter("standalone_question"),
}
# Now we construct the inputs for the final prompt
final_inputs = {
    "context": lambda x: _combine_documents(x["docs"]),
    "question": itemgetter("question"),
}
# And finally, we do the part that returns the answers
answer = {
    "answer": final_inputs | ANSWER_PROMPT | llm,
    "docs": itemgetter("docs"),
}
# And now we put it all together!
final_chain = loaded_memory | standalone_question | retrieved_documents | answer

1st turn

inputs = {"question": "where did harrison work?"}
result = final_chain.invoke(inputs)
result

Result

{'answer': AIMessage(content='Harrison was employed at Kensho.'),
'docs': [Document(page_content='harrison worked at kensho')]}

Save memory

# Note that the memory does not save automatically
# This will be improved in the future
# For now you need to save it yourself
memory.save_context(inputs, {"answer": result["answer"].content})
memory.load_memory_variables({})

Result

{'history': [HumanMessage(content='where did harrison work?'),
AIMessage(content='Harrison was employed at Kensho.')]}

2nd turn

inputs = {"question": "but where did he really work?"}
result = final_chain.invoke(inputs)
result

Result

{'answer': AIMessage(content='Harrison actually worked at Kensho.'),
'docs': [Document(page_content='harrison worked at kensho')]}
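
Putting it together, each turn in a real application invokes the chain and then saves the turn back into memory. A minimal helper (hypothetical, simply wrapping the calls above) could look like this:

def chat(question: str) -> str:
    # run the full conversational RAG chain, then persist the turn into memory
    inputs = {"question": question}
    result = final_chain.invoke(inputs)
    memory.save_context(inputs, {"answer": result["answer"].content})
    return result["answer"].content

print(chat("where did harrison work?"))
print(chat("but where did he really work?"))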
