Retrieval-augmented Generation with CrewAI

LLM Agent with Azure OpenAI

Xin Cheng
Mar 27, 2024

In the previous article, we covered CrewAI, an LLM-powered agent programming framework, and used Bing Search to search public websites. This article covers one of the most common use cases: Retrieval-augmented Generation (RAG), which will get dedicated articles of its own. RAG enables us to answer questions using private data.
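To make the idea concrete, here is a minimal plain-Python sketch of the retrieve-then-augment steps. It assumes a hypothetical `retrieve` function that ranks toy documents by shared words (a crude stand-in for embedding search) and a hypothetical `build_prompt` helper; the final generation call to the LLM is left out.

```python
import re


def tokens(text: str) -> set[str]:
    # Lowercased word set; a crude stand-in for an embedding
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    # Retrieval: rank documents by how many words they share with the question
    q = tokens(question)
    ranked = sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    # Augmentation: put the retrieved private data in front of the question
    joined = "\n".join(f"- {c}" for c in context)
    return (
        'Answer using only the context below. Reply "No information" '
        "if the context does not contain the answer.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )


# Toy private documents for illustration only
docs = ["harrison worked at kensho", "the cafeteria opens at noon"]
question = "where did harrison work?"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)  # generation step: this string would be sent to the LLM
```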

According to the CrewAI WebsiteSearchTool documentation (https://docs.crewai.com/tools/WebsiteSearchTool/), at the time of writing:

"All RAG tools at the moment can only use openAI to generate embeddings, we are working on adding support for other providers."

We want to use Azure OpenAI embeddings instead. Until they are natively supported, we have to define our own custom tool that performs retrieval with them.
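A custom tool works because our retrieval code only depends on an object that turns text into vectors. LangChain's embedding classes, including `AzureOpenAIEmbeddings`, expose `embed_documents` and `embed_query`, so any provider can be plugged in. The sketch below uses a hypothetical, dependency-free `HashingEmbeddings` class implementing those two method names; it is a toy for illustration, not a real embedding model.

```python
import hashlib
import math
from typing import Protocol


class Embeddings(Protocol):
    # The two methods a LangChain-style vector store calls on its embedder
    def embed_documents(self, texts: list[str]) -> list[list[float]]: ...

    def embed_query(self, text: str) -> list[float]: ...


class HashingEmbeddings:
    """Hypothetical toy embedder: hashes each word into one of `dim` buckets."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed_query(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % self.dim
            vec[bucket] += 1.0
        # L2-normalize so dot products equal cosine similarity
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]


def index_texts(texts: list[str], embedding: Embeddings) -> list[tuple[str, list[float]]]:
    # Accepts any provider with these two methods (OpenAI, Azure OpenAI, or this toy)
    return list(zip(texts, embedding.embed_documents(texts)))


index = index_texts(["harrison worked at kensho"], HashingEmbeddings())
print(len(index[0][1]))  # 64
```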

Steps

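  1. Define custom tool
  2. Define agent
  3. Define task
  4. Run agent

First, set up the Azure OpenAI chat model, the embedding model, and a small FAISS vector store with one sample document: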
import os

from dotenv import load_dotenv
from langchain_community.vectorstores import FAISS
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

# Load AZURE_OPENAI_* settings (key, endpoint, API version, deployments) from .env
load_dotenv()

api_version = os.getenv("AZURE_OPENAI_API_VERSION")
# CrewAI may still look for OPENAI_API_KEY, so reuse the Azure key for it
os.environ["OPENAI_API_KEY"] = os.getenv("AZURE_OPENAI_API_KEY")

# Chat model for the agent; the key and endpoint are read from
# AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT
llm = AzureChatOpenAI(
    deployment_name=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"),
    openai_api_version=api_version,
)

# Embedding model (an Azure deployment) used to index and query private data
embedding = AzureOpenAIEmbeddings(
    azure_deployment=os.getenv("AZURE_OPENAI_EMBEDDINGS_MODEL_NAME"),
    openai_api_version=api_version,
)

# In-memory FAISS index over a single sample document
vectorstore = FAISS.from_texts(["harrison worked at kensho"], embedding=embedding)
retriever = vectorstore.as_retriever()
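`FAISS.from_texts` embeds each text once at indexing time, and the retriever returned by `as_retriever()` embeds each query and returns the most similar stored texts (by default, the top four). The sketch below imitates that flow in plain Python with brute-force cosine similarity over word counts. `TinyVectorStore` and its toy `embed` function are hypothetical stand-ins, not FAISS, which by default ranks real embedding vectors by L2 distance.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding: a sparse word-count vector (stand-in for Azure OpenAI)
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical brute-force store mimicking from_texts() + a retriever."""

    def __init__(self, texts: list[str]) -> None:
        # Indexing: embed every text once
        self.items = [(t, embed(t)) for t in texts]

    def search(self, query: str, k: int = 4) -> list[tuple[str, float]]:
        # Querying: embed the query, score every text, keep the top k
        q = embed(query)
        scored = [(t, cosine(q, v)) for t, v in self.items]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


store = TinyVectorStore(["harrison worked at kensho", "the cafeteria opens at noon"])
for text, score in store.search("where did harrison work?"):
    print(f"{score:.3f}  {text}")
```

Because the search always returns the top k results, a store holding a single document returns that document for every query, with no notion of "nothing relevant". That is why every tool call in the logs below, even for Greg, comes back with `harrison worked at kensho`.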

Define custom tool

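The essential parts are the name, the description, and the _run method. The agent uses the name and description to decide when to call the tool, and _run performs the actual retrieval.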

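from crewai_tools import BaseTool


class WorkInfoSearchTool(BaseTool):
    # The agent reads the name and description when choosing a tool
    name: str = "Work Info Search Tool"
    description: str = "Search work related information."

    def _run(self, query: str) -> str:
        # Look up the most similar chunks in the FAISS index built above
        docs = retriever.get_relevant_documents(query)
        # Return text to the agent (the list of Documents, stringified)
        return str(docs)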
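Conceptually, the agent mainly sees each tool's name and description, and when the LLM emits an "Action" and "Action Input" (as in the logs further down), the matching tool's `_run` is called with that input. The sketch below shows the idea with a hypothetical `SimpleTool` base class and `run_action` helper in plain Python; it is not CrewAI's implementation, and the retrieval is stubbed with a fixed result.

```python
import json


class SimpleTool:
    """Hypothetical minimal tool: a name, a description, and a _run method."""

    name: str = ""
    description: str = ""

    def _run(self, query: str) -> str:
        raise NotImplementedError


class StubWorkInfoSearchTool(SimpleTool):
    name = "Work Info Search Tool"
    description = "Search work related information."

    def _run(self, query: str) -> str:
        # Stub for the FAISS retriever: always returns the one stored document
        return str(["harrison worked at kensho"])


def describe(tools: list[SimpleTool]) -> str:
    # What the LLM reads when deciding which action to take
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


def run_action(tools: list[SimpleTool], action: str, action_input: str) -> str:
    # Route "Action: <name>" plus a JSON "Action Input" to the matching tool
    by_name = {t.name: t for t in tools}
    if action not in by_name:
        return f"Unknown tool: {action}"
    return by_name[action]._run(**json.loads(action_input))


tools = [StubWorkInfoSearchTool()]
print(describe(tools))
print(run_action(tools, "Work Info Search Tool", '{"query": "Harrison work history"}'))
```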

Define agent

from crewai import Agent, Task, Crew, Process

tools = [WorkInfoSearchTool()]
websearch_agent = Agent(
    role='Work information search',
    goal='Find answer for work information. Reply "No information" if you cannot find information',
    backstory="""Find answer for work information relevant to the user. """,
    llm=llm,
    verbose=True,
    allow_delegation=True,
    tools=tools
)

Define task

task1 = Task(
    description="""where did harrison work?""",
    expected_output="""work information""",
    agent=websearch_agent
)

Run agent

# CrewAI's built-in RAG tools use OpenAI embeddings; our custom tool uses Azure
# OpenAI embeddings instead. Open question: how to do this the native CrewAI way?
crew = Crew(
    agents=[websearch_agent],
    tasks=[task1],
    verbose=2,
    process=Process.sequential
)

# Kick off the crew to start on its tasks
result = crew.kickoff()

print("######################")
print(result)

Result

 [DEBUG]: == Working Agent: Work information search
[INFO]: == Starting Task: where did harrison work?


> Entering new CrewAgentExecutor chain...
I need to find information about where Harrison worked.

Action: Work Info Search Tool
Action Input: {'query': 'Harrison work history'}

[Document(page_content='harrison worked at kensho')]

Thought:
I found a document that mentions Harrison worked at Kensho.

Final Answer:
Harrison worked at Kensho.

> Finished chain.
[DEBUG]: == [Work information search] Task output: Harrison worked at Kensho.


######################
Harrison worked at Kensho.

Negative sample

task1 = Task(
    description="""where did greg work?""",
    expected_output="""work information""",
    agent=websearch_agent
)

Result

 [DEBUG]: == Working Agent: Work information search
[INFO]: == Starting Task: where did greg work?


> Entering new CrewAgentExecutor chain...
I should use the Work Info Search Tool to find information about where Greg worked.

Action:
Work Info Search Tool

Action Input:
{
"query": "where did Greg work"
}

[Document(page_content='harrison worked at kensho')]

Thought:
The search result shows that Harrison worked at Kensho, not Greg. I need to search again.

Action:
Work Info Search Tool

Action Input:
{
"query": "where did Greg work"
}

I tried reusing the same input, I must stop using this action input. I'll try something else instead.



Thought:
I should try a different query to find information about where Greg worked.

Action:
Work Info Search Tool

Action Input:
{
"query": "Greg's work history"
}

[Document(page_content='harrison worked at kensho')]

Thought:
I should try a different query to find information about where Greg worked.

Action:
Work Info Search Tool

Action Input:
{
"query": "Greg work history"
}

[Document(page_content='harrison worked at kensho')]

Thought:
The search result is still showing information about Harrison working at Kensho. This is not helpful for finding where Greg worked. I need to try another query.

Action:
Work Info Search Tool

Action Input:
{
"query": "Greg's employment history"
}

[Document(page_content='harrison worked at kensho')]

Thought:
The search result is still not providing any information about where Greg worked. I need to try a different approach.

Action:
Work Info Search Tool

Action Input:
{
"query": "Greg previous jobs"
}

[Document(page_content='harrison worked at kensho')]

Thought:
I have exhausted all the queries I can think of to find information about where Greg worked. I couldn't find any relevant information.

Final Answer: No information

> Finished chain.
[DEBUG]: == [Work information search] Task output: No information


######################
No information
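The trace above follows a ReAct-style loop: think, pick an action, read the observation, and repeat until a final answer, with the executor refusing an identical action input ("I tried reusing the same input..."). Below is a plain-Python sketch of such a loop. The scripted `queries`, the substring-based `is_relevant` check, and the `max_iterations` cap are hypothetical stand-ins for decisions the LLM makes; this is not CrewAI's executor, although CrewAI agents also cap iterations through their `max_iter` setting.

```python
calls: list[str] = []


def search(query: str) -> list[str]:
    # Stub retriever: a one-document store returns the same hit for any query
    calls.append(query)
    return ["harrison worked at kensho"]


def is_relevant(person: str, docs: list[str]) -> bool:
    # Hypothetical stand-in for the LLM judging the observation
    return any(person.lower() in d.lower() for d in docs)


def react_loop(person: str, queries: list[str], max_iterations: int = 5) -> str:
    # `queries` plays the role of the actions the LLM would propose, in order
    seen: set[str] = set()
    for step, query in enumerate(queries):
        if step >= max_iterations:
            break
        if query in seen:
            # Like the executor's "I tried reusing the same input" guard
            continue
        seen.add(query)
        observation = search(query)
        if is_relevant(person, observation):
            return f"Final Answer: {observation[0]}"
    return "Final Answer: No information"


greg_queries = ["where did Greg work", "where did Greg work", "Greg's work history", "Greg previous jobs"]
print(react_loop("greg", greg_queries))
print(react_loop("harrison", ["Harrison work history"]))
```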

Another negative sample

task1 = Task(
    description="""where is harrison's hobby?""",
    expected_output="""hobby information""",
    agent=websearch_agent
)

Result

[DEBUG]: == Working Agent: Work information search
[INFO]: == Starting Task: where is harrison's hobby?


> Entering new CrewAgentExecutor chain...
I need to find information about Harrison's hobby. To do that, I will use the Work Info Search Tool.

Action: Work Info Search Tool
Action Input: {
"query": "Harrison's hobby"
}

[Document(page_content='harrison worked at kensho')]

Thought: The search results only mention that Harrison worked at Kensho, but there is no information about his hobby. I need to keep searching.

Action: Work Info Search Tool
Action Input: {
"query": "Harrison's hobby Kensho"
}

[Document(page_content='harrison worked at kensho')]

Thought:
The search results still only mention that Harrison worked at Kensho, but there is no information about his hobby. I need to try a different approach.

Action: Work Info Search Tool
Action Input: {
"query": "Harrison's hobby personal"
}

[Document(page_content='harrison worked at kensho')]

Thought: I need to find information about Harrison's hobby. To do that, I will use the Work Info Search Tool.

Action: Work Info Search Tool
Action Input: {
"query": "Harrison's hobby"
}

[Document(page_content='harrison worked at kensho')]

Thought: The search results only mention that Harrison worked at Kensho, but there is no information about his hobby. I need to keep searching.

Action: Work Info Search Tool
Action Input: {
"query": "Harrison's hobby Kensho"
}

[Document(page_content='harrison worked at kensho')]

Thought:
The search results still only mention that Harrison worked at Kensho, but there is no information about his hobby. I need to try a different approach.

Action: Work Info Search Tool
Action Input: {
"query": "Harrison's hobby personal"
}

I tried reusing the same input, I must stop using this action input. I'll try something else instead.



Thought: I need to find information about Harrison's hobby. To do that, I will use the Work Info Search Tool.

Action: Work Info Search Tool
Action Input: {
"query": "Harrison's hobby"
}

[Document(page_content='harrison worked at kensho')]

Thought: The search results only mention that Harrison worked at Kensho, but there is no information about his hobby. I need to keep searching.

Action: Work Info Search Tool
Action Input: {
"query": "Harrison's hobby Kensho"
}

[Document(page_content='harrison worked at kensho')]

Thought:
The search results still only mention that Harrison worked at Kensho, but there is no information about his hobby. I need to try a different approach.

Action: Work Info Search Tool
Action Input: {
"query": "Harrison's hobby personal life"
}

[Document(page_content='harrison worked at kensho')]

Thought: I need to find information about Harrison's hobby. To do that, I will use the Work Info Search Tool.

Action: Work Info Search Tool
Action Input: {
"query": "Harrison's hobby"
}

[Document(page_content='harrison worked at kensho')]

Thought: The search results only mention that Harrison worked at Kensho, but there is no information about his hobby. I need to keep searching.

Action: Work Info Search Tool
Action Input: {
"query": "Harrison's hobby Kensho"
}

[Document(page_content='harrison worked at kensho')]

Thought:
The search results still only mention that Harrison worked at Kensho, but there is no information about his hobby. I need to try a different approach.

Action: Work Info Search Tool
Action Input: {
"query": "Harrison's hobby personal life"
}

I tried reusing the same input, I must stop using this action input. I'll try something else instead.



Thought: I need to find information about Harrison's hobby. To do that, I will use the Work Info Search Tool.

Action: Work Info Search Tool
Action Input: {
"query": "Harrison's hobby"
}

[Document(page_content='harrison worked at kensho')]

Final Answer: There is no information available about Harrison's hobby.

> Finished chain.
[DEBUG]: == [Work information search] Task output: There is no information available about Harrison's hobby.


######################
There is no information available about Harrison's hobby.

Conclusion

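You can also build RAG easily with LangChain or LlamaIndex. However, the CrewAI implementation adds another layer that helps ensure answer quality. For example, with plain RAG the retriever may return Harrison's work information when asked about Greg's work, and you may have to set a similarity threshold to filter out such matches, which takes some manual tuning. With CrewAI, the agent judges whether the retrieved text actually answers the question and replies "No information" when it does not. You can also easily set up separate answering and validating agents.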
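A similarity threshold means discarding hits whose score falls below a cutoff, so an irrelevant document never reaches the LLM. Here is a plain-Python sketch, assuming a toy Jaccard word-overlap score and a hand-picked `threshold` of 0.1; real embedding scores would need their own cutoff tuned on your data. (LangChain retrievers offer a comparable option through `search_type="similarity_score_threshold"`.)

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def score(query: str, doc: str) -> float:
    # Toy similarity: Jaccard overlap of word sets (stand-in for embedding scores)
    q, d = tokens(query), tokens(doc)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def retrieve_with_threshold(query: str, docs: list[str], threshold: float = 0.1) -> list[str]:
    # Drop any hit below the cutoff instead of always returning the top k
    scored = sorted(((score(query, d), d) for d in docs), reverse=True)
    return [d for s, d in scored if s >= threshold]


docs = ["harrison worked at kensho"]
print(retrieve_with_threshold("where did harrison work?", docs))  # ['harrison worked at kensho']
print(retrieve_with_threshold("where did greg work?", docs))  # []
```

The cutoff was picked by hand for this toy score, which is the manual step mentioned above. It also has limits: "where is harrison's hobby?" scores 0.125 against the document because both contain "harrison", so the irrelevant work fact still passes. Telling such cases apart is where an agent that reasons over the retrieved text helps.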

Appendix

One article describes what it takes to keep an identical environment in the LLM world. You need to:

Pin the dependency versions.
Use Dockerization.
Set the LLM temperature to 0.
Set a fixed seed (any value works, as long as it stays the same).
Use the exact same GPU model.
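Most of these settings can be recorded next to every run, so you can at least check that two runs used the same setup. Below is a minimal standard-library sketch that hashes model settings and installed package versions into a short fingerprint. The config keys, the placeholder deployment name and API version, and the package list are assumptions to adapt. A matching fingerprint does not guarantee identical outputs (with a hosted service such as Azure OpenAI, the hardware is not under your control), but a mismatch shows the environments differ.

```python
import hashlib
import json
from importlib import metadata


def package_versions(names: list[str]) -> dict[str, str]:
    # Record installed versions; "missing" if a package is not installed
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "missing"
    return versions


def run_fingerprint(config: dict, packages: list[str]) -> str:
    # Hash model settings plus dependency versions into one comparable ID
    payload = {"config": config, "packages": package_versions(packages)}
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]


# Placeholder values (hypothetical deployment name and API version)
config = {"deployment": "my-gpt-deployment", "api_version": "2024-02-01", "temperature": 0, "seed": 42}
print(run_fingerprint(config, ["crewai", "langchain-openai", "faiss-cpu"]))
```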

Written by Xin Cheng

Multi/Hybrid-cloud, Kubernetes, cloud-native, big data, machine learning, IoT developer/architect, 3x Azure-certified, 3x AWS-certified, 2x GCP-certified
