DSPy

Auto-prompt-engineering with LLM

Xin Cheng
7 min read · Jun 26, 2024

When working with LLMs, a prompt engineer usually has a love-hate relationship with prompts. A perfect prompt can generate exactly what you want, but for complex tasks a perfect prompt rarely exists. You have to iterate and constantly refine your prompts, and the result is still fragile. As a developer, you want to automate things, not hand-craft natural language. If you can gather good samples, you can let DSPy write the prompt for you (at least initially). Below are articles to get you started.

Name origin: “Demonstrate-Search-Predict” (originally). Purpose: build robust applications that leverage the power of LLMs without getting bogged down in the complexities of prompt engineering and model fine-tuning.

  • Prompt Wrappers: very thin wrappers for prompt templating.
  • Application Development Libraries: LangChain, LlamaIndex
  • Generation Control Libraries: Guidance, LMQL, RELM, Outlines
  • Prompt Generation & Automation: DSPy

The article uses zero-shot as the simplest DSPy sample.

class ZeroShot(dspy.Module):
    """
    Provide answer to question
    """
    def __init__(self):
        super().__init__()
        self.prog = dspy.Predict("question -> answer")

    def forward(self, question):
        return self.prog(question="In the game of bridge, " + question)

This is a subclass of dspy.Module. The __init__ method sets up an LM call with a single dspy.Predict, using one input (question) and one output (answer) as the signature; the forward() method runs inference on the passed-in question.
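The string signature "question -> answer" is shorthand; the same module can be written with an explicit Signature class when you want to attach field descriptions. A minimal sketch (not from the article; the class name, docstring, and descriptions are illustrative):

class BasicQA(dspy.Signature):
    """Answer the question concisely."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="a short answer")

class ZeroShotTyped(dspy.Module):
    def __init__(self):
        super().__init__()
        # dspy.Predict accepts a Signature class as well as a signature string.
        self.prog = dspy.Predict(BasicQA)

    def forward(self, question):
        return self.prog(question=question)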

Set up the global LLM as Google Gemini

gemini = dspy.Google("models/gemini-1.0-pro",
                     api_key=api_key,
                     temperature=temperature)
dspy.settings.configure(lm=gemini, max_tokens=1024)

Inference

module = ZeroShot()
response = module("What is Stayman?")
print(response)
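The returned object is a dspy.Prediction whose fields match the signature's outputs, so the answer itself is available as an attribute:

# Access the output field declared in the "question -> answer" signature.
print(response.answer)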

RAG (retrievers support)

from chromadb.utils import embedding_functions
from dspy.retrieve.chromadb_rm import ChromadbRM

default_ef = embedding_functions.DefaultEmbeddingFunction()
bidding_rag = ChromadbRM(CHROMA_COLLECTION_NAME, CHROMADB_DIR, default_ef, k=3)
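A hedged sketch of how the retriever is typically wired in: register it as the global rm so that dspy.Retrieve (used inside modules) pulls passages from the ChromaDB collection. The query string here is illustrative.

# Register the retriever globally alongside the LM.
dspy.settings.configure(lm=gemini, rm=bidding_rag)

# dspy.Retrieve now queries the ChromaDB collection.
retrieve = dspy.Retrieve(k=3)
top_passages = retrieve("In the game of bridge, what is Stayman?").passages
for passage in top_passages:
    print(passage)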

A multi-module program just specifies the order in which modules are used inside the forward method. (Is there a way to define nonlinear orchestration? Since forward is ordinary Python, branching and loops work too; see the sketch below.)
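A sketch (not from the article) of a two-module program; because forward() is ordinary Python, the orchestration can branch rather than run strictly in sequence:

class TwoStep(dspy.Module):
    def __init__(self):
        super().__init__()
        self.draft = dspy.Predict("question -> answer")
        self.refine = dspy.ChainOfThought("question, draft_answer -> answer")

    def forward(self, question):
        draft = self.draft(question=question)
        # Nonlinear orchestration: only call the second module for very short drafts.
        if len(draft.answer.split()) < 5:
            return self.refine(question=question, draft_answer=draft.answer)
        return draft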

The article then shows how you can provide examples (question/answer pairs) and let DSPy tune the prompt.
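A hedged sketch of what such examples look like: each one is a dspy.Example whose input field is marked with .with_inputs() (the bridge Q/A pairs are illustrative only).

trainset = [
    dspy.Example(
        question="What is Stayman?",
        answer="A 2 clubs response to 1NT asking partner for a four-card major.",
    ).with_inputs("question"),
    dspy.Example(
        question="What is a takeout double?",
        answer="A double asking partner to bid their best unbid suit.",
    ).with_inputs("question"),
]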

DSPy is a framework developed by Stanford University that can automatically optimize LLM prompts and weights.

Modules: ReAct, ChainOfThought, ProgramOfThought, etc. are supported.
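These modules share the same signature-driven interface, so swapping strategies is usually a one-line change. A small sketch (the question is illustrative; ChainOfThought also exposes the intermediate reasoning it generated):

plain = dspy.Predict("question -> answer")
cot = dspy.ChainOfThought("question -> answer")

pred = cot(question="In the game of bridge, what is a weak two bid?")
print(pred.rationale)  # ChainOfThought's intermediate reasoning
print(pred.answer)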

As shown below, we define a RAG module with a retriever and a generator (all components are declared in the __init__ method, and inference is defined in the forward method).

class RagModule(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

Optimizers: The optimizer is a component that automatically evaluates generated responses and retrieved context, and optimizes prompts and weights accordingly. Examples include BootstrapFewShot, BootstrapFewShotWithRandomSearch, and BayesianSignatureOptimizer; which one to use depends on how many task examples you have.

Below, we define an evaluation metric and a BootstrapFewShot optimizer, then compile the module.

from dspy.teleprompt import BootstrapFewShot

def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

compiled_rag = teleprompter.compile(RagModule(), trainset=trainset)
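A hedged follow-up: the compiled program carries its bootstrapped demonstrations, and DSPy modules can be saved and reloaded so the optimization cost is paid once (the file name is illustrative).

# Persist the optimized program (prompts + selected demonstrations).
compiled_rag.save("compiled_rag.json")

# Later, rebuild the module and load the optimized state.
reloaded_rag = RagModule()
reloaded_rag.load("compiled_rag.json")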

Signature: Signatures are components for defining the structure of inputs and outputs in RAG applications.

class GenerateAnswer(dspy.Signature):
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

vs. LangChain or LlamaIndex: DSPy shifts the construction of LM-based pipelines from manipulating prompts to something closer to programming.

Pipeline execution

my_question = "What castle did David Gregory inherit?"
pred = compiled_rag(my_question)

print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")dfdfdf

Pipeline evaluation

from dspy.evaluate.evaluate import Evaluate

# Set up the evaluator on the dev set; it can be reused with different metrics.
evaluate_on_hotpotqa = Evaluate(devset=devset, num_threads=1, display_progress=False, display_table=5)


# Score the compiled RAG program with the exact-match answer metric.
metric = dspy.evaluate.answer_exact_match
evaluate_on_hotpotqa(compiled_rag, metric=metric)

Retriever evaluation

def gold_passages_retrieved(example, pred, trace=None):
    gold_titles = set(map(dspy.evaluate.normalize_text, example['gold_titles']))
    found_titles = set(map(dspy.evaluate.normalize_text, [c.split(' | ')[0] for c in pred.context]))

    return gold_titles.issubset(found_titles)

compiled_rag_retrieval_score = evaluate_on_hotpotqa(compiled_rag, metric=gold_passages_retrieved)

Disadvantages are:

  1. Supports English only. A selling point of DSPy is that you don’t have to write prompts, but the instructions it generates behind the scenes are in English. Sentiment analysis worked in Spanish and Chinese, but it is unclear whether it can handle other complex tasks in those languages.
  2. Does not support complex tasks well. Normally, when using GPT, you would write a more detailed prompt, but in DSPy you cannot edit the prompt directly. So if you don’t have good input/output examples, you cannot use DSPy, since it cannot guess what you want.

Prompt engineering automation

  1. Bootstrapping: Starting with an initial seed prompt, DSPy iteratively refines it based on the LM’s outputs and user-provided examples/assertions
  2. Prompt Chaining: Breaking down complex tasks into a sequence of simpler sub-prompts
  3. Prompt Ensembling: Combining multiple prompt variations to improve performance (a rough sketch follows this list)
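A rough illustration of the ensembling idea in plain Python (not DSPy's built-in Ensemble teleprompter): run several program variants and majority-vote their answers.

from collections import Counter

def ensemble_answer(programs, question):
    # Collect one answer per compiled program variant and keep the most common.
    answers = [program(question=question).answer for program in programs]
    return Counter(answers).most_common(1)[0][0]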

It is suitable for applications where you can easily gather sample inputs and outputs.

It also suits cases where you share a very small amount of data and have DSPy generate the initial prompts, prompt templates, and prompting strategies.

Auto-prompting is analogous to providing data and letting machine learning write the program for you.

A few more details on optimizers

BootstrapFewShot optimizer: Uses a teacher LM to select the best demonstrations to include in the prompt from a larger set of demonstrations provided by the user.

COPRO optimizer: Finds the best-performing instruction for the model. Starts with a set of initial instructions, generates variations of those instructions, evaluates each variation and finally returns the best performing instruction.

MIPRO optimizer: Finds the best-performing combination of instruction and demonstrations. Working similarly to the COPRO optimizer, it returns the best-performing combination of instructions and examples.
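A rough, hedged sketch of using COPRO; the constructor and compile arguments shown here (breadth, depth, eval_kwargs) follow DSPy's teleprompter conventions but may differ between versions, so treat them as assumptions.

from dspy.teleprompt import COPRO

# COPRO searches over candidate instructions for the module's signature.
copro = COPRO(metric=validate_context_and_answer, breadth=5, depth=3)

compiled_with_instructions = copro.compile(
    RagModule(),
    trainset=trainset,
    eval_kwargs=dict(num_threads=1, display_progress=False),
)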

DSPy without CoT achieved a +12% improvement over the manual prompt, at a total cost of less than $0.50.

Prompt engineering challenges

  1. Manual prompt engineering does not generalize well: LLMs are highly sensitive to how they are prompted for each task
  2. Lack of a framework for conducting testing

Inspect generated prompt

# lm is dspy.OpenAI, dspy.Databricks, etc.
lm.inspect_history(n=1)

Integration with other products

Set the retriever model to the Qdrant vector search engine

from dspy.retrieve.qdrant_rm import QdrantRM
qdrant_retriever_model = QdrantRM("customer_service", client, k=10)
dspy.settings.configure(lm=llm, rm=qdrant_retriever_model)

Ollama is supported through dspy.OllamaLocal.
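A minimal sketch of the Ollama path (the model name is an assumption; use whatever model your local Ollama server hosts):

ollama_llm = dspy.OllamaLocal(model="llama3")
dspy.settings.configure(lm=ollama_llm, rm=qdrant_retriever_model)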

The article shows a way to define a custom retriever inside a DSPy module.

Define the Haystack retriever

from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
from haystack import Pipeline


retriever = InMemoryBM25Retriever(document_store, top_k=3)

A retrieve method wraps the Haystack retriever (the effect is the same as assigning self.retrieve in the __init__ method, but since it takes more lines, it is packaged as a method).

class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    # this makes it possible to use the Haystack retriever
    def retrieve(self, question):
        results = retriever.run(query=question)
        passages = [res.content for res in results['documents']]
        return dspy.Prediction(passages=passages)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

Integration with LlamaIndex

Quite a loose integration: use LlamaIndex's VectorStoreIndex (an abstraction over vector databases) as the retriever in a DSPy module, as sketched below.
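A hedged sketch of that pattern (not the article's exact code; the data directory and top-k are illustrative): build a LlamaIndex retriever and call it inside the DSPy module's forward.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
li_retriever = index.as_retriever(similarity_top_k=3)

class LlamaIndexRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        # Use the LlamaIndex retriever instead of dspy.Retrieve.
        nodes = li_retriever.retrieve(question)
        context = [node.get_content() for node in nodes]
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)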

Integration with Langchain

  1. Use DSPy as a retriever in LangChain
  2. Use a LangChain chain inside a DSPy module, as in the snippet below

# LangChain LCEL building blocks (retrieve, prompt, and llm are assumed to be defined as usual in LCEL).
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# From DSPy, import the modules that know how to interact with LangChain LCEL.
from dspy.predict.langchain import LangChainPredict, LangChainModule

# This is how to wrap it so it behaves like a DSPy program.
# Just replace every pattern like `prompt | llm` with `LangChainPredict(prompt, llm)`.
zeroshot_chain = RunnablePassthrough.assign(context=retrieve) | LangChainPredict(prompt, llm) | StrOutputParser()
zeroshot_chain = LangChainModule(zeroshot_chain)  # then wrap the chain in a DSPy module.

Appendix

DSPy could have better namespace organization, e.g., placing all LLMs under a specific Python namespace.

Xin Cheng

Multi/Hybrid-cloud, Kubernetes, cloud-native, big data, machine learning, IoT developer/architect, 3x Azure-certified, 3x AWS-certified, 2x GCP-certified