Inside DSPy: A Framework for Algorithmic Prompt Optimization

Launched a few months ago, the framework has rapidly become one of the most complete language model programming (LMP) stacks on the market.

Jesus Rodriguez
4 min read · Jun 24, 2024

I recently started an AI-focused educational newsletter that already has over 170,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing.

A few months ago, I wrote about an emerging project from Stanford University called DSPy. The core idea of DSPy is to create a world-class programming experience for language model applications. I recently got the chance to take a second look and was shocked by how far it has come, so much so that I decided to write a second post about DSPy because it really feels like a new framework.

Conceptually, DSPy is a framework designed to optimize the prompts and weights of language models (LMs) used in complex systems. Typically, creating such systems without DSPy involves breaking down the problem, fine-tuning prompts for each step, and generating synthetic examples for optimization. This process is cumbersome, requiring adjustments every time there’s a change in the pipeline, LM, or data.

DSPy simplifies and enhances this process by separating the program flow from the parameters of each step and introducing optimizers that tune prompts and weights based on a desired metric. This systematic approach allows DSPy to teach models like GPT-3.5, GPT-4, T5-base, or Llama2-13b to perform tasks more reliably and with higher quality.

DSPy Architecture

DSPy is based on a series of core components that abstract the implementation of language model programming applications.

1. Language Models (LLMs)

At the core of DSPy are LLMs, which the modules within the framework call to produce their outputs. DSPy focuses on algorithmically optimizing how these LLMs are prompted and tuned, particularly in pipeline-based programs.

2. Signatures

Signatures in DSPy specify the required input/output behavior of a module, allowing users to define what the LM needs to do rather than how to prompt it. This approach is more modular and adaptive compared to manually crafting prompts.
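
To make the idea concrete, here is a minimal plain-Python sketch of what a string signature such as "question -> answer" declares. The `parse_signature` helper is hypothetical and is not DSPy's own parser; it only illustrates that a signature names inputs and outputs and says nothing about how the prompt is worded.

```python
from typing import List, Tuple


def parse_signature(signature: str) -> Tuple[List[str], List[str]]:
    """Split 'in1, in2 -> out1' into input and output field names.

    Hypothetical helper for illustration only; not part of DSPy.
    """
    if signature.count("->") != 1:
        raise ValueError("expected exactly one '->' in the signature")
    left, right = signature.split("->")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    if not inputs or not outputs:
        raise ValueError("a signature needs at least one input and one output")
    return inputs, outputs


inputs, outputs = parse_signature("context, question -> answer")
print("inputs:", inputs)
print("outputs:", outputs)
```

Because the declaration carries only field names, the same signature can be paired with different prompting techniques without being rewritten.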

3. Modules

Modules are the building blocks of DSPy programs. Each module represents a prompting technique and can process inputs to produce outputs. These modules can be combined into larger programs, similar to neural network modules in frameworks like PyTorch.
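
The composition pattern can be sketched without DSPy installed. The classes below (`Module`, `Predict`, `SummarizeThenAnswer`) and the `fake_lm` stub are assumptions made for illustration, not DSPy's classes; they only show how small modules with a `forward` method can be nested into a larger program, in the PyTorch style the paragraph describes.

```python
from typing import Callable, Dict


class Module:
    """Tiny stand-in for a composable module (hypothetical, not dspy.Module)."""

    def __call__(self, **kwargs):
        # Calling a module dispatches to its forward method.
        return self.forward(**kwargs)

    def forward(self, **kwargs):
        raise NotImplementedError


class Predict(Module):
    """Wraps one LM call behind named inputs and a single named output."""

    def __init__(self, lm: Callable[[str], str], output_field: str):
        self.lm = lm
        self.output_field = output_field

    def forward(self, **inputs) -> Dict[str, str]:
        prompt = "\n".join(f"{name}: {value}" for name, value in inputs.items())
        return {self.output_field: self.lm(prompt)}


class SummarizeThenAnswer(Module):
    """A larger program built from two smaller modules."""

    def __init__(self, lm: Callable[[str], str]):
        self.summarize = Predict(lm, "summary")
        self.answer = Predict(lm, "answer")

    def forward(self, document: str, question: str) -> Dict[str, str]:
        summary = self.summarize(document=document)["summary"]
        return self.answer(summary=summary, question=question)


def fake_lm(prompt: str) -> str:
    # Stub LM so the sketch runs offline: reports how many lines it received.
    return f"response to {len(prompt.splitlines())} line(s)"


program = SummarizeThenAnswer(fake_lm)
result = program(
    document="DSPy separates flow from parameters.",
    question="What does DSPy separate?",
)
print(result)
```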

4. Data

Using DSPy involves training, development, and test sets. For each example, the inputs, intermediate labels, and final labels are identified. DSPy optimizers can work effectively with as few as 10 example inputs, though more examples can improve results.
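
As a rough sketch of that layout, the snippet below stores inputs, optional intermediate labels, and a final label per example, then carves out training, development, and test sets. The `Example` dataclass and `split_dataset` helper are hypothetical names chosen for this illustration and do not mirror DSPy's data classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Example:
    """One data point (hypothetical structure, not dspy.Example)."""
    inputs: Dict[str, str]
    final_label: str
    # Intermediate labels are optional here and left empty by default.
    intermediate_labels: Dict[str, str] = field(default_factory=dict)


def split_dataset(
    examples: List[Example], n_train: int, n_dev: int
) -> Tuple[List[Example], List[Example], List[Example]]:
    """First n_train examples for training, next n_dev for development, rest for test."""
    train = examples[:n_train]
    dev = examples[n_train:n_train + n_dev]
    test = examples[n_train + n_dev:]
    return train, dev, test


examples = [Example({"question": f"What is {i} + {i}?"}, str(2 * i)) for i in range(20)]
trainset, devset, testset = split_dataset(examples, n_train=10, n_dev=5)
print(len(trainset), len(devset), len(testset))
```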

5. Metrics

Metrics are functions that evaluate the output of a system, providing a score that quantifies performance. These metrics guide DSPy in optimizing programs to achieve higher accuracy or other desired outcomes.
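
A metric can be as small as a function that compares a prediction with the expected answer. The sketch below follows the `(example, pred, trace=None)` argument shape that DSPy metrics take; the `normalize` and `exact_match` names and the `answer` field are assumptions made for this example.

```python
from types import SimpleNamespace


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting does not affect the score."""
    return " ".join(text.lower().split())


def exact_match(example, pred, trace=None) -> bool:
    """Return True when the predicted answer matches the gold answer."""
    return normalize(example.answer) == normalize(pred.answer)


gold = SimpleNamespace(question="What is 6 * 7?", answer="42")
print(exact_match(gold, SimpleNamespace(answer=" 42 ")))
print(exact_match(gold, SimpleNamespace(answer="41")))
```

Returning a boolean or a number both work for averaging, since `True` and `False` count as 1 and 0.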

6. Optimizers

DSPy optimizers adjust the parameters of a program to maximize the specified metrics. These optimizers take the program, the metric, and training inputs to fine-tune prompts and LM weights, improving performance even with limited data.
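
One way to picture prompt optimization is bootstrapping few-shot demonstrations: run the program on training inputs, keep the outputs the metric accepts, and reuse them as demos. The code below is a deliberately simplified sketch of that idea, not DSPy's implementation; `bootstrap_demos`, `compile_program`, and the `toy_program` stub (which stands in for an LM call) are all hypothetical.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]
Program = Callable[[str, List[Example]], str]
Metric = Callable[[Example, str], bool]


def bootstrap_demos(program: Program, trainset: List[Example],
                    metric: Metric, max_demos: int = 4) -> List[Example]:
    """Keep the program's own outputs that the metric accepts, up to max_demos."""
    demos: List[Example] = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        prediction = program(example["question"], [])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
    return demos


def compile_program(program: Program, trainset: List[Example],
                    metric: Metric, max_demos: int = 4):
    """Return a new callable that always passes the bootstrapped demos to the program."""
    demos = bootstrap_demos(program, trainset, metric, max_demos)

    def compiled(question: str) -> str:
        return program(question, demos)

    compiled.demos = demos  # exposed so the selected demos can be inspected
    return compiled


def toy_program(question: str, demos: List[Example]) -> str:
    # Stub for an LM: adds "a + b", but simulates a mistake when the first
    # operand is odd and no demonstrations are available.
    a, b = (int(part) for part in question.split("+"))
    if a % 2 == 1 and not demos:
        return str(a + b + 1)
    return str(a + b)


def matches(example: Example, prediction: str) -> bool:
    return example["answer"] == prediction


trainset = [
    {"question": "1 + 1", "answer": "2"},
    {"question": "2 + 2", "answer": "4"},
    {"question": "4 + 3", "answer": "7"},
    {"question": "5 + 5", "answer": "10"},
]
compiled = compile_program(toy_program, trainset, matches)
print("demos kept:", len(compiled.demos))
print("before:", toy_program("3 + 4", []), "after:", compiled("3 + 4"))
```

The program's control flow is untouched; only the demonstrations it receives change, which is the separation of flow and parameters described earlier.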

7. DSPy Assertions

DSPy Assertions automate the enforcement of computational constraints on LMs, guiding them towards desired outcomes with minimal manual intervention. This feature enhances the reliability and correctness of LM outputs.
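
A common way to enforce such a constraint is to check the output and retry with feedback when the check fails. The sketch below assumes that pattern; `run_with_constraint`, `toy_generate`, and `at_most_three_words` are hypothetical names, and the code is not how DSPy implements its assertions.

```python
from typing import Callable, Optional


def run_with_constraint(
    generate: Callable[[Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_retries: int = 2,
) -> str:
    """Call generate until check passes, feeding the failure message back in."""
    feedback: Optional[str] = None
    error: Optional[str] = None
    for _ in range(max_retries + 1):
        output = generate(feedback)
        error = check(output)
        if error is None:
            return output
        feedback = f"Previous output {output!r} was rejected: {error}"
    raise ValueError(f"constraint still failing after {max_retries} retries: {error}")


def toy_generate(feedback: Optional[str]) -> str:
    # Stub for an LM: verbose on the first try, concise once it sees feedback.
    if feedback is None:
        return "The answer is most likely forty-two, give or take."
    return "42"


def at_most_three_words(output: str) -> Optional[str]:
    # Returns None when the constraint holds, otherwise an error message.
    return None if len(output.split()) <= 3 else "answer must be at most three words"


print(run_with_constraint(toy_generate, at_most_three_words))
```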

Using DSPy

To use DSPy, follow these steps:

1. Define the target task and examples.

2. Outline the pipeline steps.

3. Run examples through the pipeline.

4. Define the core dataset.

5. Specify success metrics.

6. Perform zero-shot evaluations.

7. Compile the solution using a DSPy optimizer.

8. Iterate until achieving the desired outcome.
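
Steps 5 and 6 amount to scoring an unoptimized program on a development set before any compilation. Here is a minimal plain-Python sketch of that baseline measurement; the `evaluate` helper, the `zero_shot_program` stub, and the tiny dataset are assumptions for illustration and not DSPy's evaluation utilities.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]


def evaluate(program: Callable[[str], str], devset: List[Example],
             metric: Callable[[Example, str], bool]) -> float:
    """Average metric score of the program over the dev set, as a percentage."""
    if not devset:
        raise ValueError("devset must not be empty")
    scores = [float(metric(example, program(example["question"]))) for example in devset]
    return 100.0 * sum(scores) / len(scores)


def zero_shot_program(question: str) -> str:
    # Stub for an unoptimized LM call: multiplies "a * b" but simulates a
    # mistake whenever one operand is zero.
    a, b = (int(part) for part in question.split("*"))
    return str(a * b) if a and b else "1"


def exact(example: Example, prediction: str) -> bool:
    return example["answer"] == prediction


devset = [
    {"question": "2 * 3", "answer": "6"},
    {"question": "0 * 5", "answer": "0"},
    {"question": "4 * 4", "answer": "16"},
    {"question": "7 * 1", "answer": "7"},
]
baseline = evaluate(zero_shot_program, devset, exact)
print(f"zero-shot accuracy: {baseline:.1f}%")
```

Recording this baseline first gives step 7 something to beat: the compiled program is worth keeping only if it scores higher on the same set.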

Example Code

import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric

# Set up the LM and register it as the default for all modules.
turbo = dspy.OpenAI(model='gpt-3.5-turbo-instruct', max_tokens=250)
dspy.settings.configure(lm=turbo)

# Load math questions from the GSM8K dataset.
gsm8k = GSM8K()
gsm8k_trainset, gsm8k_devset = gsm8k.train[:10], gsm8k.dev[:10]

# Define the module: a single chain-of-thought step from question to answer.
class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.prog(question=question)

# Compile and evaluate the model
from dspy.teleprompt import BootstrapFewShot

# Set up the optimizer: we want to "bootstrap" (i.e., self-generate) 4-shot examples of our CoT program.
config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)

# Optimize! Use the `gsm8k_metric` here. In general, the metric tells the optimizer how well it's doing.
teleprompter = BootstrapFewShot(metric=gsm8k_metric, **config)
optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset)

# Evaluate the optimized program on the held-out dev set.
from dspy.evaluate import Evaluate
evaluate = Evaluate(devset=gsm8k_devset, metric=gsm8k_metric, num_threads=4, display_progress=True, display_table=0)
evaluate(optimized_cot)

DSPy is one of the most interesting frameworks for language model programming currently available. Its flexible programming model makes it at least as compelling as LangChain or LlamaIndex for building sophisticated LLM workflows.



Jesus Rodriguez

CEO of IntoTheBlock, President of Faktory, President of NeuralFabric and founder of The Sequence, Lecturer at Columbia University, Wharton, Angel Investor...