Meet OPRO: Google DeepMind’s New Method that Optimizes Prompts Better than Humans
Prompt engineering and optimization are among the most debated topics around large language models (LLMs). The term prompt engineering typically describes the task of refining a natural language instruction so that an LLM performs a specific task well. This work is usually done by humans, but what if AI could do a better job optimizing prompts? In a recent paper, researchers from Google DeepMind propose a technique called Optimization by PROmpting (OPRO) that attempts to address precisely this challenge.
The core idea of OPRO is to leverage LLMs as optimization agents. With the evolution of prompting techniques, LLMs have demonstrated remarkable prowess across various domains. Their proficiency in comprehending natural language opens up a novel avenue for optimization. Instead of rigidly defining optimization problems and prescribing programmed solvers for update steps, DeepMind adopts a more intuitive approach. They articulate optimization challenges in natural language and direct the LLMs to generate new solutions iteratively, drawing from problem descriptions and previously discovered solutions. Leveraging LLMs in optimization grants the advantage of swift adaptability to diverse tasks, achieved by merely altering the problem description in the prompt. Further customization becomes feasible by appending instructions to specify desired solution attributes.
With OPRO, DeepMind adopts an innovative approach by introducing the concept of the “meta-prompt” as the catalyst for LLMs to act as optimizers. This meta-prompt comprises two pivotal components:
1) A repository of previously generated prompts, each paired with its corresponding training accuracy.
2) A problem description with randomly selected exemplars from the training set that illustrate the task at hand. These directives also include instructions that help the LLM understand the relationships between the various elements and the preferred output format.
In contrast to recent studies focusing on automated prompt generation with LLMs, DeepMind’s approach stands distinct. Each step in their optimization methodology entails the generation of fresh prompts, all with the singular objective of enhancing test accuracy. This trajectory builds upon previously generated prompts, deviating from the practice of modifying input prompts based on natural language feedback or maintaining semantic congruence. By harnessing the entire optimization trajectory, OPRO empowers the LLM to systematically craft new prompts that steadily elevate task accuracy throughout the optimization journey, even when starting with prompts of low task accuracies.
At each optimization juncture, the LLM generates candidate solutions aligned with the problem description and informed by evaluations of previously assessed solutions within the meta-prompt. Subsequently, these novel solutions undergo evaluation and become integrated into the meta-prompt for subsequent optimization iterations. The optimization process reaches its conclusion when the LLM no longer proposes superior solutions or when a predefined maximum number of optimization steps transpires.
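The loop described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the paper's implementation: `llm_propose` and `evaluate` are placeholder stand-ins for a real LLM call and a real task scorer, and names such as `patience` are assumptions made for the sketch.

```python
# Hypothetical sketch of the OPRO loop. `llm_propose` and `evaluate` are
# placeholders for a real LLM sampling call and a real task scorer.
def llm_propose(meta_prompt, n_candidates):
    # Placeholder: a real implementation would sample `n_candidates`
    # completions from an LLM conditioned on `meta_prompt`.
    return [f"candidate-{i}" for i in range(n_candidates)]

def evaluate(solution):
    # Placeholder: score a candidate prompt (e.g. training accuracy in [0, 1]).
    return len(solution) % 10 / 10

def opro(task_description, max_steps=3, n_candidates=4, patience=2):
    trajectory = []  # (solution, score) pairs: the optimization trajectory
    steps_without_gain = 0
    for _ in range(max_steps):
        # Meta-prompt = past solution/score pairs (ascending by score)
        # followed by the problem description.
        history = sorted(trajectory, key=lambda pair: pair[1])
        meta_prompt = "\n".join(f"{s} -> {v:.2f}" for s, v in history)
        meta_prompt += f"\n{task_description}"
        best_before = max((v for _, v in trajectory), default=float("-inf"))
        # Evaluate the new candidates and fold them into the trajectory.
        for cand in llm_propose(meta_prompt, n_candidates):
            trajectory.append((cand, evaluate(cand)))
        best_after = max(v for _, v in trajectory)
        # Terminate when the LLM stops proposing superior solutions.
        if best_after <= best_before:
            steps_without_gain += 1
            if steps_without_gain >= patience:
                break
        else:
            steps_without_gain = 0
    return max(trajectory, key=lambda pair: pair[1])
```

Note how both stopping conditions from the text appear: a cap on the number of steps (`max_steps`) and early termination once no better solution emerges.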
Now that we have outlined the key characteristics of OPRO, the next logical question is what makes LLMs well suited to drive the optimization process. In the paper, DeepMind highlights some key desirable properties of LLMs as optimizers:
1. Natural Language Descriptions: One significant advantage of LLMs in optimization is their proficiency in understanding natural language. This enables users to describe optimization tasks without the need for formal specifications. For example, in prompt optimization, where the objective is to find a prompt that maximizes task accuracy, users can provide a high-level text summary along with input-output examples.
2. Balancing Exploration and Exploitation: The exploration-exploitation trade-off is a pivotal challenge in optimization. For LLMs to be effective optimizers, they must strike a balance between these competing objectives. This means that LLMs should be capable of exploiting promising areas within the search space where good solutions are already identified, while also exploring new regions to uncover potentially superior solutions.
For designing the meta-prompt, OPRO looks for two key characteristics:
1. Optimization Problem Description: The meta-prompt, serving as the input to the LLM acting as an optimizer, consists of two crucial components. The first component is the textual description of the optimization problem, encompassing details such as the objective function and solution constraints. For instance, in prompt optimization, the LLM can be instructed to “generate a new instruction that achieves higher accuracy.”
2. Optimization Trajectory: LLMs have shown the ability to recognize patterns from in-context demonstrations. DeepMind’s meta-prompt leverages this capability by instructing the LLM to utilize the optimization trajectory for generating new solutions. This trajectory comprises past solutions paired with their optimization scores, sorted in ascending order.
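The two components above can be combined into a single meta-prompt string. The sketch below is illustrative: the wording, field names, and function signature are assumptions, not the paper's exact template, though the ascending sort of the trajectory and the `<INS>` placeholder follow the description in the text.

```python
# Illustrative meta-prompt builder with the two components described above.
# The template wording is an assumption, not the paper's exact text.
def build_meta_prompt(trajectory, task_description, exemplars):
    # Component 2: the optimization trajectory, sorted ascending by score
    # so the best-scoring instructions appear last, nearest the generation point.
    history = sorted(trajectory, key=lambda pair: pair[1])
    lines = ["Below are previous instructions with their training accuracy:"]
    for instruction, score in history:
        lines.append(f"text: {instruction}\nscore: {score:.1f}")
    # Component 1: the problem description, with a few input/output exemplars
    # and the <INS> placeholder marking where the new instruction is inserted.
    lines.append(task_description)
    for question, answer in exemplars:
        lines.append(f"Q: {question}\nA: <INS> {answer}")
    lines.append("Generate a new instruction that achieves higher accuracy.")
    return "\n\n".join(lines)
```

Sorting in ascending order places the highest-scoring prompts closest to the end of the context, which in-context pattern recognition tends to weight more heavily.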
Solution Generation Challenges
For generating the solutions, OPRO tried to optimize for a couple of properties:
1. Optimization Stability: Not all solutions in the optimization process achieve high scores or exhibit consistent improvement. Due to the prompt’s sensitivity, low-quality solutions in the input optimization trajectory can significantly impact LLM output, especially in the initial stages of exploration. DeepMind addresses this by prompting the LLM to generate multiple solutions at each optimization step.
2. Exploration-Exploitation Trade-off: DeepMind fine-tunes the LLM sampling temperature to strike the right balance between exploration and exploitation. Lower temperatures encourage the LLM to exploit the solution space around previously identified solutions, making minor adjustments. Conversely, higher temperatures encourage more exploration to identify novel solutions and directions.
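The role of the temperature knob can be seen in a toy sampler. This is a generic illustration of temperature-scaled sampling, not DeepMind's code: the scores and options here are made up, and a real LLM applies the same idea over token logits rather than a handful of labeled options.

```python
import math
import random

# Toy illustration of temperature-scaled sampling. Lower temperature sharpens
# the distribution around the best-scoring option (exploitation); higher
# temperature flattens it, spreading probability to other options (exploration).
def sample_with_temperature(options, scores, temperature, rng):
    weights = [math.exp(s / temperature) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(options, weights=probs, k=1)[0]
```

At a very low temperature the sampler almost always returns the top-scoring option; raising the temperature makes the lower-scoring options progressively more likely, which is exactly the exploration behavior described above.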
OPRO in Action
To evaluate OPRO, Google DeepMind uses none other than the famous traveling salesman problem (TSP), which consists of finding the shortest route that traverses all nodes in a network and returns to the starting node. The results were quite impressive.
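For concreteness, the objective being optimized in the TSP setting is the total length of a closed tour. The sketch below is illustrative, not from the paper: it computes tour length for 2D points and includes a brute-force baseline of the kind a ground-truth solver would provide on small instances.

```python
import itertools
import math

# The TSP objective: total Euclidean length of a closed tour over 2D points,
# including the return leg to the starting node.
def tour_length(points, order):
    total = 0.0
    for i in range(len(order)):
        (x1, y1) = points[order[i]]
        (x2, y2) = points[order[(i + 1) % len(order)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

# Brute-force baseline for small instances: fix node 0 as the start and
# try every ordering of the remaining nodes.
def brute_force_tsp(points):
    n = len(points)
    best_rest = min(
        itertools.permutations(range(1, n)),
        key=lambda rest: tour_length(points, (0,) + rest),
    )
    return (0,) + best_rest
```

In OPRO's setup the LLM proposes candidate routes in natural language; an objective like `tour_length` is what scores them before they are fed back into the meta-prompt.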
To get an idea of OPRO's performance with LLMs, DeepMind provides an illustrative instance of a meta-prompt used for prompt optimization with instruction-tuned PaLM 2-L (PaLM 2-L-IT) on GSM8K. In this case, the generated instruction is added at the beginning of "A:" within the scorer LLM's output, and the notation "<INS>" marks the insertion point. The meta-prompt, as color-coded in the paper's figure, is structured as follows:
- Blue text: This section contains solution-score pairs.
- Purple text: Describes the specifics of the optimization task and the desired output format.
- Orange text: Contains meta-instructions that provide guidance for the optimization process.
OPRO represents one of the most interesting approaches to prompt optimization we have seen in recent months. The idea that AI can be a better optimizer of LLM prompts than humans is not new, but OPRO is one of the most compelling implementations of it to date.