In this lesson, you will learn how to use the DSPy optimizer to automatically improve your DSPy program's quality. Let's get coding.

This lesson comes with a lab. In the lab, you'll get hands-on experience using the DSPy optimizer to automatically improve the quality of an agentic RAG program that uses Wikipedia as its data source. After the optimization, you will see our RAG gets a big quality boost. Let's dive in.

Before talking about the DSPy optimizer, let's think about what optimization means when we talk about GenAI applications. Mostly it can be three things. It can mean optimizing a prompt template. It can also mean building high-quality few-shot examples. These two both belong to the concept of prompt optimization, or prompt engineering. Optimization can also mean fine-tuning the LM weights. In DSPy, we support all three.

Now let's take an overview of how to use the DSPy optimizer. First, you need to pick the optimizer. For how to pick one, please refer to our documentation site at dspy.ai. For this lesson's lab we'll use the MIPROv2 optimizer, which is a good optimizer for prompt template optimization and for building few-shot examples. After you pick the optimizer, you need to give it a user-defined metric function along with training and validation datasets. The core idea behind this requirement is that the optimizer has to know what a good program is and what a bad program is. Unlike a normal machine learning job, the dataset can be as small as 20 records. If you don't have a validation dataset, the optimizer will split off part of the training data to use as the validation dataset.

Let's talk about how optimizers work. In this lesson, we'll focus on prompt engineering and demonstrate with the MIPROv2 optimizer. For how to do fine-tuning with the DSPy optimizer and how other optimizers work, please read our documentation at dspy.ai/tutorials.

At a very high level, we first build multiple sets of few-shot examples. We'll talk about how to build those in the next slide. Then, based on the few-shot examples and your program information, we let the LM generate multiple prompt template candidates, which map to the instruction of the signature; if you use a class-based signature, that means its docstring. Then we sample from both dimensions, the few-shot example sets and the instruction sets, to form a candidate program, and we run evaluations on that candidate program. The evaluation is based on the metric function and the validation dataset: we pick data from the validation dataset, run it through the program, and compare the program output against the golden labels using the user-defined metric function. The final program score is the average over all picked data. Along the way, we continuously keep the candidate with the highest score. Importantly, the optimizer neither does a brute-force search nor tries every combo at once; it uses a statistical method, Bayesian optimization, to intelligently sample towards the optimal combo.
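To make the requirements above concrete, here is a minimal sketch of the two things you hand to any DSPy optimizer, a small labeled dataset and a metric function. The data and the metric here are hypothetical, just to show the shape:

```python
import dspy

# A tiny labeled dataset; unlike a normal ML job, ~20 records can be enough.
# These records are made up purely for illustration.
trainset = [
    dspy.Example(question="Which country is Paris in?", answer="France").with_inputs("question"),
    dspy.Example(question="Who wrote Hamlet?", answer="William Shakespeare").with_inputs("question"),
    # ... more records
]

# A user-defined metric function: given a gold example and a program
# prediction, it tells the optimizer what a good vs. bad output is.
def exact_match_metric(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()
```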
Now that we have seen the optimization flow at a high level, let's talk about how we generate the few-shot examples and the instruction candidates. Few-shot examples are generated through a process called bootstrapping, which is very simple. We grab data from the training dataset and feed it into the DSPy program, which can be one module or multiple modules. Then, if the metric function gives a score over a threshold set by the user, we capture the trace, which is the input and output of each module, and turn it into a few-shot example candidate for that module. Please note that one piece of data can generate multiple traces, because we add randomness to each call by setting the temperature to a non-zero value.

Now let's talk about how we build the instruction candidates. We grab the program code and description, along with the few-shot examples and some arbitrary tips like "be comprehensive" or "be concise", send all of them to the LLM through something called the DSPy proposer, and generate a bunch of instruction candidates.

Now we have candidates for both the instruction, which is the prompt template, and the few-shot examples. We can start generating candidate programs by picking one few-shot example candidate and one instruction candidate and combining them. Then we evaluate the candidate program, and we keep sampling for the number of trials the user specified.

We have done several experiments on the performance of the MIPROv2 optimizer. We see that MIPROv2 outperforms the original prompt by a large margin on multiple tasks. For more details, please check out the paper "Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs."

We can also utilize MLflow to interpret the optimization process. Simply turn on autologging with mlflow.dspy.autolog, setting a few more flags to True, and the optimization process will be tracked. As we said, whenever we try a candidate program we run an evaluation on it, and all these evaluations, along with each candidate's information, namely its instruction and its few-shot examples, are saved to MLflow. So you get a full record of what's being tried across the process.

Now let's get to some coding to see how the optimizer works. As we did in previous labs, we set up the API key, and then we need to set up the MLflow tracking server and give the experiment a unique identifier; let's call it "DSPy course 2". We set up autologging, turning on a few more flags than before so that we can track the optimization process. And we need to specify the LLM; let's continue using GPT-4o-mini.

This RAG agent will be based on Wikipedia data, and we'll use agentic RAG instead of a fixed number of retrieval steps. Agentic RAG basically means we let the LLM decide whether it still needs more data from the data source before it gets to the final answer. Let's first define our tool, which searches Wikipedia. We use ColBERTv2, a publicly available interface for retrieving Wikipedia data, and the return value is the matching chunks from the Wikipedia data source. We still use dspy.ReAct as our program, and this time the input/output is very simple, just question to answer, so we use that as the signature, and the agent has only one tool, search_wikipedia.

Now let's recap what we need to run the optimizer. We need a dataset, so let's load one. We have prepared the dataset for you; it is a subset of the HotPotQA dataset, a question answering dataset based on Wikipedia data. After loading the data, let's take a look at what it looks like. It's very simple: each record has a question and an answer, and the input key is the question.
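Assembled into code, the setup might look like the sketch below. The tracking URI, experiment name, and ColBERTv2 endpoint are assumptions drawn from the public DSPy tutorials, the MLflow flag names may vary by version, and the HotPotQA loader stands in for the lab's prepared subset:

```python
import dspy
import mlflow
from dspy.datasets import HotPotQA

# MLflow tracking: point at your tracking server and name the experiment.
mlflow.set_tracking_uri("http://localhost:5000")  # assumed local server
mlflow.set_experiment("DSPy course 2")

# Turn on autologging with a few extra flags so optimization is tracked too.
mlflow.dspy.autolog(
    log_compiles=True,             # track the optimization process
    log_evals=True,                # track evaluation results
    log_traces_from_compile=True,  # track program traces during optimization
)

# The LM for this lab.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

def search_wikipedia(query: str) -> list[str]:
    """Return the top Wikipedia abstract chunks for the query via ColBERTv2."""
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
    return [x["text"] for x in results]

# Agentic RAG: a ReAct agent with a simple question -> answer signature and a
# single tool. The LLM decides when it has enough context to answer.
react = dspy.ReAct("question -> answer", tools=[search_wikipedia])

# A HotPotQA subset; sizes here are illustrative, not the lab's exact split.
dataset = HotPotQA(train_seed=1, train_size=100, dev_size=100)
trainset = [x.with_inputs("question") for x in dataset.train]
devset = [x.with_inputs("question") for x in dataset.dev]
```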
Okay, it's time to create our optimizer. We'll use the MIPROv2 optimizer. As we said, we need to provide a metric function so that the optimizer knows how to evaluate a candidate program; for this task, we use answer_exact_match. To configure the optimizer, we recommend you use the auto mode, which has three settings, light, medium, and heavy, that are carefully tuned for good performance. But if you want to customize the MIPROv2 optimizer, you can find the available options on the documentation site: in the search bar, search for dspy.MIPROv2 and you will find all the available configurations.

The optimization process can take quite a while, so in order to speed it up, we have prerecorded the cache for you, and with the cache the optimization runs much faster. In real production you don't need to rely on any cache; just make the LLM calls.

Now let's kick off the optimization process. We use the optimizer's compile function, passing our program to it and specifying the training set and validation set. All right, the optimization is done. We can take a look at the logging: basically, we continuously get a candidate program and run an evaluation on it, and we finally pick the best program out of the process.

Let's take a look at what got changed along the way. If you remember, the original signature is just a very simple question to answer, and the ReAct module did not have any instruction. But after the optimization process, the ReAct submodule has a very comprehensive instruction populated, and it also has few-shot examples built in, represented as a list in the demos attribute.

Let's evaluate the original, unoptimized ReAct RAG application. We get a score of 31, and we can see some example inputs and outputs here in the table. Cool. Let's now evaluate the optimized ReAct and get our score. Okay, the score is 54. You can see that without any human intervention, just by using the optimizer, we get a score boost from 31 to 54. That is the power of the DSPy optimizer.

As we mentioned in the slides, we tracked the optimization process with MLflow, and you can view it in the MLflow UI. Going to the UI, the optimization shows up as a parent run, and each child run maps to an evaluation of a candidate program. Clicking into a run, you can see the attributes of the candidate program, its few-shot examples and instruction along with other attributes, and the evaluation score of that candidate. So you get full tracking of the optimization process.

In this lesson, you learned how to use the DSPy optimizer to optimize your DSPy program, and we have seen how powerful the DSPy optimizer is by optimizing a RAG application on Wikipedia data.
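Here is a hedged sketch of the optimization and evaluation steps, reusing react, trainset, and devset from the setup above. Using devset as both the validation set and the final evaluation set keeps the sketch short; a real run would hold out a separate split. The inspection attributes at the end follow current dspy.ReAct internals and may differ across versions:

```python
import dspy

# MIPROv2 in auto mode; answer_exact_match compares the predicted answer
# against the gold answer.
tp = dspy.MIPROv2(metric=dspy.evaluate.answer_exact_match, auto="light")

# Compile: the optimizer samples instruction / few-shot combos and returns
# the best-scoring program.
optimized_react = tp.compile(react, trainset=trainset, valset=devset)

# Evaluate both programs to see the boost.
evaluate = dspy.Evaluate(
    devset=devset,
    metric=dspy.evaluate.answer_exact_match,
    display_table=True,
    display_progress=True,
)
evaluate(react)            # unoptimized baseline
evaluate(optimized_react)  # optimized program

# Inspect what changed (attribute paths assumed from dspy.ReAct internals):
print(optimized_react.react.signature.instructions)  # populated instruction
print(optimized_react.react.demos)                   # bootstrapped few-shot examples
```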