You're now ready to start building your agent. You're given a fake resume that you will use to fill out a fake job application. In this lesson, you will set up the RAG capability of your agent. You will use LlamaParse to parse the resume, load the extracted information into a vector store, and then run basic queries to test it. All right, let's go.

You need nested async for this notebook to work, so let's enable it here. First we bring in our imports, then we bring in nested async, and we execute both cells just to get them ready. We're also going to need our API keys: the OpenAI key we used earlier, and a Llama Cloud API key, because we're going to be using LlamaParse to parse the PDFs. LlamaParse is an advanced document parser that can read PDFs, Word files, PowerPoints, Excel spreadsheets, and so on. It's really great at getting information out of complicated PDFs into a form LLMs find easy to understand. You can get a key at cloud.llamaindex.ai for free.

Let's start by parsing a resume. One of the coolest features of LlamaParse is the ability to tell it what kind of document it's parsing, so that it will more intelligently parse the contents. In this case, you can tell it that it's reading a resume. So let's import LlamaParse and then let's parse a document. You do this by calling LlamaParse, passing in your API key and telling it what result type you want; this could be markdown, text, or a number of other types. Then we're going to give it a content guideline instruction saying: this is a resume, gather related facts together and format it as bullet points with headers. Once you've created a parser, you can call load_data on it with our fake resume, and it starts parsing the file. Depending on the size of the file, this can take a little while, but our resume is pretty short, so it shouldn't take too long. Once it's done, we can print what we get back from the parser, which is an array of documents. In this case, we're going to take the third element of the array and print out the text of that document. You can see that it's parsed out some projects and laid out what's in the resume with headers. Remember, it's producing markdown, so it's got a Projects header, it's got a company name, and it's got bullet points saying what our fake candidate did in all of those jobs.

Now that we've got our documents, we can pass them in to our vector store index. The vector store index will embed the text, meaning it'll turn it into vectors that we can search. To do this, it needs to use an embedding API. We'll be using one provided by OpenAI, which is why we needed an OpenAI key for this. So here we've imported OpenAI's embeddings and our VectorStoreIndex, and now we're going to create an index. VectorStoreIndex has a from_documents method that takes the documents returned by LlamaParse, and here we're passing in the embedding model, just to be absolutely clear about which embeddings we should use. We're using text-embedding-3-small, which is a very good embedding model provided by OpenAI. Now once again we're going to bring in our OpenAI LLM and instantiate it. Again we're going to use the GPT-4o-mini model because it's quick and it's cheap. Once you've got an index, you can create a query engine using the as_query_engine method.
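Pulling the steps described so far together, here is a minimal sketch of what the notebook code might look like. It's a sketch, not the lesson's exact code: the file name fake_resume.pdf and the environment-variable key handling are assumptions, and your installed LlamaIndex version may differ slightly.

```python
import os
import nest_asyncio
from llama_parse import LlamaParse
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

nest_asyncio.apply()  # enable nested async so the notebook's event loop cooperates

# Parse the resume, telling LlamaParse what kind of document it is reading
parser = LlamaParse(
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],
    result_type="markdown",
    content_guideline_instruction=(
        "This is a resume, gather related facts together and format it as "
        "bullet points with headers"
    ),
)
documents = parser.load_data("fake_resume.pdf")  # assumed file name
print(documents[2].text)  # inspect one of the parsed documents

# Embed the parsed documents into a searchable vector store index
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
)

# The LLM that will answer questions over the retrieved context
llm = OpenAI(model="gpt-4o-mini")
```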
To the as_query_engine method, you pass in which LLM you want your query engine to use, and you pass in similarity_top_k. When your query engine is retrieving context to answer the question, it will rank all of the content that it finds by how relevant it is to your query; rather than return all of it, it only returns a top number. In this case, it's going to return the top five. Once we've got our query engine set up, we can call query with a question, a really basic one in this case: what is their name and what was their most recent job? Then we can print out the response. Our fake developer's name is Sarah Chen, and their most recent job is Senior Full Stack Developer at TechFlow Solutions.

One thing that you might need to do is store your vector store so that you can use it later. Vector stores are easily persisted to disk. You define a storage directory in which you want to store your data, and you call the persist method. That's all you need to do; it's now stored on your disk. To load it back, you need a new method, load_index_from_storage, and a new class, StorageContext. Here, you're going to check whether or not you've already run this notebook. If you have, then the storage directory will exist and your content will already be there, in which case you'll load a storage context from defaults, giving it the same persistence directory that you used earlier, and you'll create your index by calling load_index_from_storage. Otherwise it will say that it didn't find it. We already called persist, so that just worked. Now that we've got our restored index, we can call as_query_engine and query it again, just like we did before. Again, it knows that our fake developer's name is Sarah Chen.

Congratulations! What you've done is perform retrieval-augmented generation on a resume document. RAG is a very powerful technique that, with proper scaling, can work across databases of thousands of documents. With our RAG pipeline in hand, let's turn it into a tool that can be used by an agent to answer questions. This is a stepping stone towards creating an agentic system that can perform your larger goal. We'll need two new classes for this: a FunctionTool and a FunctionCallingAgent.

The first step is to create a regular Python function that performs a RAG query. It's important to give this function a descriptive name, to mark its input and output types, and to include a docstring (that's the thing in triple quotes that you're about to see) describing what it does. The framework will give all of this metadata to the LLM: the name, the types, and the docstring. The LLM will use that metadata to decide whether or not this tool is useful and how to use it. Once we've defined our function, we turn it into a tool that can be used by an agent using FunctionTool's from_defaults method, passing in the function. Now you can instantiate a FunctionCallingAgent that uses this tool. There are a number of different agent types supported by LlamaIndex; this one is particularly capable and efficient, and it works well with OpenAI. To create it, you pass in an array of tools (just one in this case) and the LLM that should power the agent. In this case we're also going to set verbose equal to True so that we can see what it's doing while it works. Now that we've created our agent, we can chat with it and ask it a question. In this case we're going to ask it how many years of experience the applicant has.
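Continuing the earlier sketch under the same assumptions (and note that the exact agent class and storage helpers should be checked against your installed LlamaIndex release), the query engine, persistence, and agent pieces might look roughly like this:

```python
import os
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import FunctionCallingAgent

# A query engine over the index that returns the top 5 most relevant chunks
query_engine = index.as_query_engine(llm=llm, similarity_top_k=5)
print(query_engine.query("What is their name and what was their most recent job?"))

# Persist the index to disk so it can be reused later
storage_dir = "./storage"
index.storage_context.persist(persist_dir=storage_dir)

# Load it back only if the storage directory already exists
if os.path.exists(storage_dir):
    storage_context = StorageContext.from_defaults(persist_dir=storage_dir)
    restored_index = load_index_from_storage(storage_context)
else:
    print("Storage directory not found; build and persist the index first.")

# A plain Python function that performs a RAG query. Its name, type hints,
# and docstring are handed to the LLM as tool metadata.
def query_resume(q: str) -> str:
    """Answers questions about a specific resume."""
    response = restored_index.as_query_engine(llm=llm, similarity_top_k=5).query(q)
    return str(response)

resume_tool = FunctionTool.from_defaults(fn=query_resume)

# A function-calling agent that decides when (and how) to call the tool
agent = FunctionCallingAgent.from_tools(
    tools=[resume_tool],
    llm=llm,
    verbose=True,  # print each step as the agent works
)
print(agent.chat("How many years of experience does the applicant have?"))
```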
You can see the output is very noisy because we've set verbose equal to True. So it tells us when it's running steps. It says that it's added this user message to memory, which is our question. Then you can see it's calling the function, you can see which function it's calling, query_resume, and you can see what parameters it's passing in: the query is how many years of experience the applicant has. This is the output that it got from the function, and this is what the LLM responded with. You see it twice because verbose printed it once and we printed it once.

Now you've got a RAG pipeline and an agent. Let's wrap them up neatly into a workflow that you can extend in your later lessons. You won't rely on any of the things you've already created in this lesson; you will start this one from scratch. So just like we did in the last lesson, we're going to import our events and our other workflow-related core classes. You'll see StartEvent and StopEvent, steps, Event, and Context, all of which you've seen before. We're going to define our QueryEvent, which is the only event that we need in this particular workflow so far.

Now we're going to create a RAG workflow. This is pretty complicated, so let's step through it line by line. A RAGWorkflow is a class, so you can give it class variables just like any other class. It's got a storage directory, we've given it an LLM, and we've given it a query engine; these are the types here. The first step is going to be set_up. We're going to check that we've been given a resume file, which is attached to the start event, and we're going to instantiate our LLM as OpenAI again. This section you've seen before: we're going to check if our storage directory already exists, and if it does, we're going to load it from disk. If it doesn't, then we're going to parse and load our documents, exactly the same thing that we did earlier in the lesson. We then create a vector store index just like we did earlier in the lesson, and we persist it to disk just like we did earlier in the lesson. Either way, you've now got an index, and you turn it into a query engine. Now that you've got a query engine, you can fire off the query event. The query event will trigger ask_question, and ask_question will simply query the query engine with a question about the applicant.

So, let's instantiate that workflow and run it just like we've run any other workflow. In this case, the workflow is expecting two parameters: one is the location of the resume file that it can parse, and one is the query about the applicant. As you can see, it worked very quickly because we'd already persisted this to disk; it didn't need to reparse the document and was able to immediately answer the question. There's nothing in this workflow that you haven't done before; it's just making things neat and encapsulated. If you're particularly suspicious, you might note that there's a small bug here: if you run this a second time with a different resume, this code will find the old resume on disk and therefore won't bother to parse the new one. You don't need to fix that now, but think about how you might fix it.

Congratulations! You've successfully created an agent with RAG tools. In the next lesson, you'll give your agent some more complicated tasks.
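For reference as you head into the later lessons, here is a minimal sketch of a RAG workflow along the lines described above. It carries the same assumptions as the earlier snippets: the file path and closing query are illustrative, LlamaParse is assumed to read LLAMA_CLOUD_API_KEY from the environment, and the Workflow API details may vary with your LlamaIndex version.

```python
import os
from llama_index.core.workflow import Workflow, StartEvent, StopEvent, Event, Context, step
from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_parse import LlamaParse


class QueryEvent(Event):
    query: str


class RAGWorkflow(Workflow):
    storage_dir: str = "./storage"
    llm: OpenAI
    # query_engine is created in set_up()

    @step
    async def set_up(self, ctx: Context, ev: StartEvent) -> QueryEvent:
        # Make sure a resume file was attached to the start event
        resume_file = getattr(ev, "resume_file", None)
        if not resume_file:
            raise ValueError("No resume file provided")

        self.llm = OpenAI(model="gpt-4o-mini")

        if os.path.exists(self.storage_dir):
            # Already parsed on a previous run: reload the index from disk
            storage_context = StorageContext.from_defaults(persist_dir=self.storage_dir)
            index = load_index_from_storage(storage_context)
        else:
            # First run: parse the resume, build an index, and persist it
            documents = LlamaParse(
                result_type="markdown",
                content_guideline_instruction=(
                    "This is a resume, gather related facts together and "
                    "format it as bullet points with headers"
                ),
            ).load_data(resume_file)
            index = VectorStoreIndex.from_documents(
                documents,
                embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
            )
            index.storage_context.persist(persist_dir=self.storage_dir)

        self.query_engine = index.as_query_engine(llm=self.llm, similarity_top_k=5)
        return QueryEvent(query=ev.query)

    @step
    async def ask_question(self, ctx: Context, ev: QueryEvent) -> StopEvent:
        # Answer the question about the applicant using the query engine
        response = self.query_engine.query(ev.query)
        return StopEvent(result=str(response))


# Run it from a notebook cell (nest_asyncio allows top-level await here)
workflow = RAGWorkflow(timeout=120, verbose=False)
result = await workflow.run(
    resume_file="fake_resume.pdf",  # assumed path
    query="What was the applicant's most recent job?",  # example query
)
print(result)
```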