Today, we'll be making a case for adopting an agentic LLM approach to improving search relevancy and addressing difficult use-cases, using a Know-Your-Customer (KYC) use-case to demonstrate these benefits.
This is a follow-up to the Agentic RAG article previously published on Search Labs, which covered the implementation of agents for RAG use-cases and introduced the necessary background. That implementation has been expanded here and integrated with a Streamlit UI, which looks like this:

Intelligent hybrid search UI. Key point of interest is the agent thought/action on the left, which highlights how the agent crafts a more sophisticated query.
The full code is located in this GitHub repo.
To get started, run the following command to install dependencies:
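Something along these lines, assuming the repo follows the usual Python layout with a requirements.txt at its root (check the repo's README for the exact command):
```
# Hypothetical: assumes a requirements.txt at the repo root
pip install -r requirements.txt
```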
And run the following to spin up the app:
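A sketch, assuming the Streamlit entry point is called app.py (the actual filename lives in the repo):
```
# Hypothetical entry point name; substitute the actual script from the repo
streamlit run app.py
```
Please refer to the original blog for more details about setup and the data.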
Let's get to it!
Introduction
I have an index with 756,880 news articles, taken from the POLITICS Dataset.
I've generated embeddings for all of the text content in those articles, and now I would like to do Retrieval-Augmented Generation (RAG) with it, as you do, perhaps following the process documented in other Search Labs articles.
Let's ask a query, do a vector search with it, pass the search results to the LLM (in this case, GPT-4o mini hosted on Azure OpenAI), and ask for an answer. No bells and whistles, just pure RAG.
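In code, the whole flow might look something like this minimal sketch. The index name, field names, and deployment names are illustrative assumptions, and the text field is assumed to be mapped as semantic_text:
```python
# Minimal "pure RAG" sketch. Index, field, and deployment names are
# illustrative assumptions; the text field is assumed to be semantic_text.
from elasticsearch import Elasticsearch
from openai import AzureOpenAI

es = Elasticsearch("http://localhost:9200", api_key="<es-api-key>")
llm = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",
    api_key="<azure-api-key>",
    api_version="2024-06-01",
)

def pure_rag(question: str) -> str:
    # 1. Retrieve: semantic (vector) search over the embedded article text
    resp = es.search(
        index="news-articles",
        query={"semantic": {"field": "text", "query": question}},
        size=5,
    )
    # 2. Concatenate the hits into a single context block
    #    (the exact _source layout depends on your mapping)
    context = "\n\n".join(
        str(hit["_source"].get("text", "")) for hit in resp["hits"]["hits"]
    )
    # 3. Generate: ask the LLM to answer strictly from the retrieved context
    completion = llm.chat.completions.create(
        model="gpt-4o-mini",  # Azure deployment name
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```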
Let's think of a query to ask... How about this?
Query 1: Singapore economy in review 2020
Perfect! What answer do we get?
Answer 1:
The context provided does not contain specific information about Singapore's economic state in 2020. Therefore, I am unable to provide a review of Singapore's economy for that year based on the given context. If you have more specific information or another document, I would be happy to help with that.
Huh. That's weird. Let's take a look at the search results...
Search results 1:
It seems that the articles are all from the wrong time period. How unpleasant! Maybe it's a fluke. Let's try a different query.
Query 2:
Is the answer any better?
Answer 2:
In 2018, Microchip Technology Inc. (NASDAQ: MCHP) faced significant challenges following its acquisition of Microsemi Corp. for approximately $10.15 billion, which was completed on May 29, 2018. The acquisition was presented as a strategic move that would be "immediately accretive" to Microchip's earnings.
Okay, so the date is fine, and the subject matter is somewhat relevant. But it's the wrong company, which is perturbing. A quick glance at the results confirms that no search result is related to MagnaChip.
That's weird. What's going on?
Search has a problem with scale

These are both passages from an annual report. Without the labels, could you have told that one was from 2023, and the other from 2024? Probably not. Neither would an LLM, and vectors wouldn't help.
We might venture a few guesses as to why vector search performed poorly. Most prominently, there is no concept of time, and nothing to prevent the retrieval of news articles that are semantically similar but from the wrong period. For example, you may search for a bank's yearly financial report, with several years of such reports in your index. Each of these reports is likely to share similar content and "voice", making it likely that the search results will mix content from all of them. This intermingled content is basically asking for an LLM hallucination. Less than ideal.
As significant as the temporal aspect is, it is only part of a larger problem.
In this example, we have 756,880 articles from news websites like Bloomberg, HuffPost, Politico, and so on, and we've generated embeddings for each of them. As the number of articles increases, it becomes likelier and likelier that there will be pairs of documents with overlapping semantic content: common terminology, phrasing, grammatical forms, subject matter, and so on. Perhaps they reference the same individuals or events over and over. Perhaps they even share a similar voice and tone.
And as the quantity of such similar-sounding articles increases, the distances between their vectors become less and less reliable as a measure of similarity, and the quality of search degenerates. We can expect this problem to worsen as more articles are added.
To summarize: Search quality worsens as data size increases, because irrelevant documents interfere with the retrieval of relevant documents.
Okay. If a preponderance of irrelevant documents is the issue, why don't we just remove them at search time? Bing bong, so simple. (Not really, it turns out.)
Solution: Enhanced data processing

Enriching data with additional metadata allows us to improve search relevance using filters, ranges, and other traditional search features.
Elasticsearch started life as a fully featured search engine: date ranges and filters, geometric/geospatial search, keyword, term, and fuzzy matches with boolean operators, and the ability to fine-tune search relevance by boosting specific fields. The list of capabilities goes on and on. We can rely on these to refine and fine-tune search, and we can apply them even to documents that are just text articles.
In the past, we would have had to apply a suite of NLP capabilities to extract entities, categorize text, perform sentiment analysis, and establish relationships between lexical segments. This is a complex task requiring a host of different components, several of which are likely to be specialized deep learning models based on transformer architectures.
Elastic allows you to upload such models with Eland and then make use of their NLP capabilities within ingest pipelines. With LLMs, the task is much simpler.
I can define an LLM inference endpoint like this:
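Something along these lines, using the Azure OpenAI completion service (the endpoint name and credential values here are placeholders):
```
PUT _inference/completion/azure_openai_completion
{
  "service": "azureopenai",
  "service_settings": {
    "api_key": "<azure-api-key>",
    "resource_name": "<azure-resource-name>",
    "deployment_id": "<gpt-4o-mini-deployment>",
    "api_version": "2024-06-01"
  }
}
```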
And then add it to an ingest pipeline, allowing me to extract interesting metadata from my articles. In this case, I'll extract a list of entities, which could be people, organizations, and events. I'll extract geographic locations too, and I'll try to categorize the article. All it requires is some prompt engineering, where I define a prompt using the document's own content, and an inference processor, where I call the LLM on that prompt, like so:
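A sketch of the two key processors: a set processor that builds the prompt from the article text, and an inference processor that calls the completion endpoint. Pipeline, field, and endpoint names here are illustrative:
```
PUT _ingest/pipeline/article_enrichment
{
  "processors": [
    {
      "set": {
        "description": "Build the extraction prompt from the article text",
        "field": "prompt",
        "value": "Extract entities (people, organizations, events), geographic locations, and a single category from the following news article, returned as JSON: {{{text}}}"
      }
    },
    {
      "inference": {
        "description": "Call the Azure OpenAI completion endpoint on the prompt",
        "model_id": "azure_openai_completion",
        "input_output": {
          "input_field": "prompt",
          "output_field": "metadata"
        }
      }
    },
    {
      "remove": {
        "field": "prompt"
      }
    }
  ]
}
```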
I can set it running with a reindex like this:
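Something like the following, where the index names are placeholders and wait_for_completion=false lets the reindex run as a background task:
```
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "news-articles",
    "size": 100
  },
  "dest": {
    "index": "news-articles-enriched",
    "pipeline": "article_enrichment"
  }
}
```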
And after a short while, I end up with enriched documents containing useful metadata that can be used to refine and improve my search results. For example:
Fortunately, the dataset I'm using comes with timestamps. If it did not, I might try asking the LLM to infer approximate dates based on content. I could further enrich the metadata with coordinates using open-source datasets or services like Google's Geocoding API. I could add descriptions to the entities and categories themselves, or I could cross-correlate with other sources of data in my possession, like databases and data terminals. The possibilities are limitless and varied.
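The enriched metadata on a document might look something like this (an illustrative example, not an actual document from the index):
```
{
  "title": "...",
  "text": "...",
  "published_date": "2018-05-29",
  "entities": ["Microchip Technology Inc.", "Microsemi Corp."],
  "locations": ["United States"],
  "category": "Business"
}
```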

Document view and mappings after enrichment.
After chunking and embedding the documents with semantic_text, the process of which is well-documented, the question remains: how do I actually make use of this new metadata?
Beyond vector search: Complex queries
A standard vector search may look like this. Assuming that I've already chunked and embedded my data with semantic_text and ELSER_V2, I can make use of a sparse vector query like this:
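For instance, something like the following sketch, which uses the semantic query shorthand over the semantic_text field; with ELSER behind the field, this runs a sparse vector query under the hood. Index name, field name, and query text are placeholders:
```
GET news-articles-enriched/_search
{
  "query": {
    "semantic": {
      "field": "text",
      "query": "Reports on the impact of Hurricanes Isaac and Sandy"
    }
  },
  "size": 10
}
```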
And that gives me search results with these titles:
Okay. It's all the wrong dates, and also the wrong hurricanes (I'm looking for Isaac and Sandy). This vector search isn't good enough, and it can be improved by filtering over a date range and searching over my new metadata fields. Perhaps the query I actually need looks more like this:
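A sketch of what that might look like, combining the same semantic clause with filters and boosts over the new metadata fields (field names, dates, and boost values are illustrative):
```
GET news-articles-enriched/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "semantic": {
            "field": "text",
            "query": "Reports on the impact of Hurricanes Isaac and Sandy"
          }
        }
      ],
      "should": [
        { "terms": { "entities": ["Hurricane Isaac", "Hurricane Sandy"], "boost": 2.0 } },
        { "term": { "category": "Natural Disasters" } }
      ],
      "filter": [
        { "range": { "published_date": { "gte": "2012-08-01", "lte": "2012-12-31" } } }
      ]
    }
  },
  "size": 10
}
```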
And these are the search results I get:
It's better. Exactly the news reports that were being searched for.
Realistically speaking, how would you generate that more complex query on the fly? The criteria change based on what exactly the user asks. Maybe some user questions specify dates, while others lack them. Maybe specific entities are mentioned, maybe not. Maybe it is possible to infer details like article categories, or maybe not. A uniform Elastic Query DSL request may not work so well; it may need to adapt dynamically based on requirements and context.
There is another blog about generating Query DSL with LLMs. This approach, though effective, always presents the possibility of a misplaced bracket or malformed request. It takes a long time to execute because of the complexity and length of the query DSL, and that detracts from the user experience.
Why don't we use an agent?
The agentic model
To recap, an agent is an LLM given a degree of decision-making capability. Provide a set of tools, define the parameters by which each tool might be used, establish the purpose and scope, and allow the LLM to dynamically select each tool in response to the situation.
A tool can be a knowledge base, a traditional database, a calculator, a web crawler, whatever. The possibilities are endless. Our goal is to provide an interface where the agent can interact with the tool in a structured manner, and then make use of the tool's outputs.
Creating complex query DSL with agents (intelligent hybrid search)

Intelligent Hybrid Search in practice - By narrowing down the search space, we are able to more effectively converge on relevant search results and increase the likelihood of a relevant RAG output.
With LangChain, we can define tools as Python functions. Here, I have a function that takes in a few parameters, which the LLM will provide, and then builds an Elasticsearch query and sends it off using the Python client. If a given parameter exists, it is added to a clause, and the clause is appended to a list of clauses. These clauses are then assembled into a single Elasticsearch query with a vector search component, and sent off as part of a search request.
The search results are then formatted and concatenated into a block of text, which serves as context for the LLM, and finally returned.
The tool/function has a clear purpose: it allows the agent to interact with the Elasticsearch cluster in a well-defined and controlled manner. There is dynamic behavior, because the agent controls the parameters used for the search. There are definitions and guardrails, because the broad structure of the query is predefined. No matter what the agent outputs, there will not be a malformed request that breaks the flow, and there will be some degree of consistency.
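A condensed sketch of what such a tool might look like, using LangChain's tool decorator. The index name, field names, and exact parameter set are illustrative assumptions, not the repo's actual code:
```python
# Sketch of a LangChain tool for "intelligent hybrid search".
# Index and field names are illustrative assumptions.
from typing import List, Optional

from elasticsearch import Elasticsearch
from langchain_core.tools import tool

es = Elasticsearch("http://localhost:9200", api_key="<es-api-key>")

@tool
def rag_search(query: str,
               entities: Optional[List[str]] = None,
               category: Optional[str] = None,
               start_date: Optional[str] = None,
               end_date: Optional[str] = None) -> str:
    """Search the enriched news index. Dates are ISO strings (YYYY-MM-DD)."""
    must = [{"semantic": {"field": "text", "query": query}}]  # vector component
    should, filters = [], []
    if entities:                                              # boost named entities
        should.append({"terms": {"entities": entities, "boost": 2.0}})
    if category:                                              # soft category match
        should.append({"term": {"category": category}})
    if start_date or end_date:                                # hard date filter
        date_range = {}
        if start_date:
            date_range["gte"] = start_date
        if end_date:
            date_range["lte"] = end_date
        filters.append({"range": {"published_date": date_range}})

    resp = es.search(
        index="news-articles-enriched",
        query={"bool": {"must": must, "should": should, "filter": filters}},
        size=10,
    )
    # Format and concatenate the hits into a context block for the LLM
    return "\n\n".join(
        f"{hit['_source'].get('title', '')} ({hit['_source'].get('published_date', '')})\n"
        f"{hit['_source'].get('text', '')}"
        for hit in resp["hits"]["hits"]
    )
```
Keeping the date range in a filter clause means it constrains results without affecting scoring, while the entity and category clauses act as boosts rather than hard requirements.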
So when I ask the agent a query like "Concise summary of developments related to Elastic NV in the second half of the 2010s", it will choose to use the RAG tool previously defined. It will call the tool with a set of parameters like this:
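For instance, a call along these lines (the exact parameter names depend on the tool definition; these values are illustrative):
```
{
  "query": "developments related to Elastic NV",
  "entities": ["Elastic NV", "Elasticsearch"],
  "start_date": "2015-01-01",
  "end_date": "2019-12-31"
}
```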
And that will result in a query that looks like this:
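Roughly like this, reusing the same clause structure as the tool sketch above (field names and values are illustrative):
```
{
  "query": {
    "bool": {
      "must": [
        { "semantic": { "field": "text", "query": "developments related to Elastic NV" } }
      ],
      "should": [
        { "terms": { "entities": ["Elastic NV", "Elasticsearch"], "boost": 2.0 } }
      ],
      "filter": [
        { "range": { "published_date": { "gte": "2015-01-01", "lte": "2019-12-31" } } }
      ]
    }
  },
  "size": 10
}
```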
With luck, this query will give us better search results, and provide more satisfying answers than vector search alone could provide.
Let's test it out!
Testing the agent versus vector search for Know-Your-Customer applications
One of the most impactful use-cases I can think of is Know-Your-Customer (KYC) for financial services and regulatory institutions. This application involves enormous amounts of manual research and number crunching, and is generally difficult for vector search alone to handle, due to its focus on specific entities, time periods, and subject matter. I think it is a use-case that is better served by an agentic approach than by a 'pure' vector search approach.
The dataset is well suited for it, containing millions of news articles scraped from American news media. This is one possible implementation of such a use-case: collecting huge amounts of data into a single index and then searching it on demand.
Tests are performed with GPT-4o-mini deployed on Azure OpenAI as the LLM and ELSER_V2 as the embedding model. We ask queries, search our Elasticsearch index with either vector search or agentic search, and then compare the answers and search results.
The user query and LLM-generated answer are shown, along with selected search results. Agent function calls are also shown with the agent's answer.
Test 1: Singapore economy in review 2020
This is the first query we asked, and it is a very broad and open-ended query that has traditionally been difficult for vector-based RAG systems to handle, unless there happens to be an article in the knowledge base on that specific topic. We're hoping to get a range of articles that are at least somewhat relevant, and then string them together into a coherent summary.
Vector search answer:
The vector search was able to retrieve relevant documents in terms of subject matter. However, none of the documents were from the correct time period, so the LLM cannot give an answer.
Agentic answer:
The agentic search was able to retrieve relevant documents from the correct time period, then string them together into a coherent answer. That's what we were looking for!
Test 2: Summarize microsoft strategy in the second half of the last decade
The query has the same open-endedness as the first, with the additional difficulty of having to infer the correct date range.
Vector search answer:
This is a plausible sounding and more detailed answer, with one caveat—most of it is inaccurate. Almost all search results are from the wrong time period, leading to, for example, facts from a 2008 article about cloud computing appearing in an answer that is supposed to be about 2015 - 2020. Upon closer examination, the contents are also generic and lacking detail. The only relevant and useful fact is the appointment of a new Chief Strategy Officer.
Agentic answer:
This is a much more satisfying and complete answer, with more detailed specifics. The search results are not only relevant, but from the correct time period. A fact check reveals that each point comes from one of the articles in the search results.
Test 3: Elastic nv investor review 2010-2023
Why not ask about Elastic itself? Let's use a bigger date range too.
Vector search answer:
This query yielded exactly two articles of interest about leadership changes. It seems quite limited and probably wouldn't be satisfying for this use case.
Agentic answer:
This is a more complete answer, bringing in articles about ECS and a success story with the Missouri National Guard. It seems like the agentic approach, with more comprehensive search queries, brings about a qualitative improvement to search relevancy.
Conclusion
From testing, we see that vector search isn't able to account for temporal constraints, and doesn't effectively emphasize the crucial components of a search query, the kind of emphasis that amplifies relevance signals and improves overall search quality. We also see the degenerative impact of scaling, where an increasing number of semantically similar articles can result in poor or irrelevant search results. This can cause a non-answer at best, and a factually incorrect answer at worst.
The testing also shows that crafting search queries via an agentic approach can go a long way toward improving overall search relevance and quality. It does this by trimming down the search space, reducing noise by reducing the number of candidates under consideration, and it does so by making use of traditional search features that have been available since the earliest days of Elastic.
I think that's really cool!
There are many avenues that can be explored. For example, I would like to try out a use-case that depends on geospatial searches. Use-cases which can make use of a wider variety of metadata to narrow down relevant documents are likely to see the greatest benefits from this paradigm. I think this represents a natural progression from the traditional RAG paradigm, and I'm personally excited to see how far it can be pushed.
Well, that's all I had for today. Until next time!
Appendix
The full ingestion pipeline used to enrich the data, to run within the Elastic dev tools console.