
ChatGPT and Elasticsearch revisited: Part 2 - The UI Abides

This blog expands on Part 1 by introducing a fully functional web UI for our RAG-based search system. By the end, you'll have a working interface that ties the retrieval, search, and generation process together—while keeping things easy to tweak and explore.

Don't want to read the whole thing? No problem, go clone the app and get searching!

In Part 1, we walked through setting up our search index, using the Open Crawler to crawl Elastic blog content, configuring an Inference API to an LLM, and testing our RAG setup with Elastic’s Playground in Kibana.

Now, in Part 2, I’ll make good on a promise from the end of that blog by returning with a functional web UI!

This guide will walk through:

  • How the app works.
  • The key controls and customization options available to users.
  • Enhancements that improve search, retrieval, and response generation.

What the app does

At a high level, this app takes a user’s search query or question and follows these steps:

  • Retrieves relevant documents using hybrid search—combining text matching and semantic search.
  • Displays matching document snippets along with links to their full content.
  • Builds a prompt using the retrieved documents and predefined instructions.
  • Generates a response from an LLM, providing grounding documents from Elasticsearch results.
  • Provides controls to modify the generated prompt and response from the LLM.

Exploring the UI controls

The application provides several controls for refining search and response generation. Here’s a breakdown of the key features:

1. Search box

  • Users enter a query just like a search engine.
  • The query is processed using both lexical and vector search.

2. Generated response panel

  • Displays the LLM-generated response based on retrieved documents.
  • Sources used to generate the response are listed for reference.
  • Includes an Expand/Collapse toggle to adjust panel size.

3. Elasticsearch results panel

  • Shows the top-ranked retrieved documents from Elasticsearch.
  • Includes document titles, highlights, and direct links to the original content.
  • Helps users see which documents influenced the LLM’s response.

4. Source filtering controls

  • Users can select which data sources to use for retrieval after the initial search.
  • This allows users to focus on specific domains of content.

5. Answer source toggle

  • Users can choose whether the LLM may use its training to generate a response beyond the grounding context.
  • Opens up the possibility of expanded answers beyond what is passed to the LLM.

6. Number of sources selector

  • Allows users to adjust how many top results are passed to the LLM.
  • Increasing sources often improves response grounding, but too many can incur unnecessary token costs.

7. Chunk vs. document toggle

  • Determines whether grounding is done with the full document or only the relevant chunks.
  • Chunking improves search granularity by breaking long texts into manageable sections.

8. LLM prompt panel

  • Allows users to view the complete prompt passed to the LLM to generate the response.
  • Helps users better understand how an answer was generated.

App architecture

The application is a Next.js web app that provides a user interface for interacting with a RAG-based search system.

This architecture eliminates the need for a separate backend service, leveraging Next.js API routes for seamless search and LLM processing integration.

Code snippets

Let's look at a few sections of code that are the most relevant to this app and may be useful if you want to modify it to work with different datasets.

ES query

The Elasticsearch query is pretty straightforward.

/app/api/search/route.ts
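A rough sketch of such a query is below (the index name, aggregation field, and retriever tuning here are illustrative placeholders rather than the app's exact code): a lexical match and a semantic query are combined under an `rrf` retriever, with highlights requested on `semantic_body`.

```typescript
import { NextRequest, NextResponse } from 'next/server';
import { Client } from '@elastic/elasticsearch';

export async function POST(req: NextRequest) {
  const { query, selectedSources, esUrl, esApiKey } = await req.json();

  const client = new Client({ node: esUrl, auth: { apiKey: esApiKey } });

  const searchResponse = await client.search({
    index: 'elastic-labs', // placeholder name for the index built in Part 1
    retriever: {
      rrf: {
        retrievers: [
          // Lexical match for keyword-style searches
          { standard: { query: { match: { body: query } } } },
          // Semantic match for natural language questions
          { standard: { query: { semantic: { field: 'semantic_body', query } } } },
        ],
      },
    },
    // Highlights give us snippets for the UI and chunks for grounding
    highlight: {
      fields: { semantic_body: { order: 'score', number_of_fragments: 5 } },
    },
    // Aggregation that powers the Lab Sources checkboxes (field name is a placeholder);
    // when the user has checked specific sources, selectedSources is applied as a filter.
    aggs: { lab_sources: { terms: { field: 'lab_source' } } },
    size: 10,
  });

  // ... result parsing continues below
}
```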

Using a hybrid retriever lets us match both keyword-style searches and natural language questions, which are increasingly the norm in search these days.

You'll notice we are using the highlight functionality in this query. This allows us to easily provide a relevant summary of a matched document in the Elasticsearch Results section. It also allows us to use matching chunks for grounding when we build the prompt for the LLM and chunks are selected as the grounding option.

Extracting the ES results

Next, we need to extract the results from Elasticsearch.

/app/api/search/route.ts
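Continuing inside the same handler, a minimal sketch of that check could look like:

```typescript
// Pull the hits out of the response and make sure they are a usable array.
const hits = searchResponse?.hits?.hits;

if (!hits || !Array.isArray(hits)) {
  console.error('Unexpected Elasticsearch response shape:', searchResponse);
  return NextResponse.json(
    { error: 'Invalid response from Elasticsearch' },
    { status: 500 }
  );
}
```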

We extract the search results (hits) from the Elasticsearch response, ensuring they exist and are in an expected array format. If the results are missing or incorrectly formatted, we log an error and return a 500 status.

Parse the hits

We have our results, but we need to parse them into a format we can use both to display them to the user in the UI and to build our prompt for the LLM.

/app/api/search/route.ts
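A sketch of that parsing step (the `useChunks` flag comes from the request body; variable names are illustrative):

```typescript
const results = hits.map((hit: any) => {
  const source = hit._source ?? {};

  // 1. Top semantic_body highlight fragment becomes the snippet shown in the results panel.
  const snippet = hit.highlight?.semantic_body?.[0] ?? '';

  // 2. Grounding context: matching chunks or the whole body, per the user's setting.
  const context = useChunks
    ? (hit.highlight?.semantic_body ?? []).join('\n')
    : source.body ?? '';

  // 3. Title for display.
  const title = source.title ?? 'Untitled';

  // 4. Make sure the URL is absolute so it is clickable in the UI.
  const url = source.url?.startsWith('http')
    ? source.url
    : `https://www.elastic.co${source.url ?? ''}`;

  return { title, url, snippet, context };
});
```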

There are a couple of key things happening in this code block.

  1. We use the top `semantic_body` highlight fragment as the snippet for each ES doc displayed.
  2. Depending on the user's selection, we store the prompt context as either the matching `semantic_body` chunks or the full `body`.
  3. We extract the `title`.
  4. We extract the URL to the blog and ensure it is formatted correctly so users can click on it to visit the blog.

Lab sources for clicking

The last bit of processing we do is to parse out the aggregation values.

/app/api/search/route.ts
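Roughly, assuming the `lab_sources` terms aggregation from the query sketch above:

```typescript
// Turn the terms aggregation into a simple list the UI can render as checkboxes.
const aggs: any = searchResponse.aggregations;
const labSources =
  aggs?.lab_sources?.buckets?.map((bucket: any) => ({
    name: bucket.key,
    count: bucket.doc_count,
  })) ?? [];

// Send everything back to the front end.
return NextResponse.json({ results, labSources });
```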

We do this to have a clickable list of the various "Labs" where the results came from. This way, users can select only the sources they want included, and when they hit search again, the Labs they have checked will be used as a filter.

Managing state

The `SearchInterface` component is the core component in the app. It uses React state hooks to manage all the data and configurations.

/components/SearchInterface.tsx
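A minimal sketch of that state, inside the component with `useState` imported from `react` (variable names here are illustrative, not necessarily the exact ones in the repo):

```typescript
const [searchResults, setSearchResults] = useState<any[]>([]); // Elasticsearch hits
const [llmResponse, setLlmResponse] = useState('');            // streamed LLM answer
const [llmPrompt, setLlmPrompt] = useState('');                // prompt sent to the LLM

const [numSources, setNumSources] = useState(3);   // how many sources to ground with
const [useChunks, setUseChunks] = useState(true);  // chunks vs. full documents
```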

The first three lines here are used to track the search results from Elasticsearch, the generated response from the LLM, and the generated prompt used to instruct the LLM.

The last two track user settings from the UI: the number of sources to include when grounding the LLM, and whether the LLM should be grounded with just the matching chunks or the full blog article.

Handling search queries

When the user hits the submit button, handleSearch takes over.

/components/SearchInterface.tsx
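A sketch of that flow (the state setters and payload fields follow the earlier sketches; the exact request shape is an assumption):

```typescript
const handleSearch = async () => {
  // Send the query plus the current UI settings to the search API route.
  const res = await fetch('/api/search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      query,
      selectedSources, // Lab Sources the user left checked
      numSources,      // how many results to ground with
      useChunks,       // chunk vs. full-document grounding
      esUrl,           // connection settings from the gear menu
      esApiKey,
    }),
  });

  if (!res.ok) {
    console.error('Search request failed:', res.status);
    return;
  }

  // Store the parsed results in state, which re-renders the results panel.
  const data = await res.json();
  setSearchResults(data.results);
};
```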

This function sends the query to /api/search (shown in the snippets above), including the user's source selection, grounding settings, and API credentials. The response is parsed and stored in state, which triggers UI updates.

Source extraction

After fetching the results, we create a sources object.

/components/SearchInterface.tsx
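Something along these lines, continuing inside handleSearch and using the `numSources` setting:

```typescript
// Keep only the top N results and reshape them into numbered, citable sources.
const sources = data.results.slice(0, numSources).map((r: any, i: number) => ({
  id: i + 1,
  title: r.title,
  url: r.url,
  context: r.context, // chunks or full body, depending on the grounding setting
}));
```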

This will later be passed to the LLM as part of the prompt. The LLM is instructed to cite any sources it uses to generate its response.

Constructing & sending the LLM prompt

The prompt is dynamically created based on the user's settings and includes grounding documents from Elasticsearch.

/components/SearchInterface.tsx
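A compact sketch of how such a prompt could be assembled (the wording is paraphrased rather than the repo's exact prompt; `contextOnly` stands in for the Context Only setting):

```typescript
const contextBlock = sources
  .map((s) => `Source ${s.id}: ${s.title} (${s.url})\n${s.context}`)
  .join('\n\n');

const prompt = [
  'You are a helpful assistant answering questions about Elastic blog content.',
  contextOnly
    ? 'Answer using ONLY the context documents provided below.'
    : 'Prefer the context documents below, but you may also draw on your own training. ' +
      'If you do, append a warning that part of the answer came from outside the provided context.',
  'Cite the numbered sources you actually use; do not cite sources you did not use.',
  'Format the answer in short markdown paragraphs for readability.',
  '',
  'Context documents:',
  contextBlock,
  '',
  `Question: ${query}`,
].join('\n');

setLlmPrompt(prompt); // shown in the LLM prompt panel
```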

By default, we instruct the LLM to only use the provided grounding documents in its answer. However, we do provide a setting to allow the LLM to use its own training "knowledge" to construct a wider response. When it is allowed to use its own training, it is further instructed to append a warning to the end of the response.

We instruct the LLM to cite the provided documents as sources, but only ones that it actually uses.

We give some instructions on how the response should be formatted for readability in the UI.

Finally, we pass it to /api/llm for processing.

Streaming the AI response

With the documents from Elasticsearch parsed and returned to the front end immediately, we call the LLM to generate a response to the user's question.

/components/SearchInterface.tsx
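Condensed, the streaming handling looks something like this (the cleanup of event framing is simplified in this sketch):

```typescript
// Ask the LLM route for a streamed answer to the assembled prompt.
const llmRes = await fetch('/api/llm', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt, esUrl, esApiKey }),
});

const reader = llmRes.body?.getReader();
const decoder = new TextDecoder();
let resultText = '';

while (reader) {
  const { done, value } = await reader.read();
  if (done) break;

  // Decode this chunk, append it, and push it into state so the UI updates live.
  resultText += decoder.decode(value, { stream: true });
  setLlmResponse(resultText);
}
```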

There are a lot of lines here, but essentially this part of the code calls /api/llm (covered below) and handles the streaming response. We want the LLM response to stream back to the UI as it is generated, so we parse each event as it is returned, allowing the UI to update dynamically.

We have to decode the stream, do a little cleanup, and update resultText with the newly received text.

Calling the LLM

We are calling the LLM using Elasticsearch's Inference API. This allows us to centralize the management of our data in Elasticsearch.

/app/api/llm/route.ts
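In sketch form (the inference endpoint id `azure-openai-completion` is a placeholder for whatever you named yours):

```typescript
// app/api/llm/route.ts (fragment): forward the prompt to the streaming inference endpoint.
const esResponse = await fetch(
  `${esUrl}/_inference/completion/azure-openai-completion/_stream`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `ApiKey ${esApiKey}`,
    },
    body: JSON.stringify({ input: prompt }),
  }
);
```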

This bit of code is pretty straightforward. We send the request to the streaming inference API Endpoint we created as part of the setup (see below under Getting Things Running), and then we stream the response back.

Handling the streams

We need to read the stream as it comes in chunk-by-chunk.

/app/api/llm/route.ts
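A minimal version of that forwarding logic:

```typescript
// Wrap the upstream body in a new stream: read chunk-by-chunk, decode, and forward.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

const stream = new ReadableStream({
  async start(controller) {
    const reader = esResponse.body!.getReader();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // Each chunk is server-sent-event text from the inference API.
      const text = decoder.decode(value, { stream: true });
      controller.enqueue(encoder.encode(text));
    }
    controller.close();
  },
});

return new Response(stream, {
  headers: { 'Content-Type': 'text/plain; charset=utf-8' },
});
```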

Here we decode the streamed LLM response chunk-by-chunk and forward each decoded part to the frontend in real time.

Getting things running

Now that we've reviewed some of the key parts of the code, let's get things actually installed and up and running.

Completion Inference API

If you don't have a completion Inference API configured in Elasticsearch, you'll need to do that so we can generate a response to the user's question.

I used Azure OpenAI with the gpt-4o model, but you should be able to use other services. The key is that it must be a service that the Stream Inference API supports.

The individual `service_settings` depend on which service and model you use. Refer to the Inference API docs for more info.
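For reference, creating such an endpoint for Azure OpenAI could look roughly like this; the endpoint id, resource name, deployment id, and API version below are placeholders, so check the Inference API docs for the settings your service actually needs.

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({
  node: process.env.ELASTICSEARCH_URL!,
  auth: { apiKey: process.env.ELASTICSEARCH_API_KEY! },
});

// Create a completion inference endpoint backed by Azure OpenAI (placeholder values).
await client.transport.request({
  method: 'PUT',
  path: '/_inference/completion/azure-openai-completion',
  body: {
    service: 'azureopenai',
    service_settings: {
      api_key: process.env.AZURE_OPENAI_API_KEY,
      resource_name: 'my-azure-openai-resource',
      deployment_id: 'gpt-4o',
      api_version: '2024-06-01',
    },
  },
});
```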

Clone

If you have the GitHub CLI installed and configured, you can clone the UI repo into the directory of your choice.

You can also download the repo as a zip file and then unzip it.

Install dependencies

Follow the steps in the README file in the repo to install the dependencies.

Start the development server

We are going to just run in development mode. This runs with live reloading and debugging. There is a production mode that runs an optimized production build for deployment.

To start in dev mode, run the dev script (this assumes the repo uses the standard Next.js scripts; check the README if the command differs):
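```bash
npm run dev
```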

This should start the server up, and if there are no errors, the output will show the local URL the app is listening on (http://localhost:3000 by default).

If you have something else running on port 3000, your app will start on the next available port. Just look at the output to see which port it uses. If you want it to run on a specific port, say 4000, you can pass the port flag through to Next.js:
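```bash
npm run dev -- -p 4000
```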

To the UI

Once the app is running, you can try out different configurations to see what works best for you.

Connection settings

The first thing you need to do before using the app is set up your connection credentials. To do that, click the gear icon ⚙️ in the upper right.

When the box pops up, input your API Key and Elasticsearch URL.

Defaults

To get started simply ask a question or type in a search query into the search bar. Leave everything else as is.

The app will query Elasticsearch for the top relevant docs (in the example above, a question about RRF, retrieved using RRF!). Each doc is returned to the user with a short snippet, the blog title, and a clickable URL.

The top three chunks will be combined with a prompt and sent to the LLM. The generated response will be streamed back.

Bowl your own game

Once the initial search results and generated response are displayed, the user can follow up and make a couple of changes to the settings.

Lab sources

All the blog sites that are part of the index searched will be listed under Lab Sources. If you were to add additional sites or sources to the index we created in part one with the Open Crawler, they would show up here.

You can select only the sources you want to be considered for search results and click search again. The subsequent search will use the checked sources as a filter on the Elasticsearch query.

Answer source

One of the advantages we talk about with RAG is providing grounding documents to the LLM. This helps cut down on hallucinations (nothing's perfect). However, you may want to allow the LLM to use its training and other "knowledge" outside of the grounding docs to generate a response. Unchecking Context Only will allow the LLM this freedom.

The LLM should provide a warning at the end of the response, letting you know if it journeyed outside the groundings. As with many things LLM, this isn't guaranteed. Either way, use caution with these responses.

Number of sources

We default to using three chunks as grounding information for the LLM. Increasing the number of context chunks sometimes gives the LLM more information to generate its response. Sometimes, simply providing the whole blog about a specialized topic works best.

It depends on how spread out the topic is. Some topics are covered in many blogs in various ways, so giving more sources can provide a richer answer. Something more esoteric may only be covered once, so extra chunks may not help.

Chunk or doc

Regarding the number of sources, throwing everything at the LLM is not usually the best way to generate an answer. While most blogs are relatively short compared to many other document sources, say health insurance policy documents, throwing a long document at an LLM has several downsides. First, if the relevant information is only included in two paragraphs and you include twenty, you pay for eighteen paragraphs of useless tokens. Second, that useless information slows down the LLM's generated response.

Generally, stick with chunks unless you have a good reason to send whole documents, blogs in this case.

Here's the bridge--

Hopefully, this walkthrough has helped ensure you're not out of your element when it comes to setting up a UI that provides semantically retrieved documents from Elasticsearch and generated answers from an LLM.

There certainly are a lot of features we could add and settings we could tweak to make the experience even better here. But it's a good start. The great thing about providing code to the community is that you're free to take it, customize it, and tweak it until you're happy.

Be on the lookout for part 3, where we will instrument the app using Open Telemetry!

Elasticsearch has native integrations to industry-leading Gen AI tools and providers. Check out our webinars on going Beyond RAG Basics, or building production-ready apps with the Elastic Vector Database.

To build the best search solutions for your use case, start a free cloud trial or try Elastic on your local machine now.
