
Playground: Experiment with RAG applications with Elasticsearch in minutes

In this blog, you'll learn about Playground and how to use it to experiment with Retrieval-Augmented Generation (RAG) applications using Elasticsearch.

Update: Try the new Playground app in the Elastic demo gallery.

What is Playground?

Elastic's playground experience is a low-code interface for developers to explore grounding LLMs of their choice with their own private data in minutes.

While prototyping conversational search, the ability to rapidly iterate on and experiment with key components of a RAG workflow (for example, hybrid search or adding reranking) is important for getting accurate, hallucination-free responses from LLMs.

The Elasticsearch vector database and the Search AI platform provide developers with a wide range of capabilities, such as comprehensive hybrid search and access to innovation from a growing list of LLM providers. Our playground experience lets you use the power of those features without added complexity.

A/B test LLMs and choose different inference providers

Playground’s intuitive interface allows you to A/B test different LLMs from model providers (like OpenAI and Anthropic) and refine your retrieval mechanism, to ground answers with your own data indexed into one or more Elasticsearch indices. The playground experience can leverage transformer models directly in Elasticsearch, but is also amplified with the Elasticsearch Open Inference API which integrates with a growing list of inference providers including Cohere and Azure AI Studio.

As Elasticsearch developers already know, the best context window is built with hybrid search. Your strategy for architecting towards this outcome requires access to many shapes of vectorized and plain-text data that can be chunked and spread across multiple indices.

We’re helping you simplify query construction with newly introduced query retrievers to Search All the Things! With three key retrievers (available now in 8.14 and Elastic Cloud Serverless) hybrid search with scores normalized with RRF is one unified query away. Using retrievers, the playground understands the shape of the selected data and will automatically generate a unified query on your behalf. Store vectorized data and explore a kNN retriever, or add metadata and context to generate a hybrid search query by selecting your data. Coming soon, semantic reranking can easily be incorporated into your generated query for even higher-quality recall.
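As a hedged illustration, a unified retriever query of the kind described above might look like the following, expressed here as a Python dict in Query DSL shape. The index layout (a keyword-searchable `text` field and a dense-vector `text_embedding` field) and the vector values are assumptions for the sketch, not something Playground prescribes.

```python
# A sketch of a unified hybrid retriever query (Elasticsearch 8.14+),
# assuming a "text" field for BM25 and a "text_embedding" dense vector field.
question = "How does the model handle tables?"

retriever_query = {
    "retriever": {
        "rrf": {  # Reciprocal Rank Fusion combines the ranked lists below
            "retrievers": [
                # Lexical (BM25) leg of the hybrid search
                {"standard": {"query": {"match": {"text": question}}}},
                # Vector (kNN) leg; query_vector would come from an embedding model
                {
                    "knn": {
                        "field": "text_embedding",
                        "query_vector": [0.12, -0.07, 0.33],  # illustrative only
                        "k": 10,
                        "num_candidates": 50,
                    }
                },
            ]
        }
    }
}

# With an 8.14+ Python client this could be sent as:
# client.search(index="my_index", body=retriever_query, size=3)
```

The point of the retriever syntax is that both legs live in one query body, so RRF-normalized hybrid search really is "one unified query away."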

Once you’ve tuned and configured your semantic search to production standards, you’re ready to export the code and finalize the experience in your application, using either the Python Elasticsearch client or the LangChain Python integration.

Playground is available today on Elastic Cloud Serverless and in version 8.14 on Elastic Cloud.

Using the Playground

Playground is accessible from within Kibana (the Elasticsearch UI) by navigating to “Playground” from within the side navigation.

Connect to your LLM

Playground supports chat completion models such as GPT-4o from OpenAI, Azure OpenAI, or Anthropic through Amazon Bedrock. To start, you need to connect to one of these model providers to bring your LLM of choice.

Chat with your data

Any data can be used, even BM25-based indices. Your data fields can optionally be transformed using text embedding models (like our zero-shot semantic search model ELSER), but this is not a requirement. Getting started is extremely simple: just select the indices you want to use to ground your answers and start asking questions. In this example, we are going to use a PDF and start with BM25, with each document representing a page of the PDF.

Indexing a PDF document with BM25 with Python

First, we install the dependencies. We use the pypdf library to read PDFs and the requests library to retrieve them.

!pip install -qU pypdf requests elasticsearch

Then we read the file, creating an array of pages containing the text.

    from io import BytesIO

    import requests
    from pypdf import PdfReader

    def download_pdf(url):
        # Fetch the PDF over HTTP and wrap the bytes for pypdf
        response = requests.get(url)
        if response.status_code == 200:
            return BytesIO(response.content)
        else:
            print("Failed to download PDF")
            return None

    def get_pdf_pages(pdf_file):
        # Extract each page's text into a list of strings
        pages = []
        pdf_reader = PdfReader(pdf_file)
        for page in pdf_reader.pages:
            text = page.extract_text()
            pages.append(text)
        return pages

    pdf_file = download_pdf("https://arxiv.org/pdf/2103.15348.pdf")
    if pdf_file:
        pages = get_pdf_pages(pdf_file)

We then index the pages into Elasticsearch, under the my_pdf_index_bm25 index.

    from elasticsearch import helpers, Elasticsearch

    client = Elasticsearch(
       "<my-cloud-url>",
       api_key=ELASTIC_API_KEY,
    )

    helpers.bulk(
        client,
        [
            {
                "_index": "my_pdf_index_bm25",
                "_source": {
                    "text": page,
                    "page_number": i,
                },
            }
            for i, page in enumerate(pages)
        ],
        request_timeout=60,
    )

Chatting with your data with Playground

Once we have connected our LLM with a connector and chosen the index, we can start asking questions about the PDF. The LLM will now provide answers grounded in your data.

What happens behind the scenes?

When we choose an index, we automatically determine the best retrieval method. In this case, only BM25 keyword search is available, so we generate a multi_match query to perform retrieval.
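As a sketch (not necessarily the exact query Playground generates), the BM25 retrieval for the PDF index above would look roughly like this:

```python
# A sketch of the multi_match retrieval for a BM25-only index,
# using the "text" field from the PDF example above.
question = "What is the paper about?"

query = {
    "query": {
        "multi_match": {
            "query": question,
            "fields": ["text"],  # add more fields here to broaden retrieval
        }
    },
    "size": 3,  # Playground retrieves up to 3 documents by default
}

# client.search(index="my_pdf_index_bm25", body=query)
```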

As we only have one field, we default to searching it. If you have more than one field, you can choose the fields you want to search to improve the retrieval of relevant documents.

Asking a question

When you ask a question, Playground performs a retrieval using the query to find relevant documents matching your question. It then uses these documents as context in the prompt, grounding the answer returned by your chosen LLM.

We use a particular field from the document for the context. In this example, Playground has chosen the field named “text,” but this can be changed within the “edit context” action.

By default, we retrieve up to 3 documents for the context, but you can adjust the number from within the edit context flyout as well.

Asking a follow up question

Typically, the follow-up question is tied to a previous conversation. With that in mind, we ask the LLM to rewrite the follow-up question using the conversation into a standalone question, which is then used for retrieval. This allows us to retrieve better documents to use as context to help answer the question.
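The condensing step can be pictured like this. The prompt wording below is an assumption for illustration, not Playground's actual internal prompt:

```python
# An illustrative sketch of the "condense question" step: the conversation
# plus the follow-up is rewritten into a standalone question before retrieval.
def build_condense_prompt(history, follow_up):
    # history is a list of (role, message) tuples from the conversation so far
    transcript = "\n".join(f"{role}: {msg}" for role, msg in history)
    return (
        "Given the conversation below, rewrite the follow-up question "
        "as a standalone question that can be used for retrieval.\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )

prompt = build_condense_prompt(
    [
        ("user", "What is LayoutParser?"),
        ("assistant", "A toolkit for document image analysis."),
    ],
    "What languages does it support?",
)
# The LLM's completion of this prompt is then used as the retrieval query.
```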

Context

When documents are found based on your question, we provide these documents to the LLM as context to ground the LLM’s knowledge when answering. We automatically choose a single index field we believe is best, but you can change this field by going to the edit context flyout.


Improving retrieval with Semantic Search and Chunking

Since our query is in the form of a question, it is important for retrieval to be able to match based on semantic meaning. With BM25 we can only match documents that lexically match our question, so we’ll need to add semantic search too.

Sparse Vector Semantic search with ELSER

One simple way to start with semantic search is to use Elastic’s ELSER sparse embedding model with our data. Like many models of this size and architecture, ELSER has a typical 512-token limit and requires a design choice of an appropriate chunking strategy to accommodate it. In upcoming versions of Elasticsearch, we’ll chunk by default as part of the vectorization process, but in this version, we’ll follow a strategy to chunk by paragraphs as a starting point. The shape of your data may benefit from other chunking strategies, and we encourage experimentation to improve retrieval.

Chunking and ingesting the PDF with pyPDF and LangChain

To simplify the example, we will use LangChain tooling to load and split the pages into passages. LangChain is a popular tool for RAG development that can be integrated and used with the Elasticsearch vector database and semantic reranking capabilities with our updated integration.

Creating an ELSER inference endpoint

The following REST API calls can be executed to download, deploy, and check the model's running status. You can execute these using Dev Tools within Kibana.

# Starts ELSER Service into Elasticsearch ML node
# This may take a while, depending on ML node autoscaling
PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}

# Check the trained model's status. Look for nodes.routing_state to be "started"
GET _ml/trained_models/my-elser-model/_stats

Ingesting into Elasticsearch

Next we will set up an index and attach a pipeline that will handle the inference for us. 

# Setup an elser pipeline to embed the contents in text field
# using ELSER into the text_embedding field
PUT /_ingest/pipeline/my-elser-pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "my-elser-model",
        "input_output": [
          {
            "input_field": "text",
            "output_field": "text_embedding"
          }
        ]
      }
    }
  ]
}

# Setup an index which uses the embedding pipeline
# ready for our documents to be stored in
PUT /elser_index
{
  "mappings": {
    "properties": { 
      "text": {
        "type": "text"
      },
      "text_embedding": {
        "type": "sparse_vector"
      }
    }
  },
  "settings": {
    "index": {
      "default_pipeline": "my-elser-pipeline"
    }
  }
}

Splitting pages into passages and ingesting into Elasticsearch

Now that the ELSER model has been deployed, we can start splitting the PDF pages into passages and ingesting them into Elasticsearch.

    # pip install pypdf langchain_community langchain elasticsearch tiktoken langchain-elasticsearch

    from langchain_community.document_loaders import PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from elasticsearch import helpers, Elasticsearch

    loader = PyPDFLoader("https://arxiv.org/pdf/2103.15348.pdf")
    
    client = Elasticsearch(
       "<my-cloud-url>",
       api_key=ELASTIC_API_KEY,
    )

    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=512, chunk_overlap=256
    )
    docs = loader.load_and_split(text_splitter=text_splitter)

    helpers.bulk(
        client,
        [
            {
                "_index": "elser_index",
                "_source": {
                    "text": doc.page_content,
                    "page_number": i,
                },
            }
            for i, doc in enumerate(docs)
        ],
        request_timeout=60,
    )

That’s it! We should have passages ingested into Elasticsearch that have been embedded with ELSER.

See it in action on Playground

Now when selecting the index, we generate an ELSER-based query using the deployment_id for embedding the query string.
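A sketch of what that generated query can look like, using the inference endpoint created above (`my-elser-model`) and the field names from the ingest example; the exact query Playground produces may differ:

```python
# A sketch of an ELSER-based query: the text_expansion query embeds the
# question at search time via the "my-elser-model" endpoint and matches
# it against the "text_embedding" sparse_vector field mapped earlier.
question = "How does the model handle tables?"

elser_query = {
    "query": {
        "text_expansion": {
            "text_embedding": {               # the sparse_vector field
                "model_id": "my-elser-model",  # embeds the query string
                "model_text": question,
            }
        }
    },
    "size": 3,
}

# client.search(index="elser_index", body=elser_query)
```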

When asking a question, we now have a semantic search query that is used to retrieve documents that match the semantic meaning of the question.

Hybrid Search made simple

Selecting the text field as well enables hybrid search. When we retrieve documents, we now search for both keyword matches and semantic meaning, and rank the two result sets with the RRF algorithm.
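Under stated assumptions (the `elser_index` mapping and inference endpoint from above), the hybrid query can be sketched with the retriever syntax, fusing a BM25 leg and an ELSER leg with RRF; again, the exact generated query may differ:

```python
# A sketch of a hybrid query for elser_index: BM25 on "text" plus ELSER
# text_expansion on "text_embedding", fused with RRF (8.14+ retrievers).
question = "How does the model handle tables?"

hybrid_query = {
    "retriever": {
        "rrf": {
            "retrievers": [
                # Keyword (BM25) leg
                {"standard": {"query": {"match": {"text": question}}}},
                # Semantic (ELSER) leg
                {
                    "standard": {
                        "query": {
                            "text_expansion": {
                                "text_embedding": {
                                    "model_id": "my-elser-model",
                                    "model_text": question,
                                }
                            }
                        }
                    }
                },
            ]
        }
    }
}

# client.search(index="elser_index", body=hybrid_query, size=3)
```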

Improve the LLM’s answers

With Playground, you can adjust your prompt, tweak your retrieval, and create multiple indices (chunking strategy and embedding models) to improve and compare your responses.

In the future, we will provide hints on how to get the most out of your index, suggesting methods to optimize your retrieval strategy.

System prompt

By default, we provide a simple system prompt, which you can change within model settings by editing it. This simple prompt is used in conjunction with a wider internal system prompt.

Optimizing context

Good responses rely on great context. Using methods like chunking your content and optimizing your chunking strategy for your data is important. Along with chunking your data, you can improve retrieval by trying out different text embedding models to see what gives you the best results. In the above example, we have used Elastic’s own ELSER model, but the inference service supports a wide number of embedding models that may suit your needs better. 

Other benefits of optimizing your context include better cost efficiency and speed: cost is calculated based on tokens (input and output). The more relevant the documents we can provide, aided by chunking and Elasticsearch's powerful retrieval capabilities, the lower the cost and the faster the responses for your users.
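As a back-of-the-envelope illustration of the cost effect, using hypothetical per-token pricing and made-up token counts (check your provider's actual pricing):

```python
# Hypothetical pricing for illustration only.
PRICE_PER_1K_INPUT_TOKENS = 0.01  # USD, assumed

def prompt_cost(input_tokens):
    # Cost of the input side of one request
    return input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

whole_pages = prompt_cost(3 * 1500)  # 3 full PDF pages as context (assumed sizes)
chunked = prompt_cost(3 * 400)       # 3 chunked passages as context

print(f"whole pages: ${whole_pages:.3f}, chunked: ${chunked:.3f}")
```

With these assumed sizes, chunking cuts input cost by roughly 73% per request while keeping the same number of retrieved documents.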

You may notice that the input tokens used in the BM25 example are larger than those in the ELSER example. This is because, in the ELSER example, we effectively chunked our documents and provided the LLM with only the most relevant passages from the page.

Final Step! Integrate RAG into your application

Once you’re happy with the responses, you can integrate this experience into your application. View code offers example application code for how to do this within your own API.

For now, we provide examples with OpenAI or LangChain, but the Elasticsearch query, the system prompt, and the general interaction between the model and Elasticsearch are relatively simple to adapt for your own use.
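A minimal sketch of wiring those pieces together yourself: retrieve hits from Elasticsearch, then assemble the chat messages. The index, field names, and system prompt follow the examples above; the LLM call is shown commented out to keep the sketch self-contained, and `build_messages` is a hypothetical helper, not part of the exported code.

```python
# Assemble a grounded chat prompt from Elasticsearch hits (sketch).
def build_messages(question, hits, system_prompt):
    # Join the context field of each retrieved document
    context = "\n---\n".join(hit["_source"]["text"] for hit in hits)
    return [
        {"role": "system", "content": f"{system_prompt}\n\nContext:\n{context}"},
        {"role": "user", "content": question},
    ]

# hits would normally come from client.search(...)["hits"]["hits"]
hits = [{"_source": {"text": "LayoutParser is a toolkit for document image analysis."}}]
messages = build_messages(
    "What is LayoutParser?", hits, "Answer using only the provided context."
)

# In a real application (assuming the openai package and an API key):
# from openai import OpenAI
# answer = OpenAI().chat.completions.create(model="gpt-4o", messages=messages)
```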

Conclusion

Conversational search experiences can be built with many approaches in mind, and the choices can be paralyzing, especially with the pace of innovation in new reranking and retrieval techniques, both of which apply to RAG applications.

With our playground, those choices are simplified and intuitive, even with the vast array of capabilities available to the developer. Our approach is unique in enabling hybrid search as a predominant pillar of the construction immediately, with an intuitive understanding of the shape of the selected and chunked data and amplified access across multiple external providers of LLMs.

Build, test, and have fun with Playground

Try the Playground demo or head over to Playground docs to get started today! Explore Search Labs on GitHub for new cookbooks and integrations for providers such as Cohere, Anthropic, Azure OpenAI, and more.

Ready to try this out on your own? Start a free trial.