
Getting Started with the Elastic Chatbot RAG app using Vertex AI running on Google Kubernetes Engine

Learn how to configure the Elastic Chatbot RAG app to use Vertex AI and run it on Google Kubernetes Engine (GKE).

I’ve been having fun playing around with the Elastic Chatbot RAG example app. It’s open source, and it’s a great way to get started with understanding Retrieval Augmented Generation (RAG) and trying your hand at running a RAG app. The app supports integration with a variety of GenAI Large Language Models (LLMs) like OpenAI, AWS Bedrock, Azure OpenAI, Google Vertex AI, Mistral AI, and Cohere.

You can run the app on your local computer using Python or Docker. You can also run the app in Kubernetes. I recently deployed the app to Google Kubernetes Engine (GKE), configured to use Google Vertex AI as the backing LLM, and I was able to do it all using just a browser, Google Cloud, and Elastic Cloud. This blog post will walk you through the step-by-step process that I followed to configure the Chatbot RAG app to use Vertex AI and run it on GKE.

Enable the Vertex AI API

Since this blog post is focused on running the Elastic Chatbot RAG app with Vertex AI as the backing LLM, the very first step is to go to Google Cloud and enable the Vertex AI API. If this is your first time using Vertex AI, you'll see an Enable all recommended APIs button. Click that button to enable the Google Cloud APIs necessary for Vertex AI. Once you've done that, you should see that the Vertex AI API is enabled.

Use Google Cloud Shell Editor to clone the Chatbot RAG app

Now that you’ve got the Vertex AI API enabled, the next step is to clone the code for the Chatbot RAG app. Google Cloud has the perfect tool for doing this right in your browser: Cloud Shell Editor.

1. Open Google Cloud Shell Editor.

2. Open your terminal in Cloud Shell Editor. Click the Terminal menu and select New Terminal.

3. Clone the Chatbot RAG app by running the following command in the terminal.
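The command block didn't survive in this snapshot. Assuming the app lives in Elastic's elasticsearch-labs repository on GitHub, the clone command looks like this:

```shell
# Clone the Elasticsearch Labs repository, which contains the Chatbot RAG example app
git clone https://github.com/elastic/elasticsearch-labs.git
```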


4. Change directory to the Chatbot RAG app’s directory using the following command.
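Again, the original command block is missing from this snapshot; assuming the repository layout above, the app sits under the example-apps directory:

```shell
# Move into the Chatbot RAG app's directory
cd elasticsearch-labs/example-apps/chatbot-rag-app
```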

Use Google Cloud Shell Editor to create an app configuration file

The app needs to access Elastic Cloud and Vertex AI, and it does so using configuration values stored in a configuration file. The configuration file must be named .env, and you will create it now. The example app includes an example configuration file named env.example that you can copy to create the new file.

1. Create a .env file that will contain the app’s configuration values using the following command:
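The original command block is missing here; copying the example file mentioned above is a one-liner:

```shell
# Create the .env configuration file from the provided example
cp env.example .env
```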

2. Click the View menu and select Toggle Hidden Files. Files like .env are hidden in Cloud Shell Editor by default.

3. Open the .env file for editing. Find the line that sets the ELASTICSEARCH_URL value. That’s where you’ll make your first edit.

Elastic Cloud - Create Deployment

The Chatbot RAG app needs an Elasticsearch backend that will power the retrieval augmentation part of the RAG app. So the next step is to create an Elastic Cloud deployment with Elasticsearch and ML enabled. Once the deployment is ready, copy the Elasticsearch Endpoint URL to add it to the app’s .env configuration file.

  1. Create an Elastic Cloud deployment.
  2. Copy the Elasticsearch Endpoint URL.

Use Google Cloud Shell Editor to update the .env configuration file with the Elasticsearch URL

  1. Add Elasticsearch Endpoint URL to the .env file.
  2. Comment out unused configuration lines.
  3. Uncomment the line where the ELASTICSEARCH_API_KEY is set.
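After these three edits, the Elasticsearch section of the .env file looks roughly like this. The URL and the commented-out variable names are illustrative of the file's layout, not copied from it — use the Endpoint URL from your own deployment:

```
# Elasticsearch connection settings
ELASTICSEARCH_URL=https://my-deployment.es.us-central1.gcp.cloud.es.io:443
# ELASTICSEARCH_USER=elastic
# ELASTICSEARCH_PASSWORD=changeme
ELASTICSEARCH_API_KEY=
```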

Elastic Cloud - Create API Key and add its value to .env configuration file

Jumping back into the Elastic Cloud deployment, click the Create API Key button to create a new API Key that the app will use to access Elasticsearch running in your deployment. Then paste the copied API Key into your .env configuration file using Google Cloud Shell Editor.

  1. Create an Elastic Cloud API Key.
  2. Copy the Key’s encoded value and add it to the app’s .env configuration file in Google Cloud Shell Editor.

Use Google Cloud Shell Editor to update the .env configuration file to use Vertex AI

Moving down in the .env configuration file, find the lines that configure a connection to Vertex AI and uncomment them. The first custom values you'll need to set are GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_REGION. Set GOOGLE_CLOUD_PROJECT to your Google Cloud Project ID, which you can find right on the welcome page for your Google Cloud project. Set GOOGLE_CLOUD_REGION to whichever of the regions supported by Vertex AI you’d like to use. For this blog post, we used us-central1.

  1. Uncomment the Vertex AI lines in the .env configuration file.
  2. Set GOOGLE_CLOUD_PROJECT to your Google Cloud Project ID in the .env configuration file.
  3. Set GOOGLE_CLOUD_REGION in the .env configuration file to one of the available regions supported by Vertex AI.
  4. Save the changes to the .env configuration file.
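After these edits, the Vertex AI section of the .env file looks roughly like this. The LLM_TYPE value and exact layout are assumptions based on the app's example file, and example-project stands in for your real Project ID:

```
# Vertex AI connection settings
LLM_TYPE=vertex
GOOGLE_CLOUD_PROJECT=example-project
GOOGLE_CLOUD_REGION=us-central1
```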

Google Cloud IAM - Create Service Account and download its Key file

Now it’s time to set up the app’s access to Vertex AI and GKE. You can do this by creating a Google IAM Service Account and assigning it the Roles that grant the necessary permissions.

1. Create a Service Account with the IAM Roles necessary to access Vertex AI and GKE.

  • Add the following IAM Roles:
    • Vertex AI Custom Code Service Agent
    • Kubernetes Engine Default Node Service Account

2. Create a Service Account Key and download it to your local computer.

Google Kubernetes Engine - Create cluster

Google Kubernetes Engine (GKE) is where you’re going to deploy and run the Chatbot RAG app. GKE is Google's managed Kubernetes service, providing scalable infrastructure for running your applications. While creating a new GKE cluster, in the Create cluster dialog, edit the Advanced settings > Security setting to use the Service Account you created in the previous step.

  1. Create a new GKE cluster.
  2. Use the Service Account created previously, within Advanced settings > Security, when creating the cluster.

Google Cloud Shell Editor - Upload Google Service Account Key file

Back in Google Cloud Shell Editor, you can now complete the app's configuration by adding the Google Service Account key you previously downloaded to your local computer.

Click the Cloud Shell Editor’s More button to upload and add the Google Cloud Service Account key file to the app.

  1. Upload the Google Cloud Service Account key file using Cloud Shell Editor.
  2. Save the file to the top level directory of the Chatbot RAG app.

Google Cloud Shell Editor - Deploy app to Google Kubernetes Engine

Everything for the app’s configuration is in place, so you can now deploy the app to GKE. First, connect the Cloud Shell terminal to your GKE cluster using the gcloud command line tool. Once you’re connected to the cluster, use the kubectl command line tool to add the configuration values from your .env configuration file to the cluster. Next, use kubectl to add the Google Cloud Service Account key file to the cluster. Then, use kubectl to deploy the app to GKE.

1. Connect the Cloud Shell terminal to your GKE cluster using gcloud. Replace example-project in the command with your Google Cloud Project ID, found on the welcome page for your Google Cloud project.
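The command itself is missing from this snapshot; it would be a gcloud get-credentials call along these lines. The cluster name and region here are assumptions — use the name and location of the cluster you created:

```shell
# Point kubectl at the GKE cluster created earlier
gcloud container clusters get-credentials chatbot-rag-app-cluster \
  --region us-central1 \
  --project example-project
```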

2. Add your .env configuration file values to your cluster using kubectl.
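The original command is missing here; a typical way to load a .env file into a cluster is a ConfigMap created from the file. The ConfigMap name below is illustrative — use whatever name the app's k8s-manifest.yml expects:

```shell
# Store the .env values in a ConfigMap so the app's pods can read them
kubectl create configmap chatbot-rag-app-env --from-env-file=.env
```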

3. Add the google Cloud Service Account key file to your cluster using kubectl.
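Again, the command didn't survive the snapshot; a Service Account key file is typically added as a generic Secret. The secret and file names below are illustrative — match whatever the app's k8s-manifest.yml mounts:

```shell
# Store the downloaded Service Account key file as a Kubernetes Secret
kubectl create secret generic gcloud-credentials \
  --from-file=service-account.json
```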

4. Deploy the app to your cluster using kubectl. This command will create a new Elasticsearch index in Elastic Cloud with sample data, initialize the frontend and backend of the app with the values that you provided in the .env file and then deploy the app to the GKE cluster. It will take a few minutes for the app to be deployed. You can use the GKE cluster’s details page to watch its status.
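The deployment command is missing from this snapshot; since the app ships with a k8s-manifest.yml, the deploy step is presumably:

```shell
# Deploy the Chatbot RAG app (frontend, backend, and index setup) to the cluster
kubectl apply -f k8s-manifest.yml
```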

Google Kubernetes Engine - Expose deployed app

The final required step is to expose the app in GKE so it's viewable on the Internet in your browser. You can do this in Google Cloud’s GKE Workloads, where your deployed app will appear as chatbot-rag-app in the list of running GKE workloads. Select your workload by clicking its Name link. In the details page of your app’s workload, use the Actions menu to select the Expose action. In the Expose dialog, set the Target port to 4000, which is the port that the Chatbot RAG app is configured to run on in the k8s-manifest.yml file used for its deployment to GKE.

  1. Select the chatbot-rag-app in GKE Workloads.
  2. Use the Expose action from the Actions menu to expose the app.
  3. In the Expose dialog, set the Target port to 4000.

Try out the app

After clicking the Expose button for the workload, you’ll be taken to the workload’s Service Details page in GKE. Once the exposed app is ready, you'll see External Endpoints displayed along with a linked IP address. Click the IP address to try out the Chatbot RAG app.

Elastic Cloud is your starting point for GenAI RAG apps

Thanks for reading. Check out a guided tour of all the steps included in this blog post.

Get started with building GenAI RAG apps today and give Elastic Cloud a try.

Ready to explore more? Try a hands-on tutorial where you can build a RAG app in a sandbox environment.

To learn more about using RAG for real world applications, see our recent blog post series GenAI for customer support.


