
Using different LLMs in watsonx.ai flows engine

Learn how to use various models with the watsonx.ai flows engine to create a custom flow for text completion with several different prompting techniques

By Roy Derks

New models are released almost every day, making it hard for developers to know which one to use, given how models differ in capabilities and pricing. That’s why having the flexibility to experiment with various models can significantly enhance the development and deployment of generative AI applications. Watsonx.ai flows engine offers this flexibility by providing a unified API that works seamlessly with all models available on the IBM watsonx platform.

Whether you're using one of the IBM Granite foundation models or Meta's Llama 3, the way you send a request to flows engine using the API or SDK remains consistent. The only difference is in how you structure your prompts and set parameters, like temperature and decoding methods. This unified approach allows developers to effortlessly switch between different models to determine which one best suits their specific generative AI use case.

In this series, I'll delve into some of the most popular models on watsonx.ai and demonstrate how to use them with the flows engine to create a custom AI flow for text completion with several different prompting techniques.

I'll explore the capabilities of the following LLM families:

  * IBM Granite
  * Meta's Llama
  * Mistral AI

Note: As soon as each tutorial becomes available, it'll be added to this series.

This first tutorial in the series focuses on setting up watsonx.ai flows engine to work with some of the most popular models that are available in IBM watsonx.ai. For this, you’ll start by creating a custom flow for text completion. This foundational work will be the same for every LLM that you will be using in watsonx.ai flows engine. The rest of the tutorials in the series will explore different LLM families more in-depth.

Setting up flows engine

With watsonx.ai flows engine, you can build AI flows using a CLI and consume those flows using an SDK. Using flows engine is completely free, and the free plan gives you (limited) access to all of the models that are available in the watsonx.ai platform.

  1. To get started, you need to sign up for a free account, using your IBMid or GitHub account.

  2. After signing up, you can download the CLI from this page, which also has the installation instructions. To install the CLI, you must have Python installed on your machine.

  3. With the CLI installed, you can authenticate to your flows engine account by running the following command.

    wxflows login

    Provide the values for environment name and apikey. You can find more information in Authenticating to the CLI.

  4. Using the CLI, you can now set up a new project. For this, you must create a new directory on your local machine and run the init command.

    mkdir my-project
    cd my-project
    wxflows init --endpoint-name wxflows-genai/my-endpoint

    Replace my-project with your own project name, and replace my-endpoint with your own endpoint name.

    This creates a wxflows.toml file and a .env.sample file, which are both needed to configure the project.

  5. In the wxflows.toml file, you must define a first flow, which uses the templatedPrompt and completion steps. The first step sets the prompt template, which you'll change for every model that you’re trying, and the second interacts with the LLM.

    [wxflows.deployment]
     flows="""
         textCompletion = templatedPrompt(promptTemplate: "{question}") | completion(model: textCompletion.model, parameters: textCompletion.parameters)
     """

    This is a very basic text completion flow, in which you can substitute the value for promptTemplate with any prompt template. Every LLM has slightly different expectations about how a prompt should be structured, so you'll want to adjust this value for each LLM that you use (see the sketch after these setup steps).

  6. Before you can deploy this flow, you must set up the .env file so that watsonx.ai is used as the AI engine. Copy the .env.sample file to a file named .env, and add the following value.

    STEPZEN_WATSONX_HOST=shared

    This ensures that you're using the shared watsonx.ai instance that’s part of the free plan for watsonx.ai flows engine. If you have your own instance of watsonx.ai, you can visit the Connect to watsonx.ai section in the documentation to learn how to connect to it.

  7. The final step is to deploy this flow, which makes it available on a live endpoint that you can connect to by using the SDK.

    wxflows deploy

    The endpoint to which the textCompletion flow is deployed will be printed in your terminal. Make sure to write down this endpoint to use later.
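
Before moving on, here's a closer look at the promptTemplate value from step 5. The following is a minimal sketch that wraps the {question} placeholder in a short instruction; it assumes that templatedPrompt accepts literal text around the placeholder, and the instruction text itself is only an example. Instruct-tuned models such as Llama 3 and the Granite chat models additionally expect their own special prompt tokens, which the model-specific tutorials in this series cover.

    [wxflows.deployment]
    flows="""
        textCompletion = templatedPrompt(promptTemplate: "Take the role of a helpful assistant and answer the following request in at most ten sentences: {question}") | completion(model: textCompletion.model, parameters: textCompletion.parameters)
    """

After changing the template, run wxflows deploy again to update the flow on your endpoint.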

With the textCompletion flow built, you need a way to interact with it. In the next section, you'll use the JavaScript SDK for this (an SDK is also available for Python).

Using the SDK for text completion

You can use the SDK for watsonx.ai flows engine to interact with flows deployed to flows engine endpoints. With the SDK, you can invoke the different flows that you have on your endpoint with all of the parameters needed to get a response.

  1. To install the SDK, create a new directory inside the project directory that you created earlier, and initialize a new JavaScript project in it.

    mkdir app
    cd app
    npm init -y
  2. After initializing a new JavaScript project, you can install the SDK from npm by running the following command.

    npm i wxflows
  3. In the app directory, you must create a new file called index.js in which you can add the following code.

    const wxflows = require('wxflows');

    (async () => {
        const WXFLOWS_ENDPOINT = "YOUR_WXFLOWS_ENDPOINT"
        const WXFLOWS_APIKEY = "YOUR_WXFLOWS_APIKEY"

        if (!WXFLOWS_ENDPOINT || !WXFLOWS_APIKEY) {
            console.log('Please set the values for your endpoint and API key')
            return null;
        }

        const model = new wxflows({
            endpoint: WXFLOWS_ENDPOINT,
            apikey: WXFLOWS_APIKEY
        })

        const schema = await model.generate()

        // Make sure these match your values in `wxflows.toml`
        const flowName = 'textCompletion'
        const question = `Take the role of a personal travel assistant and give me recommendations for a summer holiday for a family of 5.`

        const result = await model.flow({
            schema,
            flowName,
            variables: {
                question,
                model: 'ibm/granite-13b-chat-v2',
                parameters: {
                    max_new_tokens: 700,
                    stop_sequences: []
                },
            },
        })

        console.log('Response: ', result?.data?.[flowName]?.out?.results?.[0]?.generated_text)
    })();

    You must replace YOUR_WXFLOWS_ENDPOINT and YOUR_WXFLOWS_APIKEY with your own values. Remember, the endpoint for your flows engine project was printed in the terminal after deploying it. The apikey can be found on the dashboard or by running the command wxflows whoami --apikey.

    If you get an incomplete answer, try increasing the value for max_new_tokens or tell the LLM to give you a concise answer, for example, a maximum of ten sentences.

  4. To execute this bit of code, you can run the following command, which should print the answer to the instruction "Take the role of a personal travel assistant and give me recommendations for a summer holiday for a family of 5" in your terminal.

    node index.js

    It should print a list of recommendations. You can change the prompt to narrow the recommendations to a specific country or region. Or, perhaps, you're looking for recommendations for a different family composition.

  5. The textCompletion flow uses the model ibm/granite-13b-chat-v2, but you can use the same flow with other models, too. Let's try another model and compare the differences.

    const flowName = 'textCompletion'
    const question = `Take the role of a personal travel assistant and give me recommendations for a summer holiday for a family of 5.`

    const result = await model.flow({
        schema,
        flowName,
        variables: {
            question,
            model: 'meta-llama/llama-3-8b-instruct',
            parameters: {
                max_new_tokens: 700,
                stop_sequences: []
            },
        },
    })

    In the previous code, the LLM used is meta-llama/llama-3-8b-instruct, which is a model from Meta's Llama family. If you compare the responses, you might notice that the two models reason and structure their answers differently. Another LLM that you could try is mistralai/mistral-large (see the comparison sketch after this list).

    You can find a complete list of all available models at Foundation model IDs.
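
To see the differences side by side, you can run the same prompt against several models in one script. The following is a minimal sketch that reuses the SDK calls from the earlier steps; it assumes that you've exported your endpoint and API key in environment variables named WXFLOWS_ENDPOINT and WXFLOWS_APIKEY (names chosen here only for illustration) instead of hardcoding them, and the file name compare.js is likewise just an example.

    // compare.js -- run the same prompt against several models and print each response
    const wxflows = require('wxflows');

    (async () => {
        // Assumes WXFLOWS_ENDPOINT and WXFLOWS_APIKEY are set in your environment
        const model = new wxflows({
            endpoint: process.env.WXFLOWS_ENDPOINT,
            apikey: process.env.WXFLOWS_APIKEY
        })

        const schema = await model.generate()
        const flowName = 'textCompletion'
        const question = `Take the role of a personal travel assistant and give me recommendations for a summer holiday for a family of 5.`

        // Model IDs taken from the examples in this tutorial
        const models = [
            'ibm/granite-13b-chat-v2',
            'meta-llama/llama-3-8b-instruct',
            'mistralai/mistral-large'
        ]

        for (const modelId of models) {
            const result = await model.flow({
                schema,
                flowName,
                variables: {
                    question,
                    model: modelId,
                    parameters: {
                        max_new_tokens: 700,
                        stop_sequences: []
                    },
                },
            })

            console.log(`--- ${modelId} ---`)
            console.log(result?.data?.[flowName]?.out?.results?.[0]?.generated_text)
        }
    })();

Run it with node compare.js from the app directory. Because each call is awaited, the responses print one after another in the order of the models array, which makes it easy to compare how each model answers the same prompt.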

Instead of passing the complete prompt or the model name through the SDK, you can also define them in the flow language. This allows you, for example, to create a custom flow for each LLM that you want to support. This approach is covered in the next tutorials in this series.

What's next?

This first tutorial in a series of four explained how to set up watsonx.ai flows engine using the CLI and SDK. It showed how to do text completion with different LLMs, but there's much more to uncover. In the next three tutorials, you'll explore three LLM families (IBM Granite, Meta's Llama, and Mistral) and learn how to tweak the prompt templates and accompanying parameters to optimize the responses for each of these LLMs using watsonx.ai flows engine.

Want to learn more? Join our Discord community, and let us know what other types of tutorials you'd like to see in the future.