Tutorial
Using different LLMs in watsonx.ai flows engine
Learn how to use various models with the watsonx.ai flows engine to create a custom flow for text completion with several different prompting techniques
New models are released almost every day, making it hard for developers to know which one to use based on how they differ in capabilities and pricing. That’s why having the flexibility to experiment with various models can significantly enhance the development and deployment of generative AI applications. Watsonx.ai flows engine offers this flexibility by providing a unified API that works seamlessly with all models available on the IBM watsonx platform.
Whether you're using one of the foundation models from IBM Granite or Meta's Llama 3, the way you send a request to flows engine using the API or SDK remains consistent. The only difference is in how you structure your prompts and set parameters, like temperature and decoding methods. This unified approach allows developers to effortlessly switch between different models to determine which one best suits their specific generative AI use case.
In this series, I'll delve into some of the most popular models on watsonx.ai and demonstrate how to use them with the flows engine to create a custom AI flow for text completion with several different prompting techniques.
I'll explore the capabilities of the following LLM families:

- IBM Granite
- Meta Llama
- Mistral
Note: As soon as each tutorial becomes available, it'll be added to this series.
This first tutorial in the series focuses on setting up watsonx.ai flows engine to work with some of the most popular models that are available in IBM watsonx.ai. For this, you'll start by creating a custom flow for text completion. This foundational work will be the same for every LLM that you use in watsonx.ai flows engine. The rest of the tutorials in the series will explore the different LLM families in more depth.
Setting up flows engine
With watsonx.ai flows engine, you can build AI flows using a CLI and consume those flows by using an SDK. Using flows engine is completely free, and the free plan gives you (limited) access to all of the models that are available in the watsonx.ai platform.
To get started, you need to sign up for a free account, using your IBMid or GitHub account.
After signing up, you can download the CLI from this page, which also has the installation instructions. To install the CLI, you must have Python installed on your machine.
With the CLI installed, you can authenticate to your flows engine account by running the following command.
```
wxflows login
```

Provide the values for environment name and apikey. You can find more information in Authenticating to the CLI.

Using the CLI, you can now set up a new project. For this, you must create a new directory on your local machine and run the init command.

```
mkdir my-project
cd my-project
wxflows init --endpoint-name wxflows-genai/my-endpoint
```

You should replace the value for my-project with a different name for the project, and replace my-endpoint with a different endpoint name.

This creates a wxflows.toml file and a .env.sample file, which are both needed to configure the project.

In the wxflows.toml file, you must define a first flow, which uses the templatedPrompt and completion steps. The first step sets the prompt template, which you'll change for every model that you're trying, and the second interacts with the LLM.

```
[wxflows.deployment]
flows="""
textCompletion = templatedPrompt(promptTemplate: "{question}") | completion(model: textCompletion.model, parameters: textCompletion.parameters)
"""
```

The previous flow is a very basic flow for text completion, where you could substitute the value for promptTemplate with any prompt template. Every LLM has slightly different expectations of how you structure a prompt, so you'll want to change this value for the different LLMs you might be using.

Before you can deploy this flow, you must set the .env file to use watsonx.ai as the AI engine. Copy the .env.sample file, and add the following value.

```
STEPZEN_WATSONX_HOST=shared
```

This ensures that you're using the shared watsonx.ai instance that’s part of the free plan for watsonx.ai flows engine. If you have your own instance of watsonx.ai, you can visit Connect to watsonx.ai in the documentation to connect to it.
The final step is to deploy this flow, which makes it available on a live endpoint that you can connect to by using the SDK.
```
wxflows deploy
```

The endpoint to which the textCompletion flow is deployed will be printed in your terminal. Make sure to write down this endpoint to use later.
With the textCompletion flow built, you need a way to interact with it. In the next section, you'll do this by using the JavaScript SDK, which is also available for Python.
Using the SDK for text completion
You can use the SDK for watsonx.ai flows engine to interact with flows deployed to flows engine endpoints. With the SDK, you can invoke the different flows that you have on your endpoint with all of the parameters needed to get a response.
To install the SDK, you must set up a new directory in the project directory that you created earlier. You also initialize a new JavaScript project.
```
mkdir app
cd app
npm init -y
```

After initializing a new JavaScript project, you can install the SDK from npm by running the following command.

```
npm i wxflows
```

In the app directory, you must create a new file called index.js in which you can add the following code.

```
const wxflows = require('wxflows');

(async () => {
  const WXFLOWS_ENDPOINT = "YOUR_WXFLOWS_ENDPOINT"
  const WXFLOWS_APIKEY = "YOUR_WXFLOWS_APIKEY"

  if (!WXFLOWS_ENDPOINT || !WXFLOWS_APIKEY) {
    console.log('Please set the environment variables for your Endpoint and Api Key')
    return null;
  }

  const model = new wxflows({
    endpoint: WXFLOWS_ENDPOINT,
    apikey: WXFLOWS_APIKEY
  })

  const schema = await model.generate()

  // Make sure these match your values in `wxflows.toml`
  const flowName = 'textCompletion'
  const question = `Take the role of a personal travel assistant and give me recommendations for a summer holiday for a family of 5.`

  const result = await model.flow({
    schema,
    flowName,
    variables: {
      question,
      model: 'ibm/granite-13b-chat-v2',
      parameters: {
        max_new_tokens: 700,
        stop_sequences: []
      },
    },
  })

  console.log('Response: ', result?.data?.[flowName]?.out?.results[0]?.generated_text)
})();
```

You must replace YOUR_WXFLOWS_ENDPOINT and YOUR_WXFLOWS_APIKEY with your own values. Remember, the endpoint for your flows engine project was printed in the terminal after deploying it. The apikey can be found on the dashboard or by running the command wxflows whoami --apikey.
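The check at the top of index.js hints at reading these values from environment variables rather than hardcoding them. Here's a minimal sketch of that variation; the variable names WXFLOWS_ENDPOINT and WXFLOWS_APIKEY are an assumption for this example, not something flows engine prescribes.

```
// A sketch: read the endpoint and API key from environment variables
// instead of hardcoding them in index.js. The variable names are an
// assumption; use whatever names you export in your shell.
const WXFLOWS_ENDPOINT = process.env.WXFLOWS_ENDPOINT
const WXFLOWS_APIKEY = process.env.WXFLOWS_APIKEY
```

You'd then set these variables in your shell before running the script, for example WXFLOWS_ENDPOINT=<your-endpoint> WXFLOWS_APIKEY=<your-apikey> node index.js.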
If you get an incomplete answer, try increasing the value for max_new_tokens or tell the LLM to give you a concise answer of, for example, a maximum of ten sentences.

To execute this bit of code, you can run the following command, which should print the answer to the instruction "Take the role of a personal travel assistant and give me recommendations for a summer holiday for a family of 5" in your terminal.

```
node index.js
```

It should print a list of recommendations. You can change the prompt to narrow the recommendations to a specific country or region. Or, perhaps, you're looking for recommendations for a different family composition.
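For example, a tweaked prompt could look like the following sketch; the exact wording is just an illustration.

```
// A hypothetical prompt variation: narrow the recommendations to a region,
// change the family composition, and ask for a concise answer.
const question = `Take the role of a personal travel assistant and give me recommendations for a summer holiday in southern Europe for a family of 3. Keep the answer to a maximum of ten sentences.`
```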
The flow textCompletion is using the model ibm/granite-13b-chat-v2, but you can use this same flow with different models, too. Let's try to use another model this time and compare the differences.

```
const flowName = 'textCompletion'
const question = `Take the role of a personal travel assistant and give me recommendations for a summer holiday for a family of 5.`

const result = await model.flow({
  schema,
  flowName,
  variables: {
    question,
    model: 'meta-llama/llama-3-8b-instruct',
    parameters: {
      max_new_tokens: 700,
      stop_sequences: []
    },
  },
})
```

In the previous code, the LLM used is meta-llama/llama-3-8b-instruct, which is an LLM from the Llama family from Meta. If you compare the responses, you might see that the models have a different way of reasoning and structuring the response. Another LLM that you could try is mistralai/mistral-large. You can find a complete list of all available models at Foundation model IDs.
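If you want to compare several models side by side, you can also loop over a list of model IDs and run the same flow for each one. The sketch below reuses the model, schema, and question values from the earlier example and would live inside the same async function; the list of model IDs is just an illustration.

```
// A sketch: run the same textCompletion flow against several models and
// print each response so you can compare them. Assumes `model`, `schema`,
// and `question` were created as in the earlier example; check
// Foundation model IDs for the models available to you.
const modelsToCompare = [
  'ibm/granite-13b-chat-v2',
  'meta-llama/llama-3-8b-instruct',
  'mistralai/mistral-large',
]

for (const modelId of modelsToCompare) {
  const result = await model.flow({
    schema,
    flowName: 'textCompletion',
    variables: {
      question,
      model: modelId,
      parameters: {
        max_new_tokens: 700,
        stop_sequences: []
      },
    },
  })

  console.log(`--- ${modelId} ---`)
  console.log(result?.data?.textCompletion?.out?.results[0]?.generated_text)
}
```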
Instead of passing the complete prompt or the model name through the SDK, you can also define these using the flow language. For example, this allows you to create a custom flow for each LLM that you want to support. This will be covered in the next tutorials in this series.
What's next?
This first tutorial in a series of four explained how to set up watsonx.ai flows engine using the CLI and SDK. It showed how to do text completion with different LLMs, but there's much more to uncover. In the next three tutorials, you'll explore three LLM families (IBM Granite, Meta's Llama, and Mistral) and learn how to tweak the prompt templates and adjoining parameters to optimize the responses for each of these LLMs using watsonx.ai flows engine.
Want to learn more? Join our Discord community, and let us know what other types of tutorials you'd like to see in the future.