KServe

A scalable model inference platform on Kubernetes for trusted AI

An open source, standard cloud-agnostic model inference platform that provides pluggable production serving.

20 June 2025

Article

Fast inference and furious scaling with vLLM and KServe

If you are new to LLM inferencing and serving, learn why it is complicated and how to get started with two open source tools: vLLM and KServe.

28 February 2025

Article

Put the power of AI in your apps without being an AI expert

Use Caikit to serve Hugging Face AI models that are consumed by your application.

28 February 2025

Article

The open source ecosystem of watsonx

Learn how open source impacts watsonx through some of the key open source projects that IBM has invested in.

16 June 2024

Open Project

KServe

A standard cloud-agnostic model inference platform that provides pluggable production serving.

15 June 2024

Open Project

KServe ModelMesh

A mature, general-purpose model serving management and routing layer. Optimized for high volume, high density, and frequently changing model use cases, ModelMesh intelligently loads and unloads models to and from memory to strike a balance between responsiveness and compute.

12 February 2024

Tutorial

Creating a custom serving runtime in KServe ModelMesh

In this tutorial, learn how to serve your custom models by using ModelMesh Serving.

19 December 2023

Article

Parallel inferencing with KServe and Ray Serve

Using KServe and Ray Serve together is a flexible, scalable, and efficient approach to serving machine learning models in production.

29 June 2023

Tutorial

Serving AI Models from Kubernetes Persistent Volumes with KServe ModelMesh

In this tutorial, learn how to configure ModelMesh Serving to use Kubernetes "built-in" storage via persistent volumes.

27 June 2023

Article

Kubeflow Pipelines Overview

Create, deploy, and manage machine learning workflows on Kubernetes using Kubeflow Pipelines.

15 June 2023

Tutorial

Get started with KServe ModelMesh for multi-model serving

Learn about some of ModelMesh's features and core resources, such as the ServingRuntime and the InferenceService, while deploying your first model and running inference against it on your own ModelMesh Serving instance.

13 October 2021

Blog post

ModelMesh and KServe bring eXtreme scale standardized model inferencing on Kubernetes

Learn how ModelMesh intelligently loads and unloads AI models to and from memory to balance responsiveness to users against computational footprint.
