Calculating vector embeddings for semantic search and retrieval augmented generation (RAG)

Vector embeddings are numerical representations that convert various types of data, such as words, sentences, or images, into arrays of numbers that machine learning (ML) models can understand. Each embedding captures the semantic meaning of a data point, which helps the model recognize patterns and relationships. These embeddings are created either by training a model on a large, relevant dataset or by using an existing pretrained model.

In this article, you'll learn how to calculate vector embeddings in Java by using LangChain4j with the IBM Granite Embedding models. You'll create a simple project that calculates vector embeddings for a few words and then calculates the distance between the vectors to see how embeddings capture semantic similarity.

In generative AI applications, embeddings enable retrieval-augmented generation (RAG), in which relevant content is retrieved by comparing embeddings and then provided to the model as additional knowledge.

The IBM Granite family of models includes the Granite Embedding models, which IBM trained specifically to generate high-quality vector embeddings for input text. Granite Embedding models are a good fit when you need to support long documents or large contexts in embedding and retrieval, handle multilingual text or multi-domain content, meet enterprise requirements such as legal, licensing, or GDPR compliance, or achieve high accuracy in semantic search or RAG pipelines, provided that your latency budget and compute resources can handle mid-size to large embedding models.

LangChain4j is an open-source Java library that makes it easier to build applications powered by large language models (LLMs) and is designed to integrate with Java frameworks such as Quarkus and Spring. LangChain4j offers a unified API, so you don't have to work with each LLM provider's interface directly. It also supports creating embeddings, storing and querying them (for semantic search), chat memory, agents, and RAG pipelines.

For calculating embeddings, LangChain4j has adapters that integrate with many popular models, such as OpenAI, Amazon Bedrock, and Google Vertex AI. Another option is the in-process feature, which loads a model in ONNX format and calculates the embeddings locally. With this approach, you must download the model and tokenizer files locally before using them in LangChain4j. For the Granite Embedding models, IBM provides both files on the Hugging Face site.

Just what is a vector embedding?

Before we build our Java project, let's understand vector embedding by considering these simple examples:

A cat and a kitten are closer in meaning than a cat and a car, even though in terms of characters, cat and car are close.
The meaning of apple depends on context: in "I eat an apple" and "I eat a fruit", apple refers to a fruit, while in "My computer is an Apple" it refers to a brand. The first two sentences are closer in meaning to each other than either is to the third.

We'll work with these simple examples in our sample Java project.

How do I work with vector embeddings?

For our sample vector embedding project, we'll use the granite-embedding-30m-english model in ONNX format, which generates 384-dimensional vectors. To generate an embedding from text, we'll invoke the model through LangChain4j's embeddings support.

Getting set up

Let's start by:

Adding the LangChain4j embeddings dependency, which contains the ONNX runtime to load and run the model locally.
Downloading the required model files using the download-maven-plugin Maven plugin.

Open the pom.xml file and add the following dependency:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-embeddings</artifactId>
    <version>1.6.0-beta12</version>
</dependency>

Then, in the plugins section, add the download-maven-plugin plugin with two executions: one downloads the ONNX model file and the other downloads the tokenizer file, both into the target/model directory. Declaring the plugin once with two executions, each with its own configuration, avoids Maven's duplicate plugin declaration problem.

Each execution is bound to the compile phase, so the files are downloaded when the project compiles, but any other Maven phase works too.

<plugin>
    <groupId>io.github.download-maven-plugin</groupId>
    <artifactId>download-maven-plugin</artifactId>
    <version>2.0.0</version>
    <executions>
        <!-- Downloads the ONNX model file -->
        <execution>
            <id>install-embedding-model</id>
            <phase>compile</phase>
            <goals>
                <goal>wget</goal>
            </goals>
            <configuration>
                <url>https://huggingface.co/ibm-granite/granite-embedding-30m-english/resolve/main/model.onnx</url>
                <outputDirectory>${project.build.directory}/model</outputDirectory>
            </configuration>
        </execution>
        <!-- Downloads the tokenizer file -->
        <execution>
            <id>install-embedding-model-tokenizer</id>
            <phase>compile</phase>
            <goals>
                <goal>wget</goal>
            </goals>
            <configuration>
                <url>https://huggingface.co/ibm-granite/granite-embedding-30m-english/resolve/main/tokenizer.json</url>
                <outputDirectory>${project.build.directory}/model</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>

Finally, from a terminal in the project directory, run the mvn compile command. This downloads the project dependencies along with the model and tokenizer files.
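If the download step didn't run, loading the model fails later with a less obvious error, so it can help to check for the files first. The following is a minimal sketch using only java.nio.file; the class ModelFiles and its missingFiles method are hypothetical helpers, and the paths assume the target/model output directory used in the pom.xml configuration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: verifies that the files fetched by download-maven-plugin
// exist before they are handed to LangChain4j.
class ModelFiles {

    // Returns the paths that are missing or are not regular files (empty when all are present).
    static List<Path> missingFiles(Path... paths) {
        List<Path> missing = new ArrayList<>();
        for (Path path : paths) {
            if (!Files.isRegularFile(path)) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed locations, matching the outputDirectory configured in pom.xml
        Path model = Path.of("target", "model", "model.onnx");
        Path tokenizer = Path.of("target", "model", "tokenizer.json");

        List<Path> missing = missingFiles(model, tokenizer);
        if (missing.isEmpty()) {
            System.out.println("Model files found.");
        } else {
            System.out.println("Missing files, run mvn compile first: " + missing);
        }
    }
}
```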

Writing some code

The last step is writing some code to:

Load the model into memory using the OnnxEmbeddingModel class.
Transform some words into embeddings using the embed method.

Let's instantiate the OnnxEmbeddingModel class, passing the locations of the model and tokenizer files and setting the pooling mode to MEAN:

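import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.onnx.OnnxEmbeddingModel;
import dev.langchain4j.model.embedding.onnx.PoolingMode;

// Files downloaded by download-maven-plugin during mvn compile
String pathToModel = "./target/model/model.onnx";
String pathToTokenizer = "./target/model/tokenizer.json";

// MEAN pooling averages the per-token vectors into a single vector
PoolingMode poolingMode = PoolingMode.MEAN;

EmbeddingModel embeddingModel = new OnnxEmbeddingModel(pathToModel, pathToTokenizer, poolingMode);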
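PoolingMode.MEAN tells LangChain4j to combine the model's per-token output vectors into one vector by averaging them. The following plain-Java sketch illustrates that idea on toy values; the class MeanPoolingSketch and its meanPool method are hypothetical and are not LangChain4j's internal implementation.

```java
import java.util.Arrays;

// Hypothetical illustration of mean pooling: the text embedding is the
// element-wise average of the per-token vectors produced by the model.
class MeanPoolingSketch {

    static float[] meanPool(float[][] tokenVectors) {
        if (tokenVectors.length == 0) {
            throw new IllegalArgumentException("At least one token vector is required.");
        }
        int dimensions = tokenVectors[0].length;
        float[] pooled = new float[dimensions];
        // Sum the token vectors component by component
        for (float[] token : tokenVectors) {
            if (token.length != dimensions) {
                throw new IllegalArgumentException("All token vectors must have the same length.");
            }
            for (int i = 0; i < dimensions; i++) {
                pooled[i] += token[i];
            }
        }
        // Divide by the number of tokens to get the average
        for (int i = 0; i < dimensions; i++) {
            pooled[i] /= tokenVectors.length;
        }
        return pooled;
    }

    public static void main(String[] args) {
        // Toy 3-dimensional vectors for three tokens (granite-embedding-30m-english uses 384 dimensions)
        float[][] tokens = {
            {1.0f, 2.0f, 3.0f},
            {3.0f, 2.0f, 1.0f},
            {2.0f, 2.0f, 2.0f}
        };
        System.out.println(Arrays.toString(meanPool(tokens))); // prints [2.0, 2.0, 2.0]
    }
}
```

Averaging keeps the output size fixed at the model's dimension no matter how many tokens the input text has.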

Now we can call the embed method on the embeddingModel instance, passing a word (cat in this example). The method returns a Response<Embedding> whose content is an Embedding instance representing the vector embedding of the text:

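import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.output.Response;

Response<Embedding> response = embeddingModel.embed("cat");
Embedding embedding = response.content();

// The raw 384-dimensional vector
float[] vector = embedding.vector();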

Repeat the same code for car and feline, and you'll get three vectors that represent the meanings of the three words.

To find out how close the vectors are to one another, in other words how semantically close cat, feline, and car are, we need to calculate the distance between the vectors. There are several formulas for measuring the distance between vectors; in this example, we'll use cosine distance, one of the most widely used measures for text embeddings. It compares the direction of two vectors and ignores their magnitude, so the result reflects how similar the meanings are rather than how large the individual values are.
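Written out, cosine distance is one minus the cosine similarity of two vectors a and b with n components each:

```latex
d_{\cos}(\mathbf{a}, \mathbf{b})
  = 1 - \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = 1 - \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}
```

Because cosine similarity ranges from -1 to 1, cosine distance ranges from 0 (same direction) to 2 (opposite directions). For granite-embedding-30m-english, n is 384.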

The following Java method implements the cosine distance:

public static float cosineDistance(float[] a, float[] b) {
    if (a == null || b == null) {
        throw new IllegalArgumentException("Input vectors must not be null.");
    }

    if (a.length != b.length) {
        throw new IllegalArgumentException("Vectors must be of the same length.");
    }

    // Accumulate in double precision to limit rounding errors across all dimensions
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;

    for (int i = 0; i < a.length; i++) {
        dot += (double) a[i] * b[i];
        normA += (double) a[i] * a[i];
        normB += (double) b[i] * b[i];
    }

    if (normA == 0.0 || normB == 0.0) {
        throw new IllegalArgumentException("Cosine distance is undefined for zero vectors.");
    }

    // Cosine similarity lies in [-1, 1], so the distance lies in [0, 2]; 0 means same direction
    double cosineSimilarity = dot / (Math.sqrt(normA) * Math.sqrt(normB));
    return (float) (1.0 - cosineSimilarity);
}
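Because cosine similarity ignores vector length, a common design choice is to scale each embedding to unit length once, after which cosine similarity is just the dot product and the two square roots disappear from every comparison. This minimal sketch shows that equivalence; the class UnitVectors and its methods are hypothetical helpers, not part of LangChain4j.

```java
// Hypothetical helpers showing that, for unit-length vectors,
// cosine distance reduces to 1 minus the dot product.
class UnitVectors {

    // Returns a copy of v scaled to unit (L2) length.
    static float[] normalize(float[] v) {
        double norm = 0.0;
        for (float x : v) {
            norm += (double) x * x;
        }
        norm = Math.sqrt(norm);
        if (norm == 0.0) {
            throw new IllegalArgumentException("Cannot normalize a zero vector.");
        }
        float[] unit = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            unit[i] = (float) (v[i] / norm);
        }
        return unit;
    }

    static double dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must be of the same length.");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Valid only when both inputs already have unit length.
    static float cosineDistanceOfUnitVectors(float[] a, float[] b) {
        return (float) (1.0 - dot(a, b));
    }

    public static void main(String[] args) {
        float[] a = normalize(new float[] {3.0f, 4.0f}); // becomes [0.6, 0.8]
        float[] b = normalize(new float[] {4.0f, 3.0f}); // becomes [0.8, 0.6]
        System.out.println(cosineDistanceOfUnitVectors(a, b)); // approximately 0.04
    }
}
```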

If you compute the cosine distance between a pair of vectors, you get a measure of how different they are in meaning: the smaller the distance, the closer the meaning. For educational purposes, the vectors shown below are truncated:

Cat: [0.005409543, 0.02569751, -0.003617844, -0.015324947, -0.056198075, -0.02924295, 0.03488783, 0.028627029]
Feline: [-0.027505683, -0.015484203, -0.006327568, -0.0051412256, -0.0028547782, -0.09774517, 0.00529569, 1.4912758E-4]

Distance: 0.22474928

Cat: [0.005409543, 0.02569751, -0.003617844, -0.015324947, -0.056198075, -0.02924295, 0.03488783, 0.028627029]
Car: [-0.037123796, 0.047072936, 0.023892863, 0.03124622, 0.00397948, -0.07480293, -0.013632841, -0.016730614]

Distance: 0.39682457
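In semantic search and RAG, the same distance is used to rank stored vectors against a query vector. The following is a minimal sketch in plain Java, assuming the embeddings have already been computed; the class SemanticRanking, its nearest method, and the three-dimensional vectors in main are hypothetical toy values, not Granite output. A real application would more likely keep the vectors in one of LangChain4j's embedding stores.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: ranks candidate texts by cosine distance to a query vector
// (smallest distance = most similar), the core retrieval step of semantic search and RAG.
class SemanticRanking {

    static float cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must be of the same length.");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine distance is undefined for zero vectors.");
        }
        return (float) (1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    // Returns the keys of the k candidates closest to the query, nearest first.
    static List<String> nearest(float[] query, Map<String, float[]> candidates, int k) {
        return candidates.entrySet().stream()
                .sorted(Comparator.comparingDouble(e -> cosineDistance(query, e.getValue())))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy 3-dimensional vectors, for illustration only
        Map<String, float[]> candidates = Map.of(
                "feline", new float[] {0.9f, 0.1f, 0.0f},
                "car", new float[] {0.1f, 0.9f, 0.2f},
                "kitten", new float[] {0.8f, 0.2f, 0.1f});
        float[] query = {1.0f, 0.0f, 0.0f}; // stands in for the embedding of "cat"

        System.out.println(nearest(query, candidates, 2)); // prints [feline, kitten]
    }
}
```

Sorting every candidate is fine for a handful of vectors; larger collections usually rely on a vector store with an index instead.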

As expected, the distance between cat and feline (about 0.22) is smaller than the distance between cat and car (about 0.40): a cat has much in common with a feline but little in common with a car, even though the words cat and car differ by only one letter.

Conclusions

This article is a practical introduction to calculating embeddings in Java. We used a Granite Embedding model, but you can use any model in ONNX format that produces embeddings.

LangChain4j is an ideal toolkit for integrating Java and AI, and it is suitable not only for embeddings but also for generative AI applications. If you are considering integrating Java and AI, LangChain4j is definitely a project you should take a look at.