Cohere builds large language models and makes them accessible through a set of APIs. Cohere’s embedding models, such as embed-english-v3.0 and embed-multilingual-v3.0, transform text chunks into vector representations. These models can be accessed through the Embed API, whose embedding_types parameter gives users the option to produce highly compressed embeddings to save on storage costs.
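A minimal sketch of an Embed API call using the Cohere Python SDK, assuming the `cohere` package is installed and an API key is available in the CO_API_KEY environment variable; the helper function and its name are illustrative, not part of the SDK:

```python
import os

def build_embed_request(texts, compressed=True):
    """Assemble Embed API parameters (illustrative helper).

    Requesting int8 embeddings via embedding_types trades a little
    precision for a much smaller storage footprint than float vectors.
    """
    return {
        "model": "embed-english-v3.0",
        "texts": texts,
        "input_type": "search_document",
        # Ask for compressed embeddings to save on storage costs.
        "embedding_types": ["int8"] if compressed else ["float"],
    }

# Only call the live API when a key is configured.
if __name__ == "__main__" and os.environ.get("CO_API_KEY"):
    import cohere
    co = cohere.Client(os.environ["CO_API_KEY"])
    resp = co.embed(**build_embed_request(["Example chunk of text."]))
```

The int8 vectors returned this way can be stored directly in a vector database such as Elasticsearch for later retrieval.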
Cohere’s generative models, such as command-r and command-r-plus, receive user instructions and generate useful text. These models can be accessed through the Chat API, enabling users to create multi-turn conversational experiences. This API features a documents parameter which allows users to provide the model with their own documents directly in the message; these can be used to ground model outputs.
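A minimal sketch of a grounded Chat API call with the Cohere Python SDK, again assuming the `cohere` package and a CO_API_KEY environment variable; the helper function and the sample document are illustrative:

```python
import os

def build_chat_request(question, docs):
    """Assemble Chat API parameters (illustrative helper).

    Each entry in `documents` is a dict of string fields the model
    can draw on, grounding its reply in the provided text.
    """
    return {
        "model": "command-r",
        "message": question,
        # Documents passed directly in the request ground the output.
        "documents": docs,
    }

# Only call the live API when a key is configured.
if __name__ == "__main__" and os.environ.get("CO_API_KEY"):
    import cohere
    co = cohere.Client(os.environ["CO_API_KEY"])
    resp = co.chat(**build_chat_request(
        "What does the FAQ say about hosting?",
        [{"title": "faq", "snippet": "The service is fully hosted."}],
    ))
```

Because the documents travel with the message, the same pattern works turn by turn in a multi-turn conversation.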
Cohere’s reranking models, such as rerank-english-v3.0 and rerank-multilingual-v3.0, improve search results by re-ordering retrieved results according to their semantic relevance to the query. These models can be accessed through the Rerank API and offer a “low lift, last mile” improvement to existing search algorithms. Together, these models can be used to build state-of-the-art retrieval-augmented generation (RAG) systems: transform your text into embeddings with Embed v3, store them with Elasticsearch, rerank retrieved results for maximum relevance, and dynamically pass the retrieved documents to the Chat API for grounded conversation.
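The reranking step can be sketched with the Cohere Python SDK as follows; as above, the `cohere` package and a CO_API_KEY environment variable are assumed, and the helper function is illustrative:

```python
import os

def build_rerank_request(query, documents, top_n=3):
    """Assemble Rerank API parameters (illustrative helper).

    The model scores each candidate document against the query and
    returns them re-ordered by relevance; top_n trims the tail.
    """
    return {
        "model": "rerank-english-v3.0",
        "query": query,
        "documents": documents,
        # Keep only the most relevant results for the final answer.
        "top_n": top_n,
    }

# Only call the live API when a key is configured.
if __name__ == "__main__" and os.environ.get("CO_API_KEY"):
    import cohere
    co = cohere.Client(os.environ["CO_API_KEY"])
    resp = co.rerank(**build_rerank_request(
        "capital of France",
        ["Berlin is the capital of Germany.",
         "Paris is the capital of France."],
    ))
```

In a full RAG pipeline, the candidates fed to this step come from the Elasticsearch retrieval over Embed v3 vectors, and the top-ranked documents are then passed on to the Chat API’s documents parameter.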