
There's no better way to understand text embeddings than by looking at examples. The examples from this article are taken from a Python notebook, which you can try out here.

Throughout this article, we'll use a subset of the Airline Travel Information System (ATIS) intent classification dataset as an example. This dataset consists of inquiries coming into airline travel inquiry systems. Here are a few example data points:

1 - which airlines fly from boston to washington dc via other cities
2 - show me the airlines that fly between toronto and denver
3 - show me round trip first class tickets from new york to miami
4 - i'd like the lowest fare from denver to pittsburgh
5 - show me a list of ground transportation at boston airport
7 - of all airlines which airline has the most arrivals in atlanta
8 - what ground transportation is available in boston
9 - i would like your rates between atlanta and boston on september third

The first thing we need to do is turn each data point's text into embeddings. We do this by calling Cohere's Embed endpoint, which takes in texts as input and returns embeddings as output. The endpoint comes with a few model-size options; larger models produce embeddings with a higher number of dimensions.
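As a minimal sketch, calling the Embed endpoint with the Cohere Python SDK might look like the following. The model name and `input_type` value are illustrative assumptions; check Cohere's current SDK documentation for the available options.

```python
# A few of the ATIS example inquiries from above.
texts = [
    "which airlines fly from boston to washington dc via other cities",
    "show me the airlines that fly between toronto and denver",
    "show me round trip first class tickets from new york to miami",
    "i'd like the lowest fare from denver to pittsburgh",
    "show me a list of ground transportation at boston airport",
    "of all airlines which airline has the most arrivals in atlanta",
    "what ground transportation is available in boston",
    "i would like your rates between atlanta and boston on september third",
]

def embed_texts(texts, api_key):
    """Return one embedding (a list of floats) per input text.

    Sketch only: requires the `cohere` package and a valid API key;
    the model name below is an assumption, not prescribed by the article.
    """
    import cohere
    co = cohere.Client(api_key)
    response = co.embed(
        texts=texts,
        model="embed-english-v3.0",      # illustrative model choice
        input_type="search_document",    # required by v3 embed models
    )
    return response.embeddings
```

Each returned embedding is a vector whose length matches the chosen model's dimensionality, so `embed_texts` yields one vector per inquiry in `texts`.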

Text embeddings give you the ability to turn unstructured text data into a structured form. With embeddings, you can compare two or more pieces of text, be they single words, sentences, paragraphs, or even longer documents.

When you hear about large language models (LLMs), probably the first thing that comes to mind is their text generation capability, such as writing an essay or creating marketing copy. But another thing you can get from them is text representation: a set of numbers that represent what the text means and capture its semantics. These numbers are called text embeddings. Text generation outputs text, while text representation outputs embeddings.
