Use the Jina Embeddings v2 model to convert text into vector embeddings, store them in DashVector, and run semantic similarity searches.
Prerequisites
Before you begin, ensure that you have:
DashVector:
A DashVector cluster. See Create a cluster.
A DashVector API key. See Manage API keys.
The latest DashVector SDK installed. See Install DashVector SDK.
Jina AI:
A Jina AI API key. Get one from jina.ai/embeddings.
Jina Embeddings v2 models
Jina Embeddings v2 is an open-source embedding model family that supports input text of up to 8,192 tokens. On the Massive Text Embedding Benchmark (MTEB), its performance is comparable to that of OpenAI's closed-source text-embedding-ada-002 model.
The following Jina Embeddings v2 models are supported. All of them accept inputs of up to 8,192 tokens and use cosine distance as the similarity metric.
| Model | Dimensions | Data type |
|---|---|---|
| jina-embeddings-v2-small-en | 512 | Float32 |
| jina-embeddings-v2-base-en | 768 | Float32 |
| jina-embeddings-v2-base-zh | 768 | Float32 |
For the full list of available models and their specifications, see the Jina AI documentation.
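A DashVector collection's dimension is fixed when the collection is created, so it can help to check each embedding against the model's output size before inserting it. The sketch below hard-codes the dimensions from the table above. `MODEL_DIMENSIONS` and `check_dimension` are hypothetical helpers for illustration, not part of the Jina or DashVector SDKs.

```python
from typing import Dict, List

# Output dimensions copied from the model table (hypothetical helper data).
MODEL_DIMENSIONS: Dict[str, int] = {
    "jina-embeddings-v2-small-en": 512,
    "jina-embeddings-v2-base-en": 768,
    "jina-embeddings-v2-base-zh": 768,
}


def check_dimension(model: str, vector: List[float]) -> None:
    """Raise ValueError if the vector length does not match the model's dimension."""
    expected = MODEL_DIMENSIONS.get(model)
    if expected is None:
        raise ValueError(f"Unknown model: {model}")
    if len(vector) != expected:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, got {len(vector)}"
        )


# A 768-dimensional vector passes the check for the base-zh model.
check_dimension("jina-embeddings-v2-base-zh", [0.0] * 768)
```

Catching a mismatch on the client side gives a clearer error than a rejected insert if you switch models without recreating the collection.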
How it works
Call the Jina AI embeddings API to convert your text into a vector.
Store the vector in a DashVector collection.
Submit a query vector to retrieve semantically similar results.
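The three steps above can be sketched offline. In this sketch, a hypothetical toy embedder (`toy_embed`, which only counts letters) stands in for the Jina API, and a Python dict stands in for DashVector. Stored vectors are ranked by cosine similarity, the metric the Jina v2 models use. The sketch shows the flow only: the toy vectors carry no semantic meaning.

```python
import math
from typing import Dict, List, Tuple


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for the Jina API: a 26-dim letter-count vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Steps 1 and 2: embed each text and store the vector under its ID.
store: Dict[str, List[float]] = {}
for doc_id, text in [("ID1", "vector database"), ("ID2", "banana bread")]:
    store[doc_id] = toy_embed(text)


def search(query: str, topk: int = 1) -> List[Tuple[str, float]]:
    """Step 3: embed the query and return the top-k (id, score) pairs."""
    q = toy_embed(query)
    scored = [(doc_id, cosine_similarity(q, v)) for doc_id, v in store.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:topk]


print(search("best vector database"))
```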
Embed text and run a vector search
The following example uses jina-embeddings-v2-base-zh (768 dimensions) to embed text, insert it into DashVector, and run a similarity search.
Replace the following placeholders before running the code:
| Placeholder | Description |
|---|---|
| {your-dashvector-api-key} | Your DashVector API key |
| {your-dashvector-cluster-endpoint} | The endpoint of your DashVector cluster |
| {your-jina-api-key} | Your Jina AI API key |
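Instead of pasting keys into source code, you can read them from environment variables. The variable names below (`DASHVECTOR_API_KEY`, `DASHVECTOR_ENDPOINT`, `JINA_API_KEY`) are hypothetical conventions chosen for this sketch. Neither SDK is assumed to read them automatically.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names, one per placeholder in the table.
REQUIRED_VARS = ("DASHVECTOR_API_KEY", "DASHVECTOR_ENDPOINT", "JINA_API_KEY")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required settings, failing fast if any are missing or empty."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: source[name] for name in REQUIRED_VARS}
```

You could then pass `settings['DASHVECTOR_API_KEY']` and `settings['DASHVECTOR_ENDPOINT']` to `Client(api_key=..., endpoint=...)`, and build the Jina header as `f"Bearer {settings['JINA_API_KEY']}"`.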
```python
from typing import List

import requests
from dashvector import Client


# Embed a list of texts using the Jina Embeddings v2 model
def generate_embeddings(texts: List[str]) -> List[List[float]]:
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer {your-jina-api-key}'
    }
    data = {'input': texts, 'model': 'jina-embeddings-v2-base-zh'}
    response = requests.post('https://api.jina.ai/v1/embeddings', headers=headers, json=data)
    # Raise on HTTP errors (for example, an invalid API key) instead of failing later with a KeyError
    response.raise_for_status()
    # Each record in "data" holds the embedding for one input text
    return [record['embedding'] for record in response.json()['data']]


# Create a DashVector client
client = Client(
    api_key='{your-dashvector-api-key}',
    endpoint='{your-dashvector-cluster-endpoint}'
)

# Create a collection with 768 dimensions to match the jina-embeddings-v2-base-zh output
rsp = client.create('jina-text-embedding', 768)
assert rsp

collection = client.get('jina-text-embedding')
assert collection

# Embed one document and insert it as an (id, vector) tuple
collection.insert(
    ('ID1', generate_embeddings(['Alibaba Cloud DashVector is one of the best vector databases in performance and cost-effectiveness.'])[0])
)

# Embed the query text and search for similar vectors
docs = collection.query(
    generate_embeddings(['The best vector database'])[0]
)
print(docs)
```
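The Jina embeddings endpoint accepts a list of texts in `input`, so several documents can be embedded per request rather than one at a time. The helper below is a hypothetical sketch: `build_docs` pairs IDs with vectors from any function that maps a list of texts to a list of vectors, such as `generate_embeddings` above. It assumes the vectors come back in input order. A fake embedder (`fake_embed`, also hypothetical) keeps the snippet runnable offline, and `batch_size=16` is an arbitrary default, not a documented API limit.

```python
from typing import Callable, List, Sequence, Tuple

Embedder = Callable[[List[str]], List[List[float]]]


def build_docs(
    ids: Sequence[str], texts: Sequence[str], embed: Embedder, batch_size: int = 16
) -> List[Tuple[str, List[float]]]:
    """Embed texts in batches and pair each resulting vector with its ID."""
    if len(ids) != len(texts):
        raise ValueError("ids and texts must have the same length")
    docs: List[Tuple[str, List[float]]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors = embed(batch)  # one embedding request per batch
        docs.extend(zip(ids[start:start + batch_size], vectors))
    return docs


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Offline stand-in for generate_embeddings: a 2-dim vector per text."""
    return [[float(len(t)), 1.0] for t in texts]


batch_docs = build_docs(["ID1", "ID2", "ID3"], ["a", "bb", "ccc"], fake_embed, batch_size=2)
print(batch_docs)  # [('ID1', [1.0, 1.0]), ('ID2', [2.0, 1.0]), ('ID3', [3.0, 1.0])]
```

With the real embedder, the resulting list of `(id, vector)` tuples could be passed to `collection.insert(...)`. Check the DashVector insert reference to confirm batch insertion and any per-request limits for your SDK version.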