Vectorize text data by using the Jina Embeddings model

Use the Jina Embeddings v2 model to convert text into vector embeddings, store them in DashVector, and run semantic similarity searches.

Prerequisites

Before you begin, ensure that you have:

DashVector:
- A DashVector cluster. See Create a cluster.
- A DashVector API key. See Manage API keys.
- The latest DashVector SDK installed. See Install DashVector SDK.
Jina AI:
- A Jina AI API key. Get one from jina.ai/embeddings.

Jina Embeddings v2 models

Jina Embeddings v2 is the only open source embedding model that supports an input length of 8,192 tokens. On the Massive Text Embedding Benchmark (MTEB), its functionality and performance rival those of OpenAI's closed-source text-embedding-ada-002 model.

The following Jina Embeddings v2 models are supported. All models accept a maximum input length of 8,192 tokens and use cosine distance.

Model	Dimensions	Data type
jina-embeddings-v2-small-en	512	Float32
jina-embeddings-v2-base-en	768	Float32
jina-embeddings-v2-base-zh	768	Float32

For the full list of available models and their specifications, see the Jina AI documentation.
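Because a DashVector collection is created with a fixed dimension, it can help to keep the model-to-dimension mapping from the table in one place. The following is a minimal sketch; `JINA_V2_DIMENSIONS`, `dimension_for`, and `check_vector` are hypothetical helper names and are not part of either SDK.

```python
from typing import List

# Output dimensions copied from the table above, keyed by model name.
JINA_V2_DIMENSIONS = {
    "jina-embeddings-v2-small-en": 512,
    "jina-embeddings-v2-base-en": 768,
    "jina-embeddings-v2-base-zh": 768,
}


def dimension_for(model: str) -> int:
    """Return the embedding dimension for a model listed in the table."""
    try:
        return JINA_V2_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"Unsupported model: {model!r}") from None


def check_vector(model: str, vector: List[float]) -> None:
    """Raise ValueError if the vector length does not match the model's dimension."""
    expected = dimension_for(model)
    if len(vector) != expected:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, got {len(vector)}"
        )


print(dimension_for("jina-embeddings-v2-base-zh"))
```

Passing `dimension_for(model)` as the collection dimension, rather than a hard-coded number, keeps the collection and the model in sync if you switch models.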

How it works

Call the Jina AI embeddings API to convert your text into a vector.
Store the vector in a DashVector collection.
Submit a query vector to retrieve semantically similar results.
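To illustrate what the retrieval step computes, the following sketch ranks stored vectors by cosine similarity to a query vector in plain Python. It is a toy model of the idea only: the 3-dimensional vectors stand in for real 768-dimensional embeddings, the `cosine_similarity` and `top_k` helpers are hypothetical, and this is not how DashVector implements search internally.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k stored IDs most similar to the query, best match first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors; real embeddings come from the Jina AI embeddings API.
store = {"ID1": [0.9, 0.1, 0.0], "ID2": [0.0, 0.2, 0.9]}
results = top_k([1.0, 0.0, 0.0], store, k=1)
print(results)
```

A vector database performs the same kind of ranking, but uses indexes so that it does not need to compare the query against every stored vector.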

Embed text and run a vector search

The following example uses jina-embeddings-v2-base-zh (768 dimensions) to embed text, insert it into DashVector, and run a similarity search.

Replace the following placeholders before running the code:

Placeholder	Description
`{your-dashvector-api-key}`	Your DashVector API key
`{your-dashvector-cluster-endpoint}`	The endpoint of your DashVector cluster
`{your-jina-api-key}`	Your Jina AI API key
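As an alternative to pasting keys into the source file, you can read them from environment variables. The following is a minimal sketch; the variable names `DASHVECTOR_API_KEY`, `DASHVECTOR_ENDPOINT`, and `JINA_API_KEY` and the `require_env` and `load_settings` helpers are assumptions chosen for illustration, not names that either service requires.

```python
import os
from typing import Dict


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def load_settings() -> Dict[str, str]:
    """Collect the three values that replace the placeholders in the table above."""
    return {
        "dashvector_api_key": require_env("DASHVECTOR_API_KEY"),
        "dashvector_endpoint": require_env("DASHVECTOR_ENDPOINT"),
        "jina_api_key": require_env("JINA_API_KEY"),
    }
```

You can then pass `settings["dashvector_api_key"]` and `settings["dashvector_endpoint"]` to `Client(...)`, and build the `Authorization` header as `'Bearer ' + settings["jina_api_key"]`.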

from dashvector import Client
import requests
from typing import List


# Embed text using the Jina Embeddings v2 model
def generate_embeddings(texts: List[str]) -> List[List[float]]:
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer {your-jina-api-key}'
    }
    data = {'input': texts, 'model': 'jina-embeddings-v2-base-zh'}
    response = requests.post('https://api.jina.ai/v1/embeddings', headers=headers, json=data, timeout=30)
    # Fail early on HTTP errors, such as an invalid API key
    response.raise_for_status()
    # Return one embedding per input text
    return [record["embedding"] for record in response.json()["data"]]


# Create a DashVector client
client = Client(
    api_key='{your-dashvector-api-key}',
    endpoint='{your-dashvector-cluster-endpoint}'
)

# Create a collection with 768 dimensions to match the model output
rsp = client.create('jina-text-embedding', 768)
assert rsp
collection = client.get('jina-text-embedding')
assert collection

# Insert a vector into the collection
collection.insert(
    ('ID1', generate_embeddings(['Alibaba Cloud DashVector is one of the best vector databases in performance and cost-effectiveness.'])[0])
)

# Query for similar vectors
docs = collection.query(
    generate_embeddings(['The best vector database'])[0]
)
print(docs)
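The example above reads `response.json()["data"]` directly. If you want to validate the response before inserting vectors, a sketch such as the following may help. The `extract_embeddings` helper is hypothetical, and the optional `index` field on each record is an assumption about the response shape; the sample payload is hand-written and uses 3-dimensional vectors for brevity.

```python
from typing import Any, Dict, List


def extract_embeddings(payload: Dict[str, Any], expected_dim: int) -> List[List[float]]:
    """Pull embeddings out of a decoded response body and check their length."""
    records = payload.get("data")
    if not isinstance(records, list) or not records:
        raise ValueError(f"No embeddings in response: {payload!r}")
    # Assumption: records may carry an "index" giving the input position.
    # If the field is absent, the stable sort keeps the original order.
    ordered = sorted(records, key=lambda record: record.get("index", 0))
    vectors = [record["embedding"] for record in ordered]
    for vector in vectors:
        if len(vector) != expected_dim:
            raise ValueError(f"Expected {expected_dim} dimensions, got {len(vector)}")
    return vectors


# Hand-written payload for illustration; a real response holds 768-dimensional vectors.
sample = {
    "data": [
        {"index": 1, "embedding": [0.0, 1.0, 0.0]},
        {"index": 0, "embedding": [1.0, 0.0, 0.0]},
    ]
}
vectors = extract_embeddings(sample, expected_dim=3)
print(vectors)
```

With the real API, you would call `extract_embeddings(response.json(), 768)` for jina-embeddings-v2-base-zh, so that a malformed response fails before anything is written to the collection.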
