This topic describes how to use Alibaba Cloud Model Studio to transform text into vectors and store them in Vector Retrieval Service (DashVector) for retrieval.
Prerequisites
DashVector:
You have created a cluster. For more information, see Create a cluster.
You have obtained an API key. For more information, see Manage API keys.
You have installed the latest version of the SDK. For more information, see Install the DashVector SDK.
Model Studio:
You have activated the service and obtained an API key. For more information, see Get an API key and Configure the API key as an environment variable.
You have installed the latest version of the SDK. For more information, see Install the SDK.
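The prerequisites above mention configuring the API key as an environment variable. The following is a minimal sketch of reading all three settings from the environment rather than hardcoding them. The variable names DASHSCOPE_API_KEY, DASHVECTOR_API_KEY, and DASHVECTOR_ENDPOINT and the helper load_credentials are assumptions chosen for illustration, not names required by either SDK.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; adjust them to match your own environment.
REQUIRED_VARS = ("DASHSCOPE_API_KEY", "DASHVECTOR_API_KEY", "DASHVECTOR_ENDPOINT")


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise a clear error if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values can then be used wherever an API key or endpoint string is expected, which keeps secrets out of source files.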
General-purpose text embedding
Introduction
General-purpose text embedding is a unified multilingual text embedding model from Qwen Lab. It is built on a large language model (LLM) and supports major languages worldwide. Developers can use this model to quickly transform text into high-quality vector data.
Model name | Vector dimensions | Distance measure | Vector data type | Notes |
text-embedding-v1 | 1536 | Cosine | Float32 | |
text-embedding-v2 | 1536 | Cosine | Float32 | |
For more information about general-purpose text embedding, see General-purpose text embedding.
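The table above lists Cosine as the distance measure for both models. As an illustration of what that measure means, here is a minimal sketch of the standard cosine similarity formula in plain Python. It shows only the mathematical definition; the helper cosine_similarity is hypothetical and makes no claim about how DashVector computes or reports scores internally.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors that point in the same direction give a value of 1, orthogonal vectors give 0, and opposite vectors give -1, regardless of vector length.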
Example
To run the code, replace the following placeholders:
Replace {your-dashvector-api-key} with your DashVector API key.
Replace {your-dashvector-cluster-endpoint} with your DashVector cluster endpoint.
Replace {your-dashscope-api-key} with your Model Studio (DashScope) API key.
import dashscope
from dashscope import TextEmbedding
from dashvector import Client
from typing import List, Union

dashscope.api_key = '{your-dashscope-api-key}'

# Call the DashScope general-purpose text embedding model to transform text into vectors
def generate_embeddings(texts: Union[List[str], str], text_type: str = 'document'):
    rsp = TextEmbedding.call(
        model=TextEmbedding.Models.text_embedding_v2,
        input=texts,
        text_type=text_type
    )
    embeddings = [record['embedding'] for record in rsp.output['embeddings']]
    return embeddings if isinstance(texts, list) else embeddings[0]

# Create a DashVector client
client = Client(
    api_key='{your-dashvector-api-key}',
    endpoint='{your-dashvector-cluster-endpoint}'
)

# Create a DashVector collection
rsp = client.create('dashscope-text-embedding', 1536)
assert rsp
collection = client.get('dashscope-text-embedding')
assert collection

# Insert vectors into DashVector
collection.insert(
    ('ID1', generate_embeddings('Alibaba Cloud Vector Retrieval Service (DashVector) is one of the best-performing and most cost-effective vector databases'))
)

# Retrieve vectors
docs = collection.query(
    generate_embeddings('The best vector database', 'query')
)
print(docs)
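The generate_embeddings function in the example accepts either a single string or a list of strings. When embedding a larger list, it is common to split it into smaller batches so that each request stays within the service's per-call limits. The following is a minimal sketch of such a helper. The name batched and the default batch_size of 25 are assumptions for illustration; check the model documentation for the actual limit.

```python
from typing import Iterator, List


def batched(texts: List[str], batch_size: int = 25) -> Iterator[List[str]]:
    """Yield consecutive slices of texts, each with at most batch_size items.

    The default of 25 is an assumed limit, not a value taken from this page.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]
```

Each yielded batch is a plain list of strings, so it can be passed to generate_embeddings as is, and the order of the input texts is preserved across batches.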
Related best practices
Implement semantic search using Vector Retrieval Service (DashVector) and TextEmbedding
DashVector and Qwen LLM: Build a Q&A service based on private knowledge
ONE-PEACE multimodal vector representation
ONE-PEACE is a general-purpose representation model for three modalities: image, text, and audio. It also transforms text into vectors.
For more information, see Generate vectors from multiple modalities — ONE-PEACE multimodal vector representation.