Convert multiple modalities to vectors by using Alibaba Cloud Bailian API calls

This topic describes how to use the Alibaba Cloud Bailian model service platform to generate multimodal vectors and import them into DashVector, the vector retrieval service, for vector search.

Alibaba Cloud Bailian exposes models of various modalities to AI developers through a flexible, easy-to-use model API service. Through the Bailian API, developers can integrate the capabilities of large models directly, and can also train and fine-tune models to customize them.

Prerequisites

DashVector:
- A cluster is created. For more information, see Create a cluster.
- An API key is obtained. For more information, see API-KEY Management.
- The latest version of the SDK is installed. For more information, see Install the DashVector SDK.
Alibaba Cloud Bailian:
- The service is activated and an API key is obtained. For more information, see Obtain API Key and Configure API Key to Environment Variables.
- The latest version of the SDK is installed. For more information, see Install SDK.

Universal Multimodal Vector

The model generates continuous vectors from user input, which can be text, images, or videos. For more information about file formats, see Throttling. The model is suited to tasks such as video classification, image classification, and image-text retrieval.

For more information about limits and core features, see Text and multimodal vectorization.
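To illustrate why placing text, images, and videos in one vector space supports retrieval, the following sketch ranks a few toy vectors by cosine similarity to a query vector. The three-dimensional vectors and the `cosine_similarity` and `rank_by_similarity` helpers are hypothetical stand-ins and are not part of DashScope or DashVector. Real embeddings have far more dimensions, and the metric that a collection actually uses depends on how the collection is configured.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: List[float], candidates: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Sort candidate IDs from most to least similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query, vector))
        for doc_id, vector in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors with made-up values, standing in for embeddings of
# a text, an image, and a video.
toy_vectors = {
    "text_doc": [0.9, 0.1, 0.0],
    "image_doc": [0.1, 0.9, 0.0],
    "video_doc": [0.0, 0.2, 0.9],
}
ranking = rank_by_similarity([1.0, 0.0, 0.0], toy_vectors)
for doc_id, score in ranking:
    print(doc_id, round(score, 4))
```

Because every item is compared through its vector alone, the same ranking step works regardless of which modality produced the vector.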

Example

Note

To run the example, replace the following placeholders:

- {your-dashvector-api-key}: your DashVector API key
- {your-dashvector-cluster-endpoint}: your DashVector cluster endpoint
- {your-dashscope-api-key}: your DashScope API key
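As an alternative to hard-coding these three values, they could be read from environment variables. The following is a minimal sketch under stated assumptions: the variable names `DASHSCOPE_API_KEY`, `DASHVECTOR_API_KEY`, and `DASHVECTOR_ENDPOINT` and the `load_credentials` helper are chosen here for illustration and are not prescribed by this topic.

```python
import os
from typing import Dict, Mapping

# Assumed environment variable names, chosen for this sketch.
REQUIRED_VARS = {
    "dashscope_api_key": "DASHSCOPE_API_KEY",
    "dashvector_api_key": "DASHVECTOR_API_KEY",
    "dashvector_endpoint": "DASHVECTOR_ENDPOINT",
}


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the three values from env, failing early if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS.values() if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in REQUIRED_VARS.items()}


# Demonstration with a plain dict instead of the real environment.
example_env = {
    "DASHSCOPE_API_KEY": "example-dashscope-key",
    "DASHVECTOR_API_KEY": "example-dashvector-key",
    "DASHVECTOR_ENDPOINT": "example-endpoint",
}
credentials = load_credentials(example_env)
print(sorted(credentials))
```

The returned values can then be assigned to `dashscope.api_key` and passed to `Client(api_key=..., endpoint=...)` in place of the placeholders.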

Python

import dashscope
from dashvector import Client


dashscope.api_key = '{your-dashscope-api-key}'


# Call the DashScope multimodal-embedding-v1 model to embed text, image, or video input as a vector.
def generate_embeddings(text: str = None, image: str = None, video: str = None):
    # Build the input list from whichever modalities were supplied.
    inputs = []
    if text:
        inputs.append({'text': text})
    if image:
        inputs.append({'image': image})
    if video:
        inputs.append({'video': video})
    if not inputs:
        raise ValueError("Provide at least one of text, image, or video.")
    result = dashscope.MultiModalEmbedding.call(
        model="multimodal-embedding-v1",
        input=inputs
    )
    if result.status_code != 200:
        raise RuntimeError(f"multimodal-embedding-v1 failed to generate an embedding for {inputs}, result: {result}")
    # Return the first embedding in the response.
    return result.output["embeddings"][0]["embedding"]

# Create a DashVector client.
client = Client(
    api_key='{your-dashvector-api-key}',
    endpoint='{your-dashvector-cluster-endpoint}'
)

# Create a DashVector collection with dimension 1024, which must match the dimension of the inserted vectors.
rsp = client.create('multimodal-embedding', 1024)
assert rsp
collection = client.get('multimodal-embedding')
assert collection

# Insert the vectors into DashVector.
collection.insert(
    [
        ('ID1', generate_embeddings(text='Alibaba Cloud vector search service DashVector is one of the vector databases with good performance and cost-effectiveness')),
        ('ID2', generate_embeddings(image='https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png')),
        ('ID3', generate_embeddings(video='https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4')),
        # Text, image, and video are sent in a single request.
        ('ID4', generate_embeddings(
            text='Alibaba Cloud vector search service DashVector is one of the vector databases with good performance and cost-effectiveness',
            image='https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png',
            video='https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4'
        ))
    ]
)
# Query the collection with the embedding of a text query.
docs = collection.query(
    generate_embeddings(text='The best vector database')
)
print(docs)
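
The `generate_embeddings` function in the example returns only the first element of `output["embeddings"]`. If a response holds one embedding per input item (an assumption about the response shape, inferred from the list that the example indexes into), a helper such as the hypothetical `extract_embeddings` below could collect all of them and check each against the collection dimension before insertion. The sample response is made up and uses four dimensions to stay readable.

```python
from typing import Any, Dict, List


def extract_embeddings(output: Dict[str, Any], expected_dim: int) -> List[List[float]]:
    """Return every embedding in the output, in response order.

    Assumes output["embeddings"] is a list of dicts that each carry an
    "embedding" list, as indexed in the example above.
    """
    items = output.get("embeddings") or []
    if not items:
        raise ValueError("response contains no embeddings")
    vectors = []
    for position, item in enumerate(items):
        vector = item["embedding"]
        if len(vector) != expected_dim:
            raise ValueError(
                f"embedding {position} has dimension {len(vector)}, "
                f"expected {expected_dim}"
            )
        vectors.append(vector)
    return vectors


# Made-up response body with two embeddings of dimension 4.
fake_output = {
    "embeddings": [
        {"embedding": [0.1, 0.2, 0.3, 0.4]},
        {"embedding": [0.5, 0.6, 0.7, 0.8]},
    ]
}
vectors = extract_embeddings(fake_output, expected_dim=4)
print(len(vectors))
```

With the collection created in the example, `expected_dim` would be 1024, and each returned vector could be inserted under its own ID.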