Generate multimodal vectors by using the Alibaba Cloud Bailian API

This topic describes how to use Alibaba Cloud Bailian, a model service platform, to generate multimodal vectors and import them into DashVector, the vector retrieval service, for vector retrieval.

Alibaba Cloud provides flexible, easy-to-use model API services that make models of various modalities available to AI developers. With the Alibaba Cloud Bailian API, developers can integrate the capabilities of large models directly, and can also train and fine-tune models to customize them.

Prerequisites

Universal Multimodal Vector

The model generates continuous vectors from user input, which can be text, images, or videos. For the supported file formats, see the usage limits. The model is suitable for tasks such as video classification, image classification, and image-text retrieval.

For more information about limits and core features, see Text and multimodal vectorization.
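Because text, images, and videos are all mapped to vectors, retrieval across modalities comes down to comparing vectors. The following is a minimal sketch of one common comparison, cosine similarity. The `cosine_similarity` helper and the toy three-dimensional vectors are illustrative assumptions; real vectors come from the model, and this topic does not state which distance metric a DashVector collection uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (hypothetical values).
text_vec = [1.0, 0.0, 0.0]
image_vec = [2.0, 0.0, 0.0]   # same direction as text_vec
other_vec = [0.0, 1.0, 0.0]   # orthogonal to text_vec

print(cosine_similarity(text_vec, image_vec))  # 1.0
print(cosine_similarity(text_vec, other_vec))  # 0.0
```

Vectors that point in the same direction score 1.0 regardless of length, which is why this measure is often used to rank retrieval candidates.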

Example

Note

Replace the following placeholders before you run the example:

  1. Replace {your-dashvector-api-key} with your DashVector API key.

  2. Replace {your-dashvector-cluster-endpoint} with your DashVector cluster endpoint.

  3. Replace {your-dashscope-api-key} with your DashScope API key.
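As an alternative to editing the placeholders in the code, the three values can be read from environment variables. The following is a minimal sketch; the `read_setting` and `unreplaced` helpers and the variable names `DASHSCOPE_API_KEY`, `DASHVECTOR_API_KEY`, and `DASHVECTOR_ENDPOINT` are assumptions made for illustration, not names required by this topic.

```python
import os


def read_setting(name, placeholder):
    """Return the environment variable `name`, or `placeholder` if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    return value if value else placeholder


def unreplaced(settings):
    """Return the names of settings that still hold a {your-...} placeholder."""
    return [name for name, value in settings.items() if value.startswith("{your-")]


# Hypothetical environment variable names; adjust them to your own setup.
settings = {
    "DASHSCOPE_API_KEY": read_setting("DASHSCOPE_API_KEY", "{your-dashscope-api-key}"),
    "DASHVECTOR_API_KEY": read_setting("DASHVECTOR_API_KEY", "{your-dashvector-api-key}"),
    "DASHVECTOR_ENDPOINT": read_setting("DASHVECTOR_ENDPOINT", "{your-dashvector-cluster-endpoint}"),
}

missing = unreplaced(settings)
if missing:
    print("Still using placeholders for:", ", ".join(missing))
```

The resulting values can then be passed to `dashscope.api_key` and `Client(...)` in place of the literal strings in the example.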

import dashscope
from dashvector import Client


dashscope.api_key = '{your-dashscope-api-key}'


# Call the DashScope multimodal-embedding-v1 model to embed text, image, or video input as a vector.
def generate_embeddings(text: str = None, image: str = None, video: str = None):
    # Build the request input; the name "inputs" avoids shadowing the built-in input().
    inputs = []
    if text:
        inputs.append({'text': text})
    if image:
        inputs.append({'image': image})
    if video:
        inputs.append({'video': video})
    result = dashscope.MultiModalEmbedding.call(
        model="multimodal-embedding-v1",
        input=inputs
    )
    if result.status_code != 200:
        raise Exception(f"multimodal-embedding-v1 failed to generate embedding of {inputs}, result: {result}")
    # Only the first embedding is returned, even when several inputs are passed.
    return result.output["embeddings"][0]["embedding"]

# Create a DashVector client.
client = Client(
    api_key='{your-dashvector-api-key}',
    endpoint='{your-dashvector-cluster-endpoint}'
)

# Create a DashVector collection named 'multimodal-embedding' that stores 1024-dimensional vectors.
rsp = client.create('multimodal-embedding', 1024)
assert rsp
collection = client.get('multimodal-embedding')
assert collection

# Insert the generated vectors into the DashVector collection.
collection.insert(
    [
        ('ID1', generate_embeddings(text='Alibaba Cloud vector search service DashVector is a vector database with good performance and cost-effectiveness')),
        ('ID2', generate_embeddings(image='https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png')),
        ('ID3', generate_embeddings(video='https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4')),
        ('ID4', generate_embeddings(
            text='Alibaba Cloud vector search service DashVector is a vector database with good performance and cost-effectiveness',
            image='https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png',
            video='https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4'
        ))
    ]
)
# Query the collection with the embedding of a text query to retrieve similar documents.
docs = collection.query(
    generate_embeddings(text='The best vector database')
)
print(docs)
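Note that `generate_embeddings` reads only `["embeddings"][0]`, so for ID4, which passes text, an image, and a video together, only the first entry is stored. The following is a minimal sketch of collecting every entry instead. It assumes that `output["embeddings"]` is a list of entries that each carry an `"embedding"` key, which is inferred from the indexing in the example rather than confirmed by this topic. The `all_embeddings` helper and the sample values are hypothetical.

```python
def all_embeddings(output):
    """Return every embedding vector found in `output`, in their original order."""
    return [item["embedding"] for item in output["embeddings"]]


# Hypothetical output with made-up two-dimensional vectors.
sample_output = {
    "embeddings": [
        {"embedding": [0.1, 0.2]},
        {"embedding": [0.3, 0.4]},
        {"embedding": [0.5, 0.6]},
    ]
}

vectors = all_embeddings(sample_output)
print(len(vectors))  # 3
print(vectors[0])    # [0.1, 0.2]
```

Each returned vector could then be inserted under its own ID, so that the text, image, and video of one item are all retrievable.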