Vectorize image data by using open source embedding models of ModelScope


ModelScope is an open source model-as-a-service (MaaS) platform that provides pre-trained models across a wide range of AI tasks.

On ModelScope, you can:

  • Use and download pre-trained models free of charge.

  • Run model predictions from the command line to validate model results quickly and easily.

  • Fine-tune models with your own data for customization.

  • Take theoretical and hands-on training to improve your R&D skills.

  • Share your ideas with the entire community.

Prerequisites

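Before you begin, make sure you have:

  • A DashVector cluster and its endpoint.

  • A DashVector API key.

  • The dashvector and modelscope Python packages installed.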

Similarity search model based on product characteristics

Model overview

This model is designed for large-scale product image similarity search, particularly for luggage items. It automatically performs image matting on luggage products and extracts product characteristics from the matted result, so the input image needs no additional preprocessing.

Note: This model is optimized for luggage product images. Results on other product categories may vary.

Use the following parameters when creating your DashVector collection:

  • Model ID: damo/cv_resnet50_product-bag-embedding-models

  • Dimensions: 512

  • Distance metric: Cosine

  • Data type: Float32
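Because the collection uses the cosine metric, results are ranked by the angle between vectors rather than by their length. The following minimal sketch shows that computation in plain Python for two 512-dimensional vectors. The function name cosine_similarity is hypothetical and is not part of the DashVector SDK; DashVector may report a score derived from this value (for example, a distance) rather than the raw similarity, so check the SDK documentation for how query scores are defined.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError(f"Dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Two parallel 512-dimensional vectors with different magnitudes.
v = [0.5] * 512
w = [2.0] * 512
print(cosine_similarity(v, w))  # close to 1.0, up to floating-point rounding
```

Scaling either vector leaves the result unchanged, which is why the cosine metric is a common choice for comparing embeddings whose magnitudes vary.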

For more information about this model, see Similarity search model based on product characteristics extracted from product images.

Example

The following example shows how to generate an embedding from a product image URL, insert it into a DashVector collection, and run a similarity query.

Replace the following placeholders before running the code:

  • {your-dashvector-api-key}: your DashVector API key

  • {your-dashvector-cluster-endpoint}: the endpoint of your DashVector cluster

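Rather than pasting the API key into the source file, you can read both values from environment variables and pass them to Client. The following is a minimal sketch; the variable names DASHVECTOR_API_KEY and DASHVECTOR_ENDPOINT and the helper load_dashvector_config are hypothetical conventions chosen for this example, not names that DashVector requires.

```python
import os
from typing import Tuple


def load_dashvector_config() -> Tuple[str, str]:
    """Return (api_key, endpoint) read from environment variables.

    DASHVECTOR_API_KEY and DASHVECTOR_ENDPOINT are hypothetical variable names
    chosen for this example; DashVector itself does not require them.
    """
    names = ("DASHVECTOR_API_KEY", "DASHVECTOR_ENDPOINT")
    values = [os.environ.get(name, "") for name in names]
    # Report every missing variable at once instead of failing on the first one.
    missing = [name for name, value in zip(names, values) if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values[0], values[1]
```

You could then create the client with api_key, endpoint = load_dashvector_config() followed by Client(api_key=api_key, endpoint=endpoint). The full example that follows keeps the literal placeholders for brevity.
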
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from dashvector import Client


# Load the product embedding model
product_embedding = pipeline(
    Tasks.product_retrieval_embedding,
    model='damo/cv_resnet50_product-bag-embedding-models'
)


def generate_embeddings(img: str):
    """Return the 512-dimensional embedding the model produces for the image at img."""
    result = product_embedding(img)
    return result['img_embedding']


# Create a DashVector client
client = Client(
    api_key='{your-dashvector-api-key}',
    endpoint='{your-dashvector-cluster-endpoint}'
)

# Create a collection whose dimension (512) and metric (cosine) match the model output
rsp = client.create('resnet50-embedding', dimension=512, metric='cosine')
assert rsp
collection = client.get('resnet50-embedding')
assert collection

# Generate an embedding from an image URL once and reuse it below
img_url = 'https://mmsearch.oss-cn-zhangjiakou.aliyuncs.com/maas_test_img/tb_image_share_1666002161794.jpg'
embedding = generate_embeddings(img_url)

# Insert the embedding into the collection as document 'ID1'
rsp = collection.insert(('ID1', embedding))
assert rsp

# Query for similar images; 'ID1' should rank first because the query uses the same image
docs = collection.query(embedding)
print(docs)
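The collection accepts only vectors with exactly 512 values, so it can be useful to normalize and check each embedding before inserting it. The following minimal sketch converts an array-like embedding (for example, a NumPy array, possibly with a leading batch dimension such as 1 x 512) into a flat list of Python floats and verifies its length. The helper to_vector is hypothetical, and the assumption that the pipeline may return a nested array is not confirmed by this page; adjust it to the shape you actually observe.

```python
import math
from typing import Any, List


def _flatten(value: Any) -> List[float]:
    """Recursively flatten nested lists or tuples into a list of floats."""
    if isinstance(value, (list, tuple)):
        flat: List[float] = []
        for item in value:
            flat.extend(_flatten(item))
        return flat
    return [float(value)]


def to_vector(embedding: Any, dimension: int = 512) -> List[float]:
    """Return embedding as a flat list of `dimension` finite floats (hypothetical helper)."""
    # NumPy arrays provide tolist(), which yields nested Python lists.
    if hasattr(embedding, "tolist"):
        embedding = embedding.tolist()
    vector = _flatten(embedding)
    if len(vector) != dimension:
        raise ValueError(f"Expected {dimension} values, got {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        raise ValueError("Embedding contains NaN or infinite values")
    return vector
```

In the example above, you would pass to_vector(generate_embeddings(img_url)) to collection.insert and collection.query instead of the raw pipeline output.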

What's next