After creating a dataset, you can create a metadata index for files in services such as Object Storage Service (OSS) and Drive and Photo Service (PDS) to efficiently manage and retrieve large volumes of media files.
Prerequisites
You have created a dataset. For more information, see Create a dataset.
Overview
A metadata index enables you to search, filter, and manage large collections of media files by keywords, attributes, or other identifiers.
Procedure
You can automatically index all files in an OSS bucket, or manually index specified files in an OSS bucket or PDS.
Automatically index all files in an OSS bucket
To automatically index all files in an OSS bucket, create a binding between a dataset and the bucket by calling an API or by adding a data source in the IMM console. After the binding is created, Intelligent Media Management (IMM) performs a full scan of existing data in the bucket, extracts file metadata, and indexes it. IMM then monitors the bucket for new files and performs real-time incremental scans to extract and index their metadata.
Important: After a binding is created, IMM starts scanning both the existing files and any new files in the specified OSS bucket. The more objects the bucket contains, the higher the metadata scanning costs. For more information, see IMM Billing. If you are testing this feature or are unsure of the outcome, we recommend that you use an OSS bucket that contains a small number of files and select a workflow template carefully to avoid unexpected charges.
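Because scanning cost grows with the number of objects, you may want to confirm that a test bucket really is small before you bind it. The following is a minimal sketch that is not part of the IMM SDK: the function name `exceeds_object_budget` and the threshold you pass are hypothetical, and the function only counts keys from whatever iterable you supply.

```python
from itertools import islice
from typing import Iterable


def exceeds_object_budget(keys: Iterable[str], max_objects: int) -> bool:
    """Return True if `keys` yields more than `max_objects` items.

    Consumes at most max_objects + 1 items, so it stays cheap on a lazy
    listing of a very large bucket.
    """
    if max_objects < 0:
        raise ValueError("max_objects must be non-negative")
    # Count up to one item past the budget; reaching it means the budget is exceeded.
    seen = sum(1 for _ in islice(keys, max_objects + 1))
    return seen > max_objects
```

For example, with the OSS Python SDK (`oss2`) you could pass `(obj.key for obj in oss2.ObjectIterator(bucket))`, where `bucket` is an `oss2.Bucket` for the bucket you plan to bind. Because the function stops after `max_objects + 1` keys, it does not need to list the entire bucket.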
API
The following example indexes all files in the test-bucket bucket, storing the index in the test-dataset dataset within the test-project project.
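CreateBinding and GetBinding identify a binding by the same three parameters, as the sample requests in the following steps show. As a minimal sketch, assuming you assemble request parameters yourself (for example, to log them or pass them to another client), the hypothetical helper below builds that parameter set and normalizes the bucket into an `oss://` URI. It rejects URIs that contain an object path because this example binds a whole bucket; that restriction is an assumption of the sketch, not a documented API rule.

```python
import json


def binding_request_body(project: str, dataset: str, bucket: str) -> dict:
    """Build the parameters shared by CreateBinding and GetBinding (hypothetical helper).

    `bucket` may be a bare bucket name ("test-bucket") or an OSS URI
    ("oss://test-bucket"); the URI field is normalized to oss://<bucket>.
    """
    name = bucket[len("oss://"):] if bucket.startswith("oss://") else bucket
    name = name.rstrip("/")
    # Assumption of this sketch: only whole buckets are accepted, as in the example.
    if not name or "/" in name:
        raise ValueError(f"expected a bucket name, got {bucket!r}")
    return {"ProjectName": project, "URI": f"oss://{name}", "DatasetName": dataset}


if __name__ == "__main__":
    print(json.dumps(binding_request_body("test-project", "test-dataset", "test-bucket")))
```

Run as a script, it prints the same parameters as the sample requests: `{"ProjectName": "test-project", "URI": "oss://test-bucket", "DatasetName": "test-dataset"}`.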
-
Call CreateBinding to create a binding between the dataset and the OSS bucket.
-
Sample request
{
  "ProjectName": "test-project",
  "URI": "oss://test-bucket",
  "DatasetName": "test-dataset"
}
Sample response
{
  "Binding": {
    "Phase": "",
    "ProjectName": "test-project",
    "DatasetName": "test-dataset",
    "State": "Ready",
    "CreateTime": "2022-07-06T07:03:28.054762739+08:00",
    "UpdateTime": "2022-07-06T07:03:28.054762739+08:00",
    "URI": "oss://test-bucket"
  },
  "RequestId": "090D2AC5-8450-0AA8-A1B1-****"
}
Sample code (Python SDK)
# -*- coding: utf-8 -*-
import os

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Use an AccessKey ID and AccessKey secret to initialize a client.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # Specify the endpoint.
        config.endpoint = 'imm.cn-beijing.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main() -> None:
        # The AccessKey pair of an Alibaba Cloud account grants full access to all APIs. For better security, we recommend that you use a RAM user for API calls and daily operations.
        # To prevent security risks, do not hard-code your AccessKey ID and AccessKey secret in your project code.
        # This example shows how to read the AccessKey pair from environment variables for authentication. For more information about how to configure environment variables, see https://help.aliyun.com/document_detail/2361894.html.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        create_binding_request = imm_20200930_models.CreateBindingRequest(
            # Specify the name of the IMM project.
            project_name='test-project',
            # Specify the name of the IMM dataset.
            dataset_name='test-dataset',
            # Specify the URI of the OSS bucket to bind.
            uri='oss://test-bucket'
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Print the API response.
            response = client.create_binding_with_options(create_binding_request, runtime)
            print(response.body.to_map())
        except Exception as error:
            # If an error occurs, print the error message.
            UtilClient.assert_as_string(error.message)
            print(error)


if __name__ == '__main__':
    Sample.main()
-
-
Optional: Call GetBinding to query the binding status.
-
Sample request
{
  "ProjectName": "test-project",
  "URI": "oss://test-bucket",
  "DatasetName": "test-dataset"
}
Sample response
{
  "Binding": {
    "Phase": "IncrementalScanning",
    "ProjectName": "test-project",
    "DatasetName": "test-dataset",
    "State": "Running",
    "CreateTime": "2022-07-06T07:04:05.105182822+08:00",
    "UpdateTime": "2022-07-06T07:04:13.302084076+08:00",
    "URI": "oss://test-bucket"
  },
  "RequestId": "B5A9F54B-6C54-03C9-B011-****"
}
Note:
If the value of the Phase parameter is IncrementalScanning, IMM has finished indexing the existing data in the OSS bucket and is incrementally scanning new files.
-
If the value of the State parameter is Running, the binding is active.
-
-
Sample code (Python SDK 1.27.3)
# -*- coding: utf-8 -*-
import os

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Use an AccessKey ID and AccessKey secret to initialize a client.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # Specify the endpoint.
        config.endpoint = 'imm.cn-beijing.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main() -> None:
        # The AccessKey pair of an Alibaba Cloud account grants full access to all APIs. For better security, we recommend that you use a RAM user for API calls and daily operations.
        # To prevent security risks, do not hard-code your AccessKey ID and AccessKey secret in your project code.
        # This example shows how to read the AccessKey pair from environment variables for authentication. For more information about how to configure environment variables, see https://help.aliyun.com/document_detail/2361894.html.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        get_binding_request = imm_20200930_models.GetBindingRequest(
            # Specify the name of the IMM project.
            project_name='test-project',
            # Specify the name of the IMM dataset.
            dataset_name='test-dataset',
            # Specify the URI of the bound OSS bucket.
            uri='oss://test-bucket'
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Print the API response.
            response = client.get_binding_with_options(get_binding_request, runtime)
            print(response.body.to_map())
        except Exception as error:
            # If an error occurs, print the error message.
            UtilClient.assert_as_string(error.message)
            print(error)


if __name__ == '__main__':
    Sample.main()
-
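Because the full scan of existing data runs asynchronously after CreateBinding returns, a script may need to wait until the binding reaches the IncrementalScanning phase before relying on the index. The following minimal sketch polls a caller-supplied function. The names `wait_until_incremental` and `fetch_binding`, the timeout, and the polling interval are hypothetical choices. With the Python SDK, you could implement `fetch_binding` as a call to `get_binding_with_options` followed by `response.body.to_map()["Binding"]`, assuming `to_map()` returns the same field names as the sample response.

```python
import time
from typing import Callable, Dict


def wait_until_incremental(
    fetch_binding: Callable[[], Dict[str, str]],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll a binding until it is active and incrementally scanning.

    `fetch_binding` (hypothetical) returns the "Binding" object of a
    GetBinding response as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        binding = fetch_binding()
        # IncrementalScanning: existing data has been indexed.
        # Running: the binding is active.
        if binding.get("Phase") == "IncrementalScanning" and binding.get("State") == "Running":
            return binding
        if time.monotonic() >= deadline:
            raise TimeoutError(f"binding not ready before timeout: {binding}")
        sleep(interval_s)
```

The `sleep` parameter is injectable so that the loop can be exercised without real delays; in normal use, leave it at `time.sleep`.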
Add data source
-
In your project, find the dataset test-dataset.
In the left-side navigation pane, choose Data Management & Indexing > Datasets. You can find test-dataset in the list.
-
Click the dataset test-dataset, choose the Data access tab, and then click New Data Source.
-
Select the bucket that you want to bind and click OK.
Note: After you add a data source, IMM first creates metadata extraction tasks for the existing files in the bucket. Then, IMM continuously monitors the data source for events and creates new tasks accordingly. These tasks incur charges. For more information, see Billing overview. We recommend that you test this feature with a bucket that contains a small amount of data.
Manually index specific files
API
To manually index specific files in an OSS bucket or PDS, call BatchIndexFileMeta or IndexFileMeta.
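BatchIndexFileMeta accepts several files per request, so a longer list of files can be split across several requests. The following sketch builds request bodies in the same shape as the BatchIndexFileMeta sample request in this section. The helper name `batch_index_bodies` and the `batch_size` value are hypothetical; set `batch_size` no higher than the per-request file limit stated in the BatchIndexFileMeta API reference.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def batch_index_bodies(
    project: str,
    dataset: str,
    files: Iterable[Tuple[str, Optional[Dict[str, str]]]],
    topic: str,
    batch_size: int,
) -> List[dict]:
    """Split (uri, custom_labels) pairs into BatchIndexFileMeta request bodies."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    entries = []
    for uri, labels in files:
        entry = {"URI": uri}
        # Only include CustomLabels when labels were provided.
        if labels:
            entry["CustomLabels"] = dict(labels)
        entries.append(entry)
    # Every request reports its result to the same MNS topic.
    return [
        {
            "ProjectName": project,
            "DatasetName": dataset,
            "Files": entries[i : i + batch_size],
            "Notification": {"MNS": {"TopicName": topic}},
        }
        for i in range(0, len(entries), batch_size)
    ]
```

In the Python SDK, the same fields correspond to the `InputFile`, `Notification`, and `BatchIndexFileMetaRequest` models used in the sample code.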
-
Call the BatchIndexFileMeta operation
The following example indexes the OSS files oss://test-bucket/test-object1.jpg and oss://test-bucket/test-object2.jpg in the test-dataset dataset within the test-project project.
-
Sample request
{
  "ProjectName": "test-project",
  "DatasetName": "test-dataset",
  "Files": [
    {
      "URI": "oss://test-bucket/test-object1.jpg",
      "CustomLabels": {
        "category": "People"
      }
    },
    {
      "URI": "oss://test-bucket/test-object2.jpg",
      "CustomLabels": {
        "category": "Pets"
      }
    }
  ],
  "Notification": {
    "MNS": {
      "TopicName": "test-topic"
    }
  }
}
Sample response
{
  "RequestId": "0D4CB096-EB44-02D6-A4E9-****",
  "EventId": "16C-1KoeYbdckkiOObpyzc****"
}
Simple Message Queue (MNS) message: The indexing result is delivered asynchronously in an MNS message sent to the specified topic. For more information about how to receive messages by using the MNS SDK, see Step 4: Receive and delete messages.
{
  "ProjectName": "test-project",
  "DatasetName": "test-dataset",
  "RequestId": "658FFD57-B495-07C0-B24B-B64CC52993CB",
  "StartTime": "2022-07-06T07:18:18.664770352+08:00",
  "EndTime": "2022-07-06T07:18:20.762465221+08:00",
  "Success": true,
  "Message": "",
  "Files": [
    {
      "URI": "oss://test-bucket/test-object1.jpg",
      "CustomLabels": {
        "category": "People"
      },
      "Error": ""
    },
    {
      "URI": "oss://test-bucket/test-object2.jpg",
      "CustomLabels": {
        "category": "Pets"
      },
      "Error": ""
    }
  ]
}
Note:
If the Success parameter is true, the metadata indexing task succeeded.
-
The Files array returns the URI and error information for each file. If the Error field is empty, the file's metadata was indexed successfully.
-
-
Sample code (Python SDK)
# -*- coding: utf-8 -*-
import sys
import os
from typing import List

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Use an AccessKey ID and AccessKey secret to initialize a client.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # Specify the endpoint.
        config.endpoint = 'imm.cn-beijing.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        # The AccessKey pair of an Alibaba Cloud account grants full access to all APIs. For better security, we recommend that you use a RAM user for API calls and daily operations.
        # To prevent security risks, do not hard-code your AccessKey ID and AccessKey secret in your project code.
        # This example shows how to read the AccessKey pair from environment variables for authentication. For more information about how to configure environment variables, see https://help.aliyun.com/document_detail/2361894.html.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        # Send the indexing result to the test-topic MNS topic.
        notification_mns = imm_20200930_models.MNS(
            topic_name='test-topic'
        )
        notification = imm_20200930_models.Notification(
            mns=notification_mns
        )
        # Specify the files to index and their custom labels.
        input_file_0custom_labels = {
            'category': 'People'
        }
        input_file_0 = imm_20200930_models.InputFile(
            uri='oss://test-bucket/test-object1.jpg',
            custom_labels=input_file_0custom_labels
        )
        input_file_1custom_labels = {
            'category': 'Pets'
        }
        input_file_1 = imm_20200930_models.InputFile(
            uri='oss://test-bucket/test-object2.jpg',
            custom_labels=input_file_1custom_labels
        )
        batch_index_file_meta_request = imm_20200930_models.BatchIndexFileMetaRequest(
            project_name='test-project',
            dataset_name='test-dataset',
            files=[
                input_file_0,
                input_file_1
            ],
            notification=notification
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Start the asynchronous indexing task and print the response (RequestId and EventId).
            response = client.batch_index_file_meta_with_options(batch_index_file_meta_request, runtime)
            print(response.body.to_map())
        except Exception as error:
            # If an error occurs, print the error message.
            UtilClient.assert_as_string(error.message)
            print(error)

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        # The AccessKey pair of an Alibaba Cloud account grants full access to all APIs. For better security, we recommend that you use a RAM user for API calls and daily operations.
        # To prevent security risks, do not hard-code your AccessKey ID and AccessKey secret in your project code.
        # This example shows how to read the AccessKey pair from environment variables for authentication. For more information about how to configure environment variables, see https://help.aliyun.com/document_detail/2361894.html.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        # Send the indexing result to the test-topic MNS topic.
        notification_mns = imm_20200930_models.MNS(
            topic_name='test-topic'
        )
        notification = imm_20200930_models.Notification(
            mns=notification_mns
        )
        # Specify the files to index and their custom labels.
        input_file_0custom_labels = {
            'category': 'People'
        }
        input_file_0 = imm_20200930_models.InputFile(
            uri='oss://test-bucket/test-object1.jpg',
            custom_labels=input_file_0custom_labels
        )
        input_file_1custom_labels = {
            'category': 'Pets'
        }
        input_file_1 = imm_20200930_models.InputFile(
            uri='oss://test-bucket/test-object2.jpg',
            custom_labels=input_file_1custom_labels
        )
        batch_index_file_meta_request = imm_20200930_models.BatchIndexFileMetaRequest(
            project_name='test-project',
            dataset_name='test-dataset',
            files=[
                input_file_0,
                input_file_1
            ],
            notification=notification
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Start the asynchronous indexing task and print the response (RequestId and EventId).
            response = await client.batch_index_file_meta_with_options_async(batch_index_file_meta_request, runtime)
            print(response.body.to_map())
        except Exception as error:
            # If an error occurs, print the error message.
            UtilClient.assert_as_string(error.message)
            print(error)


if __name__ == '__main__':
    Sample.main(sys.argv[1:])
-
-
Call the IndexFileMeta operation
The following example indexes the OSS file oss://test-bucket/test-object1.jpg in the test-dataset dataset within the test-project project.
-
Sample request
{
  "ProjectName": "test-project",
  "DatasetName": "test-dataset",
  "File": {
    "URI": "oss://test-bucket/test-object1.jpg",
    "CustomLabels": {
      "category": "People"
    }
  },
  "Notification": {
    "MNS": {
      "TopicName": "test-topic"
    }
  }
}
Sample response
{
  "RequestId": "5AA694AD-3D10-0B6A-85B2-****",
  "EventId": "17C-1Kofq1mlJxRYF7vAGF****"
}
Simple Message Queue (MNS) message: The indexing result is delivered asynchronously in an MNS message sent to the specified topic. For more information about how to receive messages by using the MNS SDK, see Step 4: Receive and delete messages.
{
  "ProjectName": "test-project",
  "DatasetName": "test-dataset",
  "RequestId": "658FFD57-B495-07C0-B24B-B64CC52993CB",
  "StartTime": "2022-07-06T07:18:18.664770352+08:00",
  "EndTime": "2022-07-06T07:18:20.762465221+08:00",
  "Success": true,
  "Message": "",
  "Files": [
    {
      "URI": "oss://test-bucket/test-object1.jpg",
      "CustomLabels": {
        "category": "People"
      },
      "Error": ""
    }
  ]
}
Note:
If the Success parameter is true, the metadata indexing task succeeded.
-
The Files array returns the URI and error information for the file. If the Error field is empty, the file's metadata was indexed successfully.
-
-
Sample code (Python SDK)
# -*- coding: utf-8 -*-
import sys
import os
from typing import List

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Use an AccessKey ID and AccessKey secret to initialize a client.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # Specify the endpoint.
        config.endpoint = 'imm.cn-beijing.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        # The AccessKey pair of an Alibaba Cloud account grants full access to all APIs. For better security, we recommend that you use a RAM user for API calls and daily operations.
        # To prevent security risks, do not hard-code your AccessKey ID and AccessKey secret in your project code.
        # This example shows how to read the AccessKey pair from environment variables for authentication. For more information about how to configure environment variables, see https://help.aliyun.com/document_detail/2361894.html.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        # Send the indexing result to the test-topic MNS topic.
        notification_mns = imm_20200930_models.MNS(
            topic_name='test-topic'
        )
        notification = imm_20200930_models.Notification(
            mns=notification_mns
        )
        # Specify the file to index and its custom labels.
        input_file_custom_labels = {
            'category': 'People'
        }
        input_file = imm_20200930_models.InputFile(
            uri='oss://test-bucket/test-object1.jpg',
            custom_labels=input_file_custom_labels
        )
        index_file_meta_request = imm_20200930_models.IndexFileMetaRequest(
            project_name='test-project',
            dataset_name='test-dataset',
            file=input_file,
            notification=notification
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Start the asynchronous indexing task and print the response (RequestId and EventId).
            response = client.index_file_meta_with_options(index_file_meta_request, runtime)
            print(response.body.to_map())
        except Exception as error:
            # If an error occurs, print the error message.
            UtilClient.assert_as_string(error.message)
            print(error)

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        # The AccessKey pair of an Alibaba Cloud account grants full access to all APIs. For better security, we recommend that you use a RAM user for API calls and daily operations.
        # To prevent security risks, do not hard-code your AccessKey ID and AccessKey secret in your project code.
        # This example shows how to read the AccessKey pair from environment variables for authentication. For more information about how to configure environment variables, see https://help.aliyun.com/document_detail/2361894.html.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        # Send the indexing result to the test-topic MNS topic.
        notification_mns = imm_20200930_models.MNS(
            topic_name='test-topic'
        )
        notification = imm_20200930_models.Notification(
            mns=notification_mns
        )
        # Specify the file to index and its custom labels.
        input_file_custom_labels = {
            'category': 'People'
        }
        input_file = imm_20200930_models.InputFile(
            uri='oss://test-bucket/test-object1.jpg',
            custom_labels=input_file_custom_labels
        )
        index_file_meta_request = imm_20200930_models.IndexFileMetaRequest(
            project_name='test-project',
            dataset_name='test-dataset',
            file=input_file,
            notification=notification
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Start the asynchronous indexing task and print the response (RequestId and EventId).
            response = await client.index_file_meta_with_options_async(index_file_meta_request, runtime)
            print(response.body.to_map())
        except Exception as error:
            # If an error occurs, print the error message.
            UtilClient.assert_as_string(error.message)
            print(error)


if __name__ == '__main__':
    Sample.main(sys.argv[1:])
-
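Both BatchIndexFileMeta and IndexFileMeta report their results asynchronously through MNS. As a minimal sketch, assuming you have already received the message body as the JSON text shown in the sample MNS messages (decode it first if your MNS client delivers it in another encoding), the hypothetical helper `failed_files` below applies the rules from the notes: it returns the task-level Success flag together with the URI and error of every file whose Error field is non-empty.

```python
import json
from typing import List, Tuple


def failed_files(message_body: str) -> Tuple[bool, List[Tuple[str, str]]]:
    """Parse an IMM indexing notification (hypothetical helper).

    Returns (task_success, [(uri, error), ...]); a file counts as failed
    when its Error field is non-empty.
    """
    msg = json.loads(message_body)
    failures = [
        (f.get("URI", ""), f["Error"])
        for f in msg.get("Files", [])
        if f.get("Error")
    ]
    return bool(msg.get("Success")), failures
```

Files reported as failed could then be resubmitted with IndexFileMeta or BatchIndexFileMeta after the cause in the Error field has been addressed.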
Batch add
You can add multiple files from an OSS bucket to a dataset in a single batch operation. IMM then runs an asynchronous workflow to extract and index the metadata of these files. The workflow supports message notifications: to receive the task results, specify an MNS topic when you add the files. For more information, see Asynchronous notification message format.
-
In your project, find the dataset test-dataset.
-
Click the dataset test-dataset, choose the Data access tab, and then click Batch Add.
-
In the Add File to Dataset panel, enter the name of the Simple Message Queue (MNS) topic for receiving results, and then click Select File to add the files that you want to index.
After the files are added, you can view the added OSS file records in the list. Click Edit Labels to add custom labels to a file, or click Delete to remove a file.