Smart Tag uses artificial intelligence to analyze video content and automatically generate multi-dimensional content tags for media asset retrieval, personalized recommendations, and smart ad delivery.

Smart Tag analyzes video content, including visuals, text, speech, and behavior. The service uses multi-modal information fusion and alignment technology to recognize content with high accuracy and automatically generate multi-dimensional content tags. This process converts unstructured information into structured data and is ideal for media asset retrieval, personalized recommendations, and smart ad delivery.

Feature overview

Scenarios

Media asset retrieval: Retrieve media assets accurately and efficiently. You can catalog media files, such as videos, images, and text, with a rich and accurate tag system, and then quickly retrieve content from your asset library using keywords or tags. This improves the efficiency and accuracy of resource retrieval.

Personalized recommendation: Deliver precise and personalized content recommendations. You can match content accurately by combining media file tags with user information, behavioral data, and user persona analysis. This provides personalized video recommendations, addresses the recommendation cold-start problem, and improves recommendation accuracy.
Smart ad delivery: Deliver intelligent, scenario-based ads. You can automatically identify ad slots based on multi-modal content analysis and video tags. You can then match these slots with content from your ad resource library to reach your target audience with precision and improve ad conversion rates.
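
The tag-based retrieval described in the first scenario can be illustrated with a small inverted index that maps each tag to the assets carrying it. This is a minimal sketch, not part of Smart Tag itself: the `TagIndex` class, the asset IDs, and the tag values are all hypothetical, and tags are assumed to be plain strings compared case-insensitively.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory inverted index: tag -> set of asset IDs."""

    def __init__(self):
        self._assets_by_tag = defaultdict(set)

    def add(self, asset_id, tags):
        """Register an asset under each of its tags (case-insensitive)."""
        for tag in tags:
            self._assets_by_tag[tag.lower()].add(asset_id)

    def search(self, *tags):
        """Return the sorted IDs of assets that carry every given tag."""
        if not tags:
            return []
        # .get() avoids creating empty entries for unknown tags.
        matches = [self._assets_by_tag.get(t.lower(), set()) for t in tags]
        return sorted(set.intersection(*matches))


index = TagIndex()
index.add("video-1", ["Football", "stadium"])
index.add("video-2", ["football", "interview"])
index.add("image-1", ["stadium"])
print(index.search("football", "stadium"))
```

Searching with several tags intersects the per-tag sets, so each additional tag narrows the result.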

Features

Feature	Description
Video classification and structured tags	Analyzes video content to generate video categories and structured entity tags.
Content tag recognition	Analyzes visual information in videos to recognize content tags for people, objects, and scenes.
Video Optical Character Recognition (OCR) tags	Recognizes and extracts text that appears in video frames.
Video Automatic Speech Recognition (ASR) tags	Analyzes speech in videos and transcribes it to text.
Video tags	Analyzes content in videos, such as programs, characters, objects, scenes, and locations.
Image tags	Analyzes content in images, such as characters, locations, actions, events, logos, and objects.

Limits

Smart Tag can process the following file types and formats:

Video	Audio	Image
Video formats: AVI, FLV, MKV, MPG, MP4, TS, MOV, MXF	Audio formats: MP3, WAV	Image formats: JPG, JPEG, PNG
Encoding formats: MPEG-2, MPEG-4, H.264, H.265/HEVC	Not applicable	Not applicable
Video duration: ≤ 4 hours	Audio duration: ≤ 4 hours	Not applicable
Video size: ≤ 4 GB	Audio size: ≤ 400 MB	Image size: ≤ 4 MB
Video resolution: 240p to 2160p. For best analysis results, use a resolution of 720p or higher.	Not applicable	Image resolution: ≤ 2160p
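
Checking a file against these limits before submission can avoid a failed job. The following is a minimal sketch of such a pre-check, assuming that 1 GB means 1024³ bytes and 1 MB means 1024² bytes (the table does not say which convention applies) and that the caller already knows the file size and duration. The `check_media` function and the `LIMITS` table are hypothetical names, and encoding format and resolution are not checked.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

GB = 1024 ** 3  # assumption: binary units
MB = 1024 ** 2


@dataclass(frozen=True)
class Limit:
    extensions: frozenset
    max_bytes: int
    max_seconds: Optional[int]  # None means no duration limit applies


# Values copied from the limits table above.
LIMITS = {
    "video": Limit(
        frozenset({"avi", "flv", "mkv", "mpg", "mp4", "ts", "mov", "mxf"}),
        4 * GB,
        4 * 3600,
    ),
    "audio": Limit(frozenset({"mp3", "wav"}), 400 * MB, 4 * 3600),
    "image": Limit(frozenset({"jpg", "jpeg", "png"}), 4 * MB, None),
}


def check_media(filename, size_bytes, duration_seconds=None):
    """Return a list of limit violations; an empty list means the file passes."""
    ext = PurePosixPath(filename).suffix.lower().lstrip(".")
    kind = next((k for k, lim in LIMITS.items() if ext in lim.extensions), None)
    if kind is None:
        return [f"unsupported file extension: {ext or '(none)'}"]
    limit = LIMITS[kind]
    problems = []
    if size_bytes > limit.max_bytes:
        problems.append(f"{kind} size exceeds {limit.max_bytes} bytes")
    if (
        limit.max_seconds is not None
        and duration_seconds is not None
        and duration_seconds > limit.max_seconds
    ):
        problems.append(f"{kind} duration exceeds {limit.max_seconds} seconds")
    return problems


print(check_media("clip.mp4", 100 * MB, 600))
print(check_media("movie.mkv", 5 * GB, 5 * 3600))
```

The file type is inferred from the extension only, so a mislabeled file would still pass this check.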

Submit a smart tag job using an API

Create a template. You can skip this step if you have already created a template.
Smart Tag jobs use custom templates to specify the analysis type. Call the CreateCustomTemplate operation to create a custom template, or call the GetCustomTemplate operation to retrieve information about an existing template.
Submit a smart tag job.
Call the SubmitSmarttagJob operation to submit a smart tag job. For more information about the parameters, see SubmitSmarttagJob.
Retrieve the smart tag job information.
Call the QuerySmarttagJob operation with the smart tag job ID to query the status and results of the job. For more information about the parameters, see QuerySmarttagJob.
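
The submit-then-query flow above can be sketched as a polling loop. This is an illustration under stated assumptions, not the SDK's interface: `SmartTagClient`, its two methods, the `status` field, and the status strings `Success`, `Fail`, and `Processing` are hypothetical stand-ins for whatever wrapper you build around SubmitSmarttagJob and QuerySmarttagJob. The sketch also assumes a template ID already exists, and it uses an in-memory `FakeClient` so that it runs without network access.

```python
import time
from typing import Protocol


class SmartTagClient(Protocol):
    """Hypothetical wrapper around the two API operations (not the real SDK)."""

    def submit_smarttag_job(self, template_id: str, media_url: str) -> str: ...

    def query_smarttag_job(self, job_id: str) -> dict: ...


TERMINAL_STATUSES = {"Success", "Fail"}  # assumed status values


def run_smart_tag_job(client, template_id, media_url,
                      poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Submit a job, then poll by job ID until it reaches a terminal status."""
    job_id = client.submit_smarttag_job(template_id, media_url)
    for _ in range(max_polls):
        info = client.query_smarttag_job(job_id)
        if info.get("status") in TERMINAL_STATUSES:
            return info
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


class FakeClient:
    """In-memory stand-in that reports success after a fixed number of polls."""

    def __init__(self, polls_until_done=2):
        self._remaining = polls_until_done

    def submit_smarttag_job(self, template_id, media_url):
        return "job-001"  # hypothetical job ID

    def query_smarttag_job(self, job_id):
        self._remaining -= 1
        status = "Success" if self._remaining <= 0 else "Processing"
        return {"job_id": job_id, "status": status}


# Hypothetical template ID and media URL; sleep is stubbed out for the demo.
result = run_smart_tag_job(FakeClient(polls_until_done=3), "tpl-1",
                           "oss://example-bucket/video.mp4",
                           sleep=lambda seconds: None)
print(result)
```

Bounding the number of polls keeps a stuck job from blocking the caller indefinitely. In practice, choose the interval and limit based on the expected media duration.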

Billing

For more information about billing, see the Smart Tag billing documentation.