Production parameters, advanced configurations, and SDK examples for Image-Text Matching in common scenarios.
- Both Script-to-Video and Image-Text Matching use the SubmitBatchMediaProducingJob API to submit a task. To tell them apart by their parameters, see Parameter differences.
- In this API, the region in the OSS URL of every media asset must match the region of the OpenAPI service endpoint.
- Supported regions: China (Shanghai), China (Beijing), China (Hangzhou), China (Shenzhen), US (Silicon Valley), and Singapore.
- Replace all placeholders in the examples ([your-bucket], [your-region-id], [your-file-name], [your-file-path], and media asset IDs) with your actual values.
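Because a region mismatch between asset URLs and the service endpoint is an easy mistake, you may want to check it before submitting. The following Python sketch is illustrative only: `oss_region_id` and `check_asset_regions` are hypothetical helpers, the mapping of the listed regions to region IDs (`cn-shanghai`, `us-west-1`, and so on) is an assumption, and only public OSS hosts of the form `[bucket].oss-[region-id].aliyuncs.com` are handled.

```python
from urllib.parse import urlparse

# Assumed region IDs for the supported regions listed above.
SUPPORTED_REGION_IDS = {
    "cn-shanghai",     # China (Shanghai)
    "cn-beijing",      # China (Beijing)
    "cn-hangzhou",     # China (Hangzhou)
    "cn-shenzhen",     # China (Shenzhen)
    "us-west-1",       # US (Silicon Valley)
    "ap-southeast-1",  # Singapore
}


def oss_region_id(url: str) -> str:
    """Extract the region ID from a URL such as
    http://[your-bucket].oss-[your-region-id].aliyuncs.com/path/file.mp4."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Expected host layout: <bucket>.oss-<region-id>.aliyuncs.com
    if len(labels) < 4 or not labels[1].startswith("oss-") or labels[-2:] != ["aliyuncs", "com"]:
        raise ValueError(f"not a public OSS URL: {url}")
    return labels[1][len("oss-"):]


def check_asset_regions(asset_urls, endpoint_region_id: str) -> None:
    """Raise if the endpoint region is unsupported or any OSS URL is in another region."""
    if endpoint_region_id not in SUPPORTED_REGION_IDS:
        raise ValueError(f"unsupported region: {endpoint_region_id}")
    for url in asset_urls:
        region = oss_region_id(url)
        if region != endpoint_region_id:
            raise ValueError(f"{url} is in {region}, but the endpoint is in {endpoint_region_id}")
```

Media asset IDs (as opposed to OSS URLs) carry no region in their text, so this check applies only to URL inputs.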
For a better understanding of this document, we recommend first reading the Batch video production guide to familiarize yourself with the concepts and workflow of Image-Text Matching in common scenarios.
Image-Text Matching provides two video generation modes:
- Global Scripts
- Storyboard Script
API reference
- To submit a batch video production job that intelligently mixes multiple video, audio, and image assets, see SubmitBatchMediaProducingJob. Key API parameters are detailed in the InputConfig, EditingConfig, and OutputConfig sections below.
- To get detailed information about a batch video production job, see GetBatchMediaProducingJob.
InputConfig
InputConfig specifies parameters for video clips, voiceovers, background music, and stickers.
| Parameter | Type | Description | Example | Required | Supported modes |
|---|---|---|---|---|---|
| MediaArray | List<String> | The video and image assets to use, as media asset IDs or OSS URLs. | ["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"] | Either MediaArray or MediaSearchInput is required. | Both |
| MediaSearchInput | MediaSearchInput | Intelligently searches for matching assets by specifying a search library and descriptive text. | {"LibSearchCondition":{"SearchLibs":["ims-default-search-lib","test-20"],"SearchText":"Alibaba Cloud assistant is learning how to livestream"}} | Either MediaArray or MediaSearchInput is required. | Both |
| TitleArray | List<String> | An array of titles. One title is randomly selected for each output video. Up to 50 titles, each up to 50 characters long. | ["Title 1","Title 2"] | No | Both |
| SubHeadingArray | List<SubHeading> | Multi-level subheading settings. Each entry specifies a Level and the candidate titles for that level (TitleArray). | [{"Level":1,"TitleArray":["Level 1 subtitle 1","Level 1 subtitle 2"]},{"Level":3,"TitleArray":["Level 3 subtitle"]}] | No | Both |
| SpeechTextArray | List<String> | An array of voiceover texts. SSML is supported. | ["Voiceover content 1","Voiceover content 2"] | No | Global Scripts |
| SceneInfo | SceneInfo | Scene configuration parameters. | {"Scene":"General"} | Yes | Storyboard Script |
| StickerArray | List<Sticker> | An array of stickers. Each sticker specifies a MediaId or MediaURL, its position (X, Y), its size (Width, Height), and optionally its Opacity. | [{"MediaId":"****9d46c8b4548681030f6e****","X":10,"Y":100,"Width":300,"Height":300,"Opacity":0.6}] | No | Both |
| BackgroundMusicArray | List<String> | Background music assets, as media asset IDs or OSS URLs. | ["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"] | No | Both |
| BackgroundImageArray | List<String> | Background image assets, as media asset IDs or OSS URLs. | ["****b4549d46c88681030f6e****","****549d46c88b4681030f6e****"] | No | Both |
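Several rules in this table can be checked on the client before you submit a job. The following Python sketch is hypothetical (`validate_input_config` is not part of any SDK). It reads "Either MediaArray or MediaSearchInput is required" as "exactly one", following the "Choose either" comments in the examples below, and it counts title length in Python characters, which may differ from how the service counts.

```python
def validate_input_config(cfg: dict, mode: str) -> list[str]:
    """Return a list of problems found in an InputConfig dict (empty if none).

    mode is "global" (Global Scripts) or "storyboard" (Storyboard Script).
    """
    problems = []
    has_media = bool(cfg.get("MediaArray"))
    has_search = bool(cfg.get("MediaSearchInput"))
    if has_media == has_search:
        problems.append("set exactly one of MediaArray or MediaSearchInput")

    titles = cfg.get("TitleArray", [])
    if len(titles) > 50:
        problems.append("TitleArray allows at most 50 titles")
    problems += [f"title too long: {t!r}" for t in titles if len(t) > 50]

    if mode == "storyboard":
        if "SceneInfo" not in cfg:
            problems.append("SceneInfo is required in Storyboard Script mode")
        if "SpeechTextArray" in cfg:
            problems.append("use SceneInfo.ShotInfo.ShotScripts instead of SpeechTextArray")
    elif mode == "global":
        if "SceneInfo" in cfg:
            problems.append("SceneInfo applies only to Storyboard Script mode")
    else:
        raise ValueError(f"unknown mode: {mode}")
    return problems
```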
MediaSearchInput
| Parameter | Type | Description | Required |
|---|---|---|---|
| LibSearchCondition | LibSearchCondition | Configuration for search library conditions. | Yes |
LibSearchCondition
| Parameter | Type | Description | Example | Required |
|---|---|---|---|---|
| SearchLibs | List<String> | A list of search libraries. | ["ims-default-search-lib"] | Yes |
| SearchText | String | Descriptive text used to match assets. Max 20 characters. | Ocean, coral reef, seals, dolphins, marine environment | Yes |
SceneInfo
| Parameter | Type | Description | Required |
|---|---|---|---|
| Scene | String | The matching scene type. For common scenarios, set this to General. | Yes |
| ShotInfo | ShotInfo | Configuration for the storyboard. This parameter applies only to Storyboard Script mode. | No |
ShotInfo
This parameter applies only to Storyboard Script mode.
| Parameter | Type | Description | Required |
|---|---|---|---|
| ShotScripts | List<ShotScript> | An array of storyboard scripts, one per scene. | Yes |
ShotScript
This parameter applies only to Storyboard Script mode.
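| Parameter | Type | Description | Example | Required |
|---|---|---|---|---|
| ScriptText | String | The script text for a single scene, describing the scene's content for visual matching. | He has recently been developing a new magic potion. | No |
| SpeechText | String | The voiceover text for the scene. SSML is supported. If set, the scene's duration matches the voiceover length. | The old magician Danny is fiddling with strange instruments; he has recently been developing a new magic potion. | No |
| Duration | Float | The custom duration of the scene, in seconds. Can be set when the scene has no voiceover. | 5 | No |
| Volume | Float | The volume of the video assets in the scene. | 0.5 | No |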
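When you generate storyboards programmatically, small builders keep each ShotScript consistent with the rules above. This Python sketch is hypothetical (`make_shot` and `make_scene_info` are illustrative names). Two choices are assumptions that are stricter than the table: a shot must have ScriptText or SpeechText, and Duration is rejected when SpeechText is set, since the scene length then follows the voiceover.

```python
def make_shot(script_text=None, speech_text=None, duration=None, volume=None) -> dict:
    """Build one ShotScript entry, omitting fields that are not set."""
    if script_text is None and speech_text is None:
        # Assumption: a scene needs ScriptText or SpeechText to match visuals against.
        raise ValueError("provide ScriptText, SpeechText, or both")
    if duration is not None and speech_text is not None:
        # Duration is documented for scenes without a voiceover; with a voiceover
        # the scene length follows the voiceover, so this sketch rejects the combination.
        raise ValueError("Duration is only meaningful when SpeechText is not set")
    shot = {}
    if script_text is not None:
        shot["ScriptText"] = script_text
    if speech_text is not None:
        shot["SpeechText"] = speech_text
    if duration is not None:
        shot["Duration"] = float(duration)
    if volume is not None:
        shot["Volume"] = float(volume)
    return shot


def make_scene_info(shots, scene="General") -> dict:
    """Wrap ShotScript entries in the SceneInfo structure used by Storyboard Script mode."""
    return {"Scene": scene, "ShotInfo": {"ShotScripts": list(shots)}}


scene_info = make_scene_info([
    make_shot("This is the visual script for the first scene",
              speech_text="This is the voiceover for the first scene."),
    make_shot("This is the visual script for the second scene.", duration=5, volume=1),
])
```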
Example: Global Scripts mode
{
// Choose either MediaArray or MediaSearchInput
"MediaArray": [
"****9d46c886b45481030f6e****",
"****c886810b4549d4630f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test1.mp4",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test2.png"
],
// Choose either MediaArray or MediaSearchInput
"MediaSearchInput": {
"LibSearchCondition": {
"SearchLibs": [
"ims-default-search-lib",
"test-20"
],
"SearchText": "Alibaba Cloud assistant is learning how to livestream"
}
},
"TitleArray": [
"Freshippo opens a new location in Huilongguan",
"A new Freshippo store opens"
],
"SubHeadingArray": [
{
"Level": 1,
"TitleArray": ["Subtitle 1", "Subtitle 2"]
},
{
"Level": 3,
"TitleArray": ["Level 3 subtitle"]
}
],
"SpeechTextArray": [
"A new Freshippo store just opened in the nearby mall. It's the grand opening today, so I rushed over to check it out. The store isn't huge, but it's packed with people. Snacks and drinks are pretty cheap, and the checkout lines are super long. Come and see for yourself!",
"A new Freshippo store just opened in the nearby mall. It's the grand opening today, so I rushed over to check it out.",
"<speak>Today, our hero, table tennis legend <phoneme alphabet=\"ipa\" ph=\"mɑː lʊŋ\">Ma Long</phoneme>, is striving for the pinnacle of glory.</speak>"
],
"Sticker": {
"MediaId": "****b681034549d46c880f6e****",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300,
"Opacity": 0.6
},
"StickerArray": [
{
"MediaId": "****9d46c8b4548681030f6e****",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300,
"Opacity": 0.6
},
{
"MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test3.png",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
}
],
"BackgroundMusicArray": [
"****b4549d46c88681030f6e****",
"****549d46c88b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test4.mp3"
],
"BackgroundImageArray": [
"****6c886b4549d481030f6e****",
"****9d46c8548b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test1.png"
]
}
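SSML attribute values use double quotes, and those quotes must be escaped as \" when the voiceover text sits inside a JSON string. Serializing the configuration with a JSON library handles the escaping for you. A minimal Python sketch using the standard `json` module, with values taken from the example above:

```python
import json

# SSML with double-quoted attributes, as used in SpeechTextArray.
ssml = (
    '<speak>Today, our hero, table tennis legend '
    '<phoneme alphabet="ipa" ph="mɑː lʊŋ">Ma Long</phoneme>, '
    'is striving for the pinnacle of glory.</speak>'
)

input_config = {
    "MediaArray": ["****9d46c886b45481030f6e****"],
    "SpeechTextArray": [
        "A new Freshippo store just opened in the nearby mall.",
        ssml,
    ],
}

# json.dumps escapes the inner quotes as \"; ensure_ascii=False keeps the IPA symbols readable.
input_config_json = json.dumps(input_config, ensure_ascii=False)
print(input_config_json)
```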
Example: Storyboard Script mode
{
// Choose either MediaArray or MediaSearchInput
"MediaArray": ["****9d46c886b45481030f6e****", "****c886810b4549d4630f6e****"],
// Choose either MediaArray or MediaSearchInput
"MediaSearchInput": {
"LibSearchCondition": {
"SearchLibs": [
"ims-default-search-lib",
"test-20"
],
"SearchText": "Alibaba Cloud assistant is learning how to livestream"
}
},
"SceneInfo": {
"Scene": "General", // General matching
"ShotInfo": {
"ShotScripts": [
{
"ScriptText": "This is the visual script for the first scene",
"SpeechText": "This is the voiceover for the first scene. The scene's duration will match the voiceover length."
},
{
"ScriptText": "This is the visual script for the second scene. With no voiceover, you can set a custom duration.",
"Duration": 5.0, // Can be set when there's no voiceover script.
"Volume": 1.0 // Set the volume of video materials.
},
{
"ScriptText": "This is the visual script for the third scene.",
"SpeechText": "<speak>Voiceover supports SSML. The battle is <phoneme alphabet=\"py\" ph=\"zheng4 hao3\">fierce</phoneme>. Today, our hero, table tennis legend Ma Long, is striving for the pinnacle of glory. <s>In the quarter-finals against the formidable Togami Shunsuke, Ma Long showed no fear, giving his all in every rally.</s> His precise shots and calm judgment gave him the upper hand. In the end, Ma Long successfully defeated his opponent to advance to the semi-finals.<break time=\"1000ms\"/></speak>"
}
]
}
},
"TitleArray": [
"Freshippo opens a new location in Huilongguan",
"A new Freshippo store opens"
],
"SubHeadingArray": [
{
"Level": 1,
"TitleArray": ["Subtitle 1", "Subtitle 2"]
},
{
"Level": 3,
"TitleArray": ["Level 3 subtitle"]
}
],
"StickerArray": [
{
"MediaId": "****9d46c8b4548681030f6e****",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
},
{
"MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test3.png",
"X": 10,
"Y": 100,
"Width": 300,
"Height": 300
}
],
"BackgroundMusicArray": [
"****b4549d46c88681030f6e****",
"****549d46c88b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test4.mp3"
],
"BackgroundImageArray": [
"****6c886b4549d481030f6e****",
"****9d46c8548b4681030f6e****",
"http://[your-bucket].oss-[your-region-id].aliyuncs.com/test1.png"
]
}
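The examples in this topic contain // comments for explanation, but standard JSON (RFC 8259) does not allow comments, so remove them before you submit a copied example. The following Python sketch is one possible approach: `strip_line_comments` is a hypothetical helper that drops // comments outside string literals, so URLs such as http://... inside strings are kept. It does not handle /* */ block comments, which these examples do not use.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # this character was escaped; keep scanning the string
            elif ch == "\\":
                escaped = True           # the next character is escaped (for example \")
            elif ch == '"':
                in_string = False        # unescaped quote closes the string
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line; keep the newline itself.
            newline = text.find("\n", i)
            i = len(text) if newline == -1 else newline
        else:
            out.append(ch)
            i += 1
    return "".join(out)


example = '''{
  "Scene": "General", // General matching
  "Media": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test1.mp4"
}'''
parsed = json.loads(strip_line_comments(example))
print(parsed["Media"])
```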
EditingConfig
EditingConfig controls titles, volume, positioning, and other production settings. Leave empty to use defaults.
Parameters are the same for both generation modes.
| Parameter | Type | Description | Example | Required |
|---|---|---|---|---|
| MediaConfig | JSON | Configuration for input video assets. | | No |
| TitleConfig | JSON | Configuration for titles. | | No |
| SubHeadingConfig | JSON | Configuration for multi-level subheadings. The JSON keys are subheading levels (matching Level in SubHeadingArray); each value sets fields such as Y and FontSize. | {"1":{"Y":0.3,"FontSize":40}} | No |
| SpeechConfig | JSON | Configuration for the voiceover. | | No |
| BackgroundMusicConfig | JSON | Configuration for background music. | {"Volume":0.2} | No |
| BackgroundImageConfig | JSON | Background image configuration. Has no effect if a background image is specified in InputConfig. | {"SubType":"Blur","Radius":0.5} | No |
| ProcessConfig | JSON | Configuration for the mixing and editing process. | | No |
| | JSON | Canvas configuration for front-end preview. | {"Width": 1080,"Height": 1920} | No |
| ProduceConfig | JSON | Standard editing and production configuration. For fields, see EditingProduceConfig. | {"AutoRegisterInputVodMedia":true,"OutputWebmTransparentChannel":true,"CoverConfig":{"StartTime":3.3},"AudioChannelCopy":"left","PipelineId":"***d54a97cff4108b555b01166d4b***","MaxBitrate":5000,"KeepOriginMaxBitrate":false,"KeepOriginVideoMaxFps":false} | No |
ProcessConfig
| Parameter | Type | Description | Example | Required |
|---|---|---|---|---|
| SingleShotDuration | Float | Duration of each shot, in seconds, when long video assets are split. | 5 | No. Default value: 3. |
| EnableClipSplit | Boolean | Whether to enable AI clip segmentation, which splits long assets at scene changes. If true, SingleShotDuration is ignored. | false | No. Default value: false. |
| AllowVfxEffect | Boolean | Whether to add special effects. | true | No. Default value: false. |
| VfxEffectProbability | Float | Probability of applying an effect to each clip. Range: 0.0 to 1.0, with up to 2 decimal places. | 0.6 | No. Default value: 0.5. |
| VfxFirstClipEffectList | List<String> | A list of effects for the first clip. | ["slightshow","starfieldshinee"] | No |
| VfxNotFirstClipEffectList | List<String> | A list of effects for clips other than the first. | ["zoomslight","zoom"] | No |
| AllowTransition | Boolean | Whether to add transition effects. | true | No. Default value: false. |
| TransitionDuration | Float | Duration of transitions, in seconds. If … | 0.5 | No. Default value: 0.5. |
| TransitionList | List<String> | A list of custom transitions. If … | ["directional", "linearblur"] | No |
| UseUniformTransition | Boolean | Whether to use the same transition throughout a single video. | true | No. Default value: true. |
| AllowFilter | Boolean | Whether to add custom filters. | false | No. Default value: false. |
| FilterList | List<String> | A list of custom filters. If … | ["m1", "m2"] | No |
| AllowDuplicateMatch | Boolean | Whether a matched clip can be reused. | false | No. Default value: false. |
| ImageDuration | Float | The display duration of each static image asset, in seconds. | 2 | No. Default value: 2. |
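To see which values a partial ProcessConfig ends up with, you can merge it onto the documented defaults. This Python sketch is hypothetical (`effective_process_config` is an illustrative helper); the service applies its own defaults, so this only mirrors the table. Dropping SingleShotDuration when EnableClipSplit is true reflects the note that it is then ignored.

```python
# Documented defaults from the ProcessConfig table above.
PROCESS_DEFAULTS = {
    "SingleShotDuration": 3.0,
    "EnableClipSplit": False,
    "AllowVfxEffect": False,
    "VfxEffectProbability": 0.5,
    "AllowTransition": False,
    "TransitionDuration": 0.5,
    "UseUniformTransition": True,
    "AllowFilter": False,
    "AllowDuplicateMatch": False,
    "ImageDuration": 2.0,
}
# List fields without a documented default.
LIST_FIELDS = {"VfxFirstClipEffectList", "VfxNotFirstClipEffectList", "TransitionList", "FilterList"}


def effective_process_config(overrides: dict) -> dict:
    """Merge overrides onto the documented defaults and check value ranges."""
    unknown = set(overrides) - set(PROCESS_DEFAULTS) - LIST_FIELDS
    if unknown:
        raise ValueError(f"unknown ProcessConfig fields: {sorted(unknown)}")
    cfg = {**PROCESS_DEFAULTS, **overrides}

    p = cfg["VfxEffectProbability"]
    if not 0.0 <= p <= 1.0 or round(p, 2) != p:
        raise ValueError("VfxEffectProbability must be in [0.0, 1.0] with at most 2 decimal places")
    if cfg["EnableClipSplit"]:
        # AI clip segmentation splits at scene changes; SingleShotDuration is ignored.
        cfg.pop("SingleShotDuration")
    return cfg
```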
Example
All EditingConfig parameters are optional. Default configuration:
{
"MediaConfig": {
"Volume": 0 // Input video assets are muted by default
},
"TitleConfig": {
"Alignment": "TopCenter",
"AdaptMode": "AutoWrap",
"Font": "Alibaba PuHuiTi 2.0 95 ExtraBold",
"SizeRequestType": "Nominal",
"Y": 0.1 // Default Y-coordinate: 0.1 for portrait, 0.05 for landscape, and 0.08 for square video
},
"SpeechConfig": {
"Volume": 1, // Voiceover uses original volume by default
"SpeechRate": 0,
"Voice": null,
"Style": null,
"CustomizedVoice": null, // Voice ID. If set, Voice and Style are ignored.
"AsrConfig": {
"Alignment": "TopCenter",
"AdaptMode": "AutoWrap",
"Font": "Alibaba PuHuiTi 2.0 65 Medium",
"SizeRequestType": "Nominal",
"Spacing": -1,
"Y": 0.8 // Default subtitle Y-coordinate: 0.8 for portrait, 0.9 for landscape, and 0.85 for square video
}
},
"SubHeadingConfig": {
"1": {
"Y": 0.3,
"FontSize": 40
},
"3": {
"Y": 0.5,
"FontSize": 30
}
},
"BackgroundMusicConfig": {
"Volume": 0.2, // Background music at 20% volume by default
"Style": null
},
"ProcessConfig": {
"SingleShotDuration": 3, // Duration of segmented shots. Choose one: SingleShotDuration or EnableClipSplit.
"EnableClipSplit": false, // Whether to use AI clip segmentation. If true, SingleShotDuration is ignored.
"AllowVfxEffect": false, // Whether to add special effects.
"AllowTransition": false, // Whether to add transitions.
"AllowDuplicateMatch": false // In image-text matching mode, whether to allow reuse of matched clips.
}
}
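The default title and subtitle Y values depend on the output orientation, which follows from the Width and Height in OutputConfig. This Python sketch only restates the defaults listed above; `orientation`, `default_positions`, and the keys TitleY and SubtitleY are hypothetical names. If you set TitleConfig.Y or SpeechConfig.AsrConfig.Y explicitly, your value presumably replaces these defaults.

```python
# Default Y values from the example above, by output orientation.
TITLE_Y = {"portrait": 0.1, "landscape": 0.05, "square": 0.08}
SUBTITLE_Y = {"portrait": 0.8, "landscape": 0.9, "square": 0.85}


def orientation(width: int, height: int) -> str:
    """Classify an output size from OutputConfig Width/Height."""
    if height > width:
        return "portrait"
    if width > height:
        return "landscape"
    return "square"


def default_positions(width: int, height: int) -> dict:
    """Return the default title and ASR subtitle Y values for an output size."""
    o = orientation(width, height)
    return {"TitleY": TITLE_Y[o], "SubtitleY": SUBTITLE_Y[o]}
```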
TemplateConfig
TemplateConfig contains common parameters for batch video production. For detailed parameters and examples, see TemplateConfig.
OutputConfig
OutputConfig specifies the output destination, naming, resolution, and video count.
Parameters apply to both generation modes.
| Parameter | Type | Description | Example | Required |
|---|---|---|---|---|
| MediaURL | String | The output video URL. It must include the {index} placeholder. | Format: http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4; example: http://example.oss-cn-shanghai.aliyuncs.com/example/example_{index}.mp4 | Required if GeneratePreviewOnly is false and the output is stored in OSS. |
| StorageLocation | String | The storage location for media assets output to ApsaraVideo VOD. | Format: [your-vod-bucket].oss-[your-region-id].aliyuncs.com; example: outin-****6c886b4549d481030f6e****.oss-cn-shanghai.aliyuncs.com | Required if GeneratePreviewOnly is false and the output is stored in VOD. |
| FileName | String | The output file name. It must include the {index} placeholder. | Format: [your-file-name]_{index}.mp4; example: example_{index}.mp4 | Required if GeneratePreviewOnly is false and the output is stored in VOD. |
| GeneratePreviewOnly | Boolean | Whether to generate only a preview, without producing output video files. | false | No. Default value: false. |
| Count | Integer | The number of videos to output. | 10 | No. Default value: 1. |
| MaxDuration | Float | The maximum duration of each output video, in seconds. If … If no … | 20 | No. Default value: 15. |
| FixedDuration | Float | The fixed duration of each output video, in seconds. If set, the video duration is adjusted to match this value. Note: … | 20 | No. Default value: 15. |
| Width | Integer | The width of the output video, in pixels. | 1080 | Yes |
| Height | Integer | The height of the output video, in pixels. | 1920 | Yes |
| Video | JSONObject | Configuration for the output video stream, such as CRF and codec. | {"Crf": 27} | No |
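The conditional requirements in this table can be checked before submission. In this Python sketch, `validate_output_config` is a hypothetical helper, and "oss" and "vod" are illustrative labels for the two output destinations, not API values.

```python
def validate_output_config(cfg: dict, target: str) -> list[str]:
    """Check an OutputConfig dict against the rules in the table above.

    target is "oss" or "vod" (hypothetical labels for the two destinations).
    """
    problems = []
    for field in ("Width", "Height"):
        if not isinstance(cfg.get(field), int) or cfg[field] <= 0:
            problems.append(f"{field} is required and must be a positive integer")

    # Destination fields are required only when real videos are produced.
    if not cfg.get("GeneratePreviewOnly", False):
        if target == "oss":
            if "{index}" not in cfg.get("MediaURL", ""):
                problems.append("MediaURL is required and must contain {index}")
        elif target == "vod":
            if not cfg.get("StorageLocation"):
                problems.append("StorageLocation is required for VOD output")
            if "{index}" not in cfg.get("FileName", ""):
                problems.append("FileName is required and must contain {index}")
        else:
            raise ValueError(f"unknown target: {target}")

    count = cfg.get("Count", 1)
    if not isinstance(count, int) or count < 1:
        problems.append("Count must be a positive integer")
    return problems
```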
Example
{
"MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4",
"Count": 1,
"MaxDuration": 15,
"Width": 1080,
"Height": 1920,
"Video": {"Crf": 27},
"GeneratePreviewOnly":false
}
SDK examples
Prerequisites
You have installed the IMS server SDK. For more information, see Get started.
Code example
This example uses the Global Scripts mode.
API input parameters
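As an outline of the input parameters for a Global Scripts job, the following Python sketch assembles InputConfig, EditingConfig, and OutputConfig from values in this topic and serializes each one to a JSON string. It assumes that the SDK request for SubmitBatchMediaProducingJob takes these three parameters as JSON strings; it does not call the SDK, so attach `request_params` (a hypothetical name) to the request object of the SDK you installed.

```python
import json

# Values adapted from the examples in this topic; replace the placeholders.
input_config = {
    "MediaArray": [
        "****9d46c886b45481030f6e****",
        "http://[your-bucket].oss-[your-region-id].aliyuncs.com/test1.mp4",
    ],
    "TitleArray": ["A new Freshippo store opens"],
    "SpeechTextArray": [
        "A new Freshippo store just opened in the nearby mall. "
        "It's the grand opening today, so I rushed over to check it out.",
    ],
    "BackgroundMusicArray": ["****b4549d46c88681030f6e****"],
}

editing_config = {
    "MediaConfig": {"Volume": 0},              # mute input video assets
    "BackgroundMusicConfig": {"Volume": 0.2},  # background music at 20% volume
    "ProcessConfig": {"SingleShotDuration": 3, "AllowTransition": True},
}

output_config = {
    "MediaURL": "http://[your-bucket].oss-[your-region-id].aliyuncs.com/[your-file-path]/[your-file-name]_{index}.mp4",
    "Count": 1,
    "MaxDuration": 15,
    "Width": 1080,
    "Height": 1920,
    "Video": {"Crf": 27},
}

# Assumption: each config is passed to the API as a JSON string.
request_params = {
    "InputConfig": json.dumps(input_config, ensure_ascii=False),
    "EditingConfig": json.dumps(editing_config, ensure_ascii=False),
    "OutputConfig": json.dumps(output_config, ensure_ascii=False),
}

for name, value in request_params.items():
    print(f"{name}: {value}")
```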
Result examples
Figure: result examples in Portrait and Landscape orientation.
Editing logic and advanced configuration
Processing logic
Global Scripts mode:
- If video assets are selected from a search library based on descriptive text, the text is used as a search query to intelligently find matching video clips.
- If a long video is provided as input, it is first segmented into shorter shots, and the final video is a combination of these shots. Each shot is 3 seconds long by default; you can change this with the SingleShotDuration parameter.
- If no voiceover is provided, the system randomly selects and splices video clips into a video of approximately 15 seconds.
- If a voiceover is provided, the system intelligently matches visuals to the text and synchronizes them with the voiceover to produce multiple videos in a batch.
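The segmentation and no-voiceover steps above can be pictured with a simplified model. This Python sketch is an illustration, not the service's algorithm: it cuts a long asset into fixed-length shots (the SingleShotDuration behavior; EnableClipSplit would cut at scene changes instead), keeps a shorter final shot as an assumption, and then picks shots at random without reuse until about 15 seconds are filled. `split_into_shots` and `random_splice` are hypothetical names.

```python
import random


def split_into_shots(duration: float, shot_len: float = 3.0) -> list[tuple[float, float]]:
    """Cut [0, duration) into consecutive shots of shot_len seconds.

    Simplification: the last shot keeps whatever remains, even if shorter.
    """
    if shot_len <= 0:
        raise ValueError("shot_len must be positive")
    duration = float(duration)
    shots = []
    start = 0.0
    while start < duration:
        end = min(start + shot_len, duration)
        shots.append((start, end))
        start = end
    return shots


def random_splice(shots, target: float = 15.0, seed=None):
    """Randomly pick shots (without reuse) until about target seconds are filled."""
    rng = random.Random(seed)
    pool = list(shots)
    rng.shuffle(pool)
    picked, total = [], 0.0
    for start, end in pool:
        if total >= target:
            break
        picked.append((start, end))
        total += end - start
    return picked, total
```

Because whole shots are added, the result can overshoot the target by up to one shot, which is one reason the documented length is "approximately" 15 seconds in this model.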
Storyboard Script mode:
- If video assets are selected from a search library based on descriptive text, the text is used to intelligently search for and retrieve matching video clips.
- In this mode, you do not set SpeechTextArray. Instead, you control the content, duration, and voiceover of each scene with SceneInfo.ShotInfo.ShotScripts.
- Within a single scene, the system first tries to match and trim clips based on ScriptText. If ScriptText is not provided but SpeechText is, matching is based on the voiceover.
- The duration of a scene follows either its voiceover length or its custom Duration.
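The per-scene rules above reduce to two decisions: which text drives visual matching, and what sets the scene length. This Python sketch restates them; `plan_scene` is a hypothetical helper, and `voiceover_seconds` stands in for the synthesized voiceover length, which the service measures itself. What happens when a scene has neither SpeechText nor Duration is not specified here, so the sketch returns None for the duration.

```python
def plan_scene(shot: dict, voiceover_seconds=None):
    """Return (matching_text, scene_duration) for one ShotScript entry."""
    # ScriptText drives visual matching; fall back to the voiceover text.
    query = shot.get("ScriptText") or shot.get("SpeechText")
    if query is None:
        raise ValueError("scene has neither ScriptText nor SpeechText")

    if shot.get("SpeechText") is not None:
        # With a voiceover, the scene lasts as long as the voiceover.
        if voiceover_seconds is None:
            raise ValueError("voiceover length needed for scenes with SpeechText")
        duration = voiceover_seconds
    else:
        # Without a voiceover, use the custom Duration if one is set.
        duration = shot.get("Duration")
    return query, duration
```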
Advanced configuration
For advanced settings, see Logic and advanced configurations for batch one-click video creation.
References
- SubmitBatchMediaProducingJob: submits a batch video production job.
- GetBatchMediaProducingJob: retrieves details of a batch video production job.
- SubmitMediaProducingJob: submits a video editing job.