Use the media transcoding API of Intelligent Media Management (IMM) to extract text-based and graphical captions from video files.
Overview
Video caption extraction separates caption data from video files, letting you access and edit text content independently. This supports multilingual subtitles, accessibility, and caption creation.

Use cases
- Multilingual support: Create subtitles in multiple languages to make video content accessible to diverse audiences.
- Translation and localization: Extract text from videos for translation into localized versions that match regional language and cultural norms.
- Dubbing and speech recognition: Provide accurate scripts for voice-over work and training data for speech recognition systems.
- Video editing and production: Extract and organize captions for easier editing, proofreading, and refinement.
Limits
Only text-based captions and graphical captions are supported. Captions embedded directly into the video stream as part of the image are not supported. For other requirements, contact us.
Procedure
Prerequisites
- An AccessKey pair is created and obtained. For more information, see Create an AccessKey pair.
- Object Storage Service (OSS) is activated and a bucket is created. For more information, see Create a bucket.
- IMM is activated. For more information, see Activate IMM.
- A project is created in the IMM console. For more information, see Create a project.
Note:
- You can also call the CreateProject operation to create a project. For more information, see CreateProject.
- You can call the ListProjects operation to list information about all projects in a region.
Step 1: Upload a file
Upload your media file to an OSS bucket in the same region as your IMM project.
On the upload page, keep File ACL set to Inherit Bucket (default), click Scan File to select your file, then click Upload File.
Step 2: Extract captions
Call the CreateMediaConvertTask operation to create a video caption extraction task.
The extraction process includes these steps:
- Format detection: Identify caption formats in the video file, such as SRT, ASS, WebVTT, and embedded subtitle streams.
- Data extraction: Extract caption text and timestamps from the video file, including speaker names, time markers, and formatting.
- Text processing: Edit the extracted captions as needed: remove redundant information, reformat, translate, or correct errors.
- Output: Save the processed captions in a specified format (such as .srt or .ass) for later use or upload to a video platform.
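Once a task has written an .srt file, the text processing step usually starts from a parsed list of cues. The following is a minimal sketch in Python, assuming the output follows the common SubRip layout (an index line, a `start --> end` line, one or more text lines, then a blank line). The names `Cue` and `parse_srt` are illustrative and are not part of IMM.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


# HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (tolerated for convenience)
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_ms(stamp: str) -> int:
    """Convert one timestamp to milliseconds."""
    m = _TIME.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    h, mi, s, ms = (int(g) for g in m.groups())
    return ((h * 60 + mi) * 60 + s) * 1000 + ms


def parse_srt(content: str) -> List[Cue]:
    """Split SRT text into cues; blocks without a timing line are skipped."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            continue
        start, end = lines[1].split("-->")
        text = "\n".join(lines[2:]).strip()
        cues.append(Cue(int(lines[0].strip()), _to_ms(start), _to_ms(end), text))
    return cues
```

With the cues in this form, edits such as removing empty cues, shifting timestamps, or handing the text to a translation step become plain list operations.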
Parameter examples
The following examples use IMM project test-project and file oss://test-bucket/video-demo/test.mp4 to extract video captions.
Caption extraction is part of the Media transcoding capabilities.
- Use OpenAPI Explorer to call the media transcoding API and view SDK code examples.
- Do not set Target.URI or Target.Container when extracting captions.
- Include the {streamindex} variable in the caption output URI, for example "oss://test-bucket/objectPrefix-{streamindex}.{autoext}". Without it, captions may overwrite each other.
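The notes above can be checked before a request is sent. The following is a minimal sketch in Python that builds the request body used in this topic and reports violations of the two rules. It only assembles and inspects a dictionary and does not call IMM. The function names `build_extract_request` and `check_caption_request` are hypothetical helpers, not SDK functions.

```python
import json
from typing import List, Optional


def build_extract_request(project: str, source_uri: str, output_uri: str,
                          fmt: str, topic: Optional[str] = None) -> dict:
    """Assemble a caption extraction request body (hypothetical helper)."""
    request = {
        "ProjectName": project,
        "Sources": [{"URI": source_uri}],
        # Only the Subtitle block is set; Target.URI and Target.Container are omitted.
        "Targets": [
            {"Subtitle": {"ExtractSubtitle": {"Format": fmt, "URI": output_uri}}}
        ],
    }
    if topic:
        request["Notification"] = {"MNS": {"TopicName": topic}}
    return request


def check_caption_request(request: dict) -> List[str]:
    """Return a list of problems; an empty list means both rules are met."""
    problems = []
    for i, target in enumerate(request.get("Targets", [])):
        for key in ("URI", "Container"):
            if key in target:
                problems.append(f"Targets[{i}].{key} must not be set")
        uri = target.get("Subtitle", {}).get("ExtractSubtitle", {}).get("URI", "")
        if "{streamindex}" not in uri:
            problems.append(f"Targets[{i}] caption URI lacks {{streamindex}}")
    return problems


if __name__ == "__main__":
    req = build_extract_request(
        "test-project",
        "oss://test-bucket/video-demo/test.mp4",
        "oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}",
        "webvtt",
        topic="test-mns-topic",
    )
    print(check_caption_request(req))
    print(json.dumps(req, indent=2))
```

The printed JSON can be pasted into OpenAPI Explorer or passed to the SDK of your choice.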
Extract all captions from a video as WebVTT captions
- Caption format: webvtt
- Output caption file path: oss://test-bucket/video-demo/subtitle-%d.vtt
- Completion notification: MNS topic "test-mns-topic"
Try this example in OpenAPI Explorer with pre-filled parameters. Modify them as needed.
Request parameters
{
  "ProjectName": "test-project",
  "Notification": {
    "MNS": {
      "TopicName": "test-mns-topic"
    }
  },
  "Sources": [
    {
      "URI": "oss://test-bucket/video-demo/test.mp4"
    }
  ],
  "Targets": [
    {
      "Subtitle": {
        "ExtractSubtitle": {
          "Format": "webvtt",
          "URI": "oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}"
        }
      }
    }
  ]
}
Extract all captions from a video as SRT captions
- Caption format: srt
- Output caption file path: oss://test-bucket/video-demo/subtitle-%d.srt
- Completion notification: MNS topic "test-mns-topic"
Try this example in OpenAPI Explorer with pre-filled parameters. Modify them as needed.
Request parameters
{
  "ProjectName": "test-project",
  "Notification": {
    "MNS": {
      "TopicName": "test-mns-topic"
    }
  },
  "Sources": [
    {
      "URI": "oss://test-bucket/video-demo/test.mp4"
    }
  ],
  "Targets": [
    {
      "Subtitle": {
        "ExtractSubtitle": {
          "Format": "srt",
          "URI": "oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}"
        }
      }
    }
  ]
}
Billing
Video caption extraction incurs both OSS and IMM charges. Pricing details: IMM billing items.
OSS billing: OSS Pricing

| API | Billing item | Description |
| --- | --- | --- |
| GetObject | GET requests | Charged based on the number of successful requests. |
| GetObject | Infrequent Access data retrieval capacity | Charged for the volume of retrieved Infrequent Access data. |
| GetObject | Data retrieval quota for real-time access of Archive objects | Charged for the volume of Archive data retrieved through real-time access. |
| GetObject | Transfer acceleration | Charged by data volume when accessing the bucket through an acceleration endpoint. |
| PutObject | PUT requests | Charged based on the number of successful requests. |
| PutObject | Storage fee | Charged based on the storage class, size, and duration of object storage. |
| HeadObject | GET requests | Charged based on the number of successful requests. |
IMM billing: IMM billing items.
Important: Starting at 11:00 (UTC+8) on July 28, 2025, IMM video caption extraction changes from free to paid. For more information, see IMM billing adjustment announcement.

| API | Billing item | Description |
| --- | --- | --- |
| CreateMediaConvertTask | ExtractSubtitleText | Charged based on the number of successfully extracted text-based caption streams. |
| CreateMediaConvertTask | ExtractSubtitleImage | Charged based on the total duration of successfully extracted graphical caption streams. |
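
The two IMM billing items are metered in different units: text-based streams by count, graphical streams by duration. The following is a rough sketch in Python that tallies billable quantities for one task. The stream records and their keys (`kind`, `duration_s`, `ok`) are hypothetical and do not reflect an IMM response format, seconds are an assumed unit, and no prices are applied. See IMM billing items for actual rates and units.

```python
from typing import Dict, Iterable


def billable_quantities(streams: Iterable[dict]) -> Dict[str, float]:
    """Tally billable units per billing item.

    Each stream is a dict with hypothetical keys:
      kind        "text" or "image"
      duration_s  stream duration in seconds (assumed unit)
      ok          True if the stream was extracted successfully
    """
    text_count = 0
    image_seconds = 0.0
    for stream in streams:
        if not stream["ok"]:
            continue  # only successful extractions are charged
        if stream["kind"] == "text":
            text_count += 1  # counted per stream
        elif stream["kind"] == "image":
            image_seconds += stream["duration_s"]  # counted by duration
    return {
        "ExtractSubtitleText": text_count,
        "ExtractSubtitleImage": image_seconds,
    }
```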