Extract video captions

Overview

Video caption extraction separates caption data from video files, letting you access and edit text content independently. This supports multilingual subtitles, accessibility, and caption creation.


Use cases

Multilingual support: Create subtitles in multiple languages to make video content accessible to diverse audiences.
Translation and localization: Extract text from videos for translation into localized versions that match regional language and cultural norms.
Dubbing and speech recognition: Provide accurate scripts for voice-over work and training data for speech recognition systems.
Video editing and production: Extract and organize captions for easier editing, proofreading, and refinement.

Limits

Only text-based and graphical caption streams are supported. Captions that are burned into the video image itself (hard subtitles) are not supported. For other requirements, contact us.

Procedure

Prerequisites

An AccessKey pair is created. For more information, see Create an AccessKey pair.
Object Storage Service (OSS) is activated and a bucket is created. For more information, see Create a bucket.
IMM is activated. For more information, see Activate IMM.
A project is created in the IMM console. For more information, see Create a project.
Note
- You can call the CreateProject operation to create a project. For more information, see CreateProject.
- You can call the ListProjects operation to list information about all projects in a region.

Step 1: Upload a file

Upload your media file to an OSS bucket in the same region as your IMM project.

On the upload page, keep File ACL set to Inherit Bucket (default), click Scan File to select your file, then click Upload File.

Step 2: Extract captions

Call the CreateMediaConvertTask operation to create a video caption extraction task.

The extraction process includes these steps:

Format detection: Identify caption formats in the video file, such as SRT, ASS, WebVTT, and embedded subtitle streams.
Data extraction: Extract caption text and timestamps from the video file, including speaker names, time markers, and formatting.
Text processing: Edit the extracted captions as needed, for example by removing redundant information, reformatting, translating, or correcting errors.
Output: Save the processed captions in a specified format (such as .srt or .ass) for later use or upload to a video platform.
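
The text processing step usually starts by reading an extracted caption file back into structured data. The following is a minimal sketch, assuming the extracted file is a standard SRT file that has already been downloaded as a string. The Cue class and the parse_srt function are hypothetical names used for illustration and are not part of IMM or any SDK.

```python
import re
from dataclasses import dataclass
from typing import List

# An SRT timing line looks like "00:00:01,000 --> 00:00:04,000".
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    """One caption entry (hypothetical helper type)."""
    start_ms: int
    end_ms: int
    text: str


def _to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    """Convert the four timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(content: str) -> List[Cue]:
    """Parse SRT text into a list of cues, skipping blocks without a timing line."""
    cues: List[Cue] = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                fields = match.groups()
                # Everything after the timing line is the caption text.
                text = "\n".join(part.strip() for part in lines[i + 1:]).strip()
                cues.append(Cue(_to_ms(*fields[:4]), _to_ms(*fields[4:]), text))
                break
    return cues
```

With the cues in this form, edits such as removing empty entries or shifting timestamps become simple list operations before the captions are written back out.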

Parameter examples

The following examples use IMM project test-project and file oss://test-bucket/video-demo/test.mp4 to extract video captions.

Caption extraction is part of the Media transcoding capabilities.

Note

Use OpenAPI Explorer to call the media transcoding API and view SDK code examples.
Do not set Target.URI or Target.Container when extracting captions.
Include the {streamindex} variable in the caption output URI, for example "oss://test-bucket/objectPrefix-{streamindex}.{autoext}". Without it, captions may overwrite each other.
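
The rules in this note can be checked before a request is sent. The following is a minimal sketch that only builds the request parameters as a Python dictionary in the same shape as the request parameters shown in this topic. It does not call IMM. The function name build_extract_subtitle_params and its arguments are hypothetical, and the way you pass the resulting parameters to CreateMediaConvertTask depends on the SDK you use.

```python
import json
from typing import Any, Dict, Optional


def build_extract_subtitle_params(
    project_name: str,
    source_uri: str,
    output_uri: str,
    caption_format: str,
    topic_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build CreateMediaConvertTask parameters for caption extraction.

    Hypothetical helper: it only assembles and validates a dictionary.
    """
    # Without {streamindex}, captions from different streams may overwrite each other.
    if "{streamindex}" not in output_uri:
        raise ValueError("output_uri must contain the {streamindex} variable")

    params: Dict[str, Any] = {
        "ProjectName": project_name,
        "Sources": [{"URI": source_uri}],
        # Target.URI and Target.Container are intentionally not set.
        "Targets": [
            {
                "Subtitle": {
                    "ExtractSubtitle": {
                        "Format": caption_format,
                        "URI": output_uri,
                    }
                }
            }
        ],
    }
    if topic_name:
        # Optional completion notification through an MNS topic.
        params["Notification"] = {"MNS": {"TopicName": topic_name}}
    return params


if __name__ == "__main__":
    example = build_extract_subtitle_params(
        "test-project",
        "oss://test-bucket/video-demo/test.mp4",
        "oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}",
        "webvtt",
        topic_name="test-mns-topic",
    )
    print(json.dumps(example, indent=2))
```

Raising an error on a missing {streamindex} variable is a design choice of this sketch: it surfaces the overwrite risk locally instead of after the task has run.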

Extract all captions from a video as WebVTT captions

Caption format: webvtt
Output caption file path: oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}
Completion notification: MNS topic "test-mns-topic"

Try this example in OpenAPI Explorer with pre-filled parameters. Modify them as needed.

Request parameters

{
  "ProjectName": "test-project",
  "Notification": {
    "MNS": {
      "TopicName": "test-mns-topic"
    }
  },
  "Sources": [
    {
      "URI": "oss://test-bucket/video-demo/test.mp4"
    }
  ],
  "Targets": [
    {
      "Subtitle": {
        "ExtractSubtitle": {
          "Format": "webvtt",
          "URI": "oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}"
        }
      }
    }
  ]
}

Extract all captions from a video as SRT captions

Caption format: srt
Output caption file path: oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}
Completion notification: MNS topic "test-mns-topic"

Try this example in OpenAPI Explorer with pre-filled parameters. Modify them as needed.

Request parameters

{
  "ProjectName": "test-project",
  "Notification": {
    "MNS": {
      "TopicName": "test-mns-topic"
    }
  },
  "Sources": [
    {
      "URI": "oss://test-bucket/video-demo/test.mp4"
    }
  ],
  "Targets": [
    {
      "Subtitle": {
        "ExtractSubtitle": {
          "Format": "srt",
          "URI": "oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}"
        }
      }
    }
  ]
}

Billing

Video caption extraction incurs both OSS and IMM charges, as described in the following tables.

OSS billing: For pricing details, see OSS Pricing.

API	Billing item	Description
GetObject	GET requests	Charged based on the number of successful requests.
GetObject	Infrequent Access data retrieval capacity	Charged for the volume of retrieved Infrequent Access data.
GetObject	Data retrieval quota for real-time access of Archive objects	Charged for the volume of Archive data retrieved through real-time access.
GetObject	Transfer acceleration	Charged by data volume when accessing the bucket through an acceleration endpoint.
PutObject	PUT requests	Charged based on the number of successful requests.
PutObject	Storage fee	Charged based on the storage class, size, and duration of object storage.
HeadObject	GET requests	Charged based on the number of successful requests.

IMM billing: For pricing details, see IMM billing items.

Important

Starting at 11:00 (UTC+8) on July 28, 2025, IMM video caption extraction changes from free to paid. For more information, see the IMM billing adjustment announcement.

API	Billing item	Description
CreateMediaConvertTask	ExtractSubtitleText	Charged based on the number of successfully extracted text-based caption streams.
CreateMediaConvertTask	ExtractSubtitleImage	Charged based on the total duration of successfully extracted graphical caption streams.
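
The two IMM billing items are metered differently: text-based streams by count, graphical streams by total duration. The following is a rough sketch of how the billable quantities for one task could be tallied. The function name, the input dictionary keys ("kind" and "duration_seconds"), and the use of seconds as the duration unit are all assumptions made for illustration. No prices are included, so refer to IMM billing items for the actual unit and rates.

```python
from typing import Dict, Iterable, Mapping


def billable_units(streams: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Tally billable quantities per IMM billing item (hypothetical helper).

    Each stream is a mapping with assumed keys:
      "kind": "text" or "image"
      "duration_seconds": stream duration (only used for graphical streams)
    """
    units: Dict[str, float] = {"ExtractSubtitleText": 0, "ExtractSubtitleImage": 0.0}
    for stream in streams:
        kind = stream["kind"]
        if kind == "text":
            # Text-based caption streams are counted per extracted stream.
            units["ExtractSubtitleText"] += 1
        elif kind == "image":
            # Graphical caption streams are metered by total duration.
            units["ExtractSubtitleImage"] += float(stream["duration_seconds"])
        else:
            raise ValueError(f"unknown caption stream kind: {kind!r}")
    return units
```

Only successfully extracted streams should be passed in, because both billing items are charged on successful extraction.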