Extract video captions

Use the media transcoding API of Intelligent Media Management (IMM) to extract text-based and graphical captions from video files.

Overview

Video caption extraction separates caption data from video files, letting you access and edit text content independently. This supports multilingual subtitles, accessibility, and caption creation.

Use cases

  • Multilingual support: Create subtitles in multiple languages to make video content accessible to diverse audiences.

  • Translation and localization: Extract text from videos for translation into localized versions that match regional language and cultural norms.

  • Dubbing and speech recognition: Provide accurate scripts for voice-over work and training data for speech recognition systems.

  • Video editing and production: Extract and organize captions for easier editing, proofreading, and refinement.

Limits

Only text-based and graphical caption streams are supported. Captions burned into the video frames as part of the image (hard-coded captions) are not supported. For other requirements, contact us.

Procedure

Prerequisites

  • An AccessKey pair is created and obtained. For more information, see Create an AccessKey pair.

  • Object Storage Service (OSS) is activated and a bucket is created. For more information, see Create a bucket.

  • IMM is activated. For more information, see Activate IMM.

  • A project is created in the IMM console. For more information, see Create a project.

    Note
    • You can call the CreateProject operation to create a project. For more information, see CreateProject.

    • You can call the ListProjects operation to list all projects in a region.

Step 1: Upload a file

Upload your media file to an OSS bucket in the same region as your IMM project.

On the upload page, keep File ACL set to Inherit Bucket (default), click Scan File to select your file, then click Upload File.

Step 2: Extract captions

Call the CreateMediaConvertTask operation to create a video caption extraction task.

The extraction process includes these steps:

  1. Format detection: Identify caption formats in the video file, such as SRT, ASS, WebVTT, and embedded subtitle streams.

  2. Data extraction: Extract caption text and timestamps from the video file, including speaker names, time markers, and formatting.

  3. Text processing: Edit the extracted captions as needed, for example to remove redundant information, reformat, translate, or correct errors.

  4. Output: Save the processed captions in a specified format (such as .srt or .ass) for later use or upload to a video platform.
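The text processing step above can be sketched in Python. This is a minimal example, assuming the task produced a standard SRT file and that you have already downloaded its contents as a string. The names `Cue` and `parse_srt` are hypothetical helpers for illustration and are not part of IMM.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_ms(stamp: str) -> int:
    """Convert an SRT timestamp such as 00:01:02,500 to milliseconds."""
    m = _TIME.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    h, mi, s, ms = (int(g) for g in m.groups())
    return ((h * 60 + mi) * 60 + s) * 1000 + ms


def parse_srt(data: str) -> list[Cue]:
    """Parse SRT text into cues and strip inline tags such as <i>...</i>."""
    data = data.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", data):
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip blocks that do not look like a cue
        start, end = lines[1].split("-->", 1)
        text = re.sub(r"<[^>]+>", "", "\n".join(lines[2:])).strip()
        # Ignore optional positioning data after the end timestamp.
        cues.append(Cue(int(lines[0]), _to_ms(start), _to_ms(end.split()[0]), text))
    return cues
```

Once the captions are in this form, steps such as removing empty cues, shifting timestamps, or passing the text to a translation step become plain list operations.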

Parameter examples

The following examples use IMM project test-project and file oss://test-bucket/video-demo/test.mp4 to extract video captions.

Caption extraction is part of the media transcoding capabilities of IMM.

Note
  • Use OpenAPI Explorer to call the media transcoding API and view SDK code examples.

  • Do not set Target.URI or Target.Container when extracting captions.

  • Include the {streamindex} variable in the caption output URI, for example "oss://test-bucket/objectPrefix-{streamindex}.{autoext}". Without it, captions may overwrite each other.
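The {streamindex} requirement in the note above can be checked on the client before a task is submitted. The following Python sketch is illustrative only: `check_caption_uri` and `preview_outputs` are hypothetical helpers, and the preview assumes that the service substitutes the stream index and the file extension literally into the template, which may differ from the actual naming rules.

```python
def check_caption_uri(uri: str) -> None:
    """Reject caption output URIs that could make streams overwrite each other."""
    if not uri.startswith("oss://"):
        raise ValueError("caption URI must start with oss://")
    if "{streamindex}" not in uri:
        raise ValueError(
            "caption URI lacks {streamindex}; outputs may overwrite each other"
        )


def preview_outputs(uri: str, stream_indexes, ext: str) -> list[str]:
    """Show the object names the template would yield (assumed substitution)."""
    check_caption_uri(uri)
    return [
        uri.replace("{streamindex}", str(i)).replace("{autoext}", ext)
        for i in stream_indexes
    ]


if __name__ == "__main__":
    template = "oss://test-bucket/objectPrefix-{streamindex}.{autoext}"
    for name in preview_outputs(template, [0, 1], "vtt"):
        print(name)
```

Failing early on a missing variable is cheaper than discovering after the task completes that only one caption file survived.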

Extract all captions from a video as WebVTT captions

  • Caption format: webvtt

  • Output caption file path: oss://test-bucket/video-demo/subtitle-{streamindex}.vtt (one file per caption stream)

  • Completion notification: MNS topic "test-mns-topic"

Try this example in OpenAPI Explorer with pre-filled parameters. Modify them as needed.

Request parameters

{
  "ProjectName": "test-project",
  "Notification": {
    "MNS": {
      "TopicName": "test-mns-topic"
    }
  },
  "Sources": [
    {
      "URI": "oss://test-bucket/video-demo/test.mp4"
    }
  ],
  "Targets": [
    {
      "Subtitle": {
        "ExtractSubtitle": {
          "Format": "webvtt",
          "URI": "oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}"
        }
      }
    }
  ]
}

Extract all captions from a video as SRT captions

  • Caption format: srt

  • Output caption file path: oss://test-bucket/video-demo/subtitle-{streamindex}.srt (one file per caption stream)

  • Completion notification: MNS topic "test-mns-topic"

Try this example in OpenAPI Explorer with pre-filled parameters. Modify them as needed.

Request parameters

{
  "ProjectName": "test-project",
  "Notification": {
    "MNS": {
      "TopicName": "test-mns-topic"
    }
  },
  "Sources": [
    {
      "URI": "oss://test-bucket/video-demo/test.mp4"
    }
  ],
  "Targets": [
    {
      "Subtitle": {
        "ExtractSubtitle": {
          "Format": "srt",
          "URI": "oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}"
        }
      }
    }
  ]
}
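The two requests above differ only in the Format value, so the parameters can be assembled by one small function. The following Python sketch only builds the parameter structure shown in the examples. It does not send the request, and `build_extract_request` is a hypothetical helper name, not an SDK function. How you submit the parameters (OpenAPI Explorer or an SDK) is up to you.

```python
import json


def build_extract_request(project, source_uri, caption_uri, fmt, topic=None):
    """Build CreateMediaConvertTask parameters for caption extraction."""
    if "{streamindex}" not in caption_uri:
        raise ValueError("caption URI should contain {streamindex}")
    request = {
        "ProjectName": project,
        "Sources": [{"URI": source_uri}],
        # Target.URI and Target.Container are deliberately not set,
        # as required for caption extraction.
        "Targets": [
            {"Subtitle": {"ExtractSubtitle": {"Format": fmt, "URI": caption_uri}}}
        ],
    }
    if topic is not None:
        request["Notification"] = {"MNS": {"TopicName": topic}}
    return request


if __name__ == "__main__":
    params = build_extract_request(
        "test-project",
        "oss://test-bucket/video-demo/test.mp4",
        "oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}",
        "srt",
        topic="test-mns-topic",
    )
    print(json.dumps(params, indent=2))
```

Keeping the structure in one place makes it harder to add Target.URI or Target.Container by accident when the request is copied for another format.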

Billing

Video caption extraction incurs both OSS and IMM charges. Pricing details: IMM billing items.

  • OSS billing: OSS Pricing

    | API | Billing item | Description |
    | --- | --- | --- |
    | GetObject | GET requests | Charged based on the number of successful requests. |
    | GetObject | Infrequent Access data retrieval capacity | Charged for the volume of retrieved Infrequent Access data. |
    | GetObject | Data retrieval quota for real-time access of Archive objects | Charged for the volume of Archive data retrieved through real-time access. |
    | GetObject | Transfer acceleration | Charged by data volume when accessing the bucket through an acceleration endpoint. |
    | PutObject | PUT requests | Charged based on the number of successful requests. |
    | PutObject | Storage fee | Charged based on the storage class, size, and duration of object storage. |
    | HeadObject | GET requests | Charged based on the number of successful requests. |

  • IMM billing: IMM billing items.

    Important

    Starting at 11:00 (UTC+8) on July 28, 2025, IMM video caption extraction changes from free to paid. For more information, see IMM billing adjustment announcement.

    | API | Billing item | Description |
    | --- | --- | --- |
    | CreateMediaConvertTask | ExtractSubtitleText | Charged based on the number of successfully extracted text-based caption streams. |
    | CreateMediaConvertTask | ExtractSubtitleImage | Charged based on the total duration of successfully extracted graphical caption streams. |
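The two IMM billing items combine as a count-based fee plus a duration-based fee. The Python sketch below shows only that arithmetic. It assumes, purely for illustration, that duration is metered in seconds, and it takes both unit prices as arguments because this page does not state them; look up the actual units and prices in IMM billing items. `estimate_imm_fee` is a hypothetical helper.

```python
from decimal import Decimal


def estimate_imm_fee(
    text_streams: int,
    graphical_seconds: float,
    price_per_text_stream: Decimal,
    price_per_graphical_second: Decimal,
) -> Decimal:
    """Estimate the IMM fee for one extraction task.

    ExtractSubtitleText is modeled per extracted text-based stream.
    ExtractSubtitleImage is modeled per second of graphical caption stream,
    which is an assumed metering unit.
    """
    if text_streams < 0 or graphical_seconds < 0:
        raise ValueError("usage values must be non-negative")
    text_fee = Decimal(text_streams) * price_per_text_stream
    image_fee = Decimal(str(graphical_seconds)) * price_per_graphical_second
    return text_fee + image_fee
```

The function covers IMM charges only; OSS request, retrieval, and storage fees from the first table are billed separately.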