Multi-track MP4 transcoding and language tagging


Transcode an MP4 file with Intelligent Media Services (IMS) to include multiple audio tracks and assign a language to each track.

Workflow

[Figure: workflow diagram]

Example of the output file structure:

Duration: 00:00:31.40, start: 0.000000, bitrate: 816 kb/s
Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 960x540 [SAR 1:1 DAR 16:9], 663 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
Stream #0:1[0x2](zho): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 46 kb/s (default)
Stream #0:2[0x3](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 46 kb/s (default)
Stream #0:3[0x4](jpn): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 46 kb/s (default)
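To check that an output file carries the expected language tags, the stream listing can be parsed programmatically. The following is a minimal sketch, assuming the listing was captured as text from a tool such as ffprobe in the format shown above. The function name `audio_languages` and the regular expression are illustrative and not part of IMS.

```python
import re

# Matches lines such as "Stream #0:1[0x2](zho): Audio: aac (LC) ..."
STREAM_RE = re.compile(
    r"Stream #(?P<file>\d+):(?P<index>\d+)(?:\[0x[0-9a-fA-F]+\])?"
    r"(?:\((?P<lang>[a-z]{3})\))?: (?P<kind>Video|Audio|Subtitle): (?P<codec>\w+)"
)


def audio_languages(probe_text: str) -> list[str]:
    """Return the language tag of each audio stream, in stream order."""
    langs = []
    for line in probe_text.splitlines():
        match = STREAM_RE.search(line)
        if match and match.group("kind") == "Audio":
            # Streams without a tag are reported as "und" (undetermined).
            langs.append(match.group("lang") or "und")
    return langs


sample = """\
Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 960x540
Stream #0:1[0x2](zho): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo
Stream #0:2[0x3](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo
Stream #0:3[0x4](jpn): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo
"""
print(audio_languages(sample))  # ['zho', 'eng', 'jpn']
```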

Prerequisites

You have activated Intelligent Media Services (IMS). For more information, see Activate the service.

Preparations

Basic IMS configuration

  • Storage configuration: Bind an Object Storage Service (OSS) bucket to IMS. For more information, see Configure a storage address.

  • Callback configuration: Configure an HTTP or Message Notification Service (MNS) callback to receive job status notifications. For more information about callback methods and events, see Callback events overview.

Transcoding templates

Procedure

[Figure: procedure diagram]

Example requirements

Codec: H.264/H.265

Video resolution: 360p/540p/720p/1080p

Audio: HE-AAC at 64 kbps (default configuration).

Configuration example

Create a transcoding template for each of the four video resolutions, using the settings listed for your codec (H.264 or H.265). For instructions, see Create a transcoding template.

Note

To use Narrowband HD™ transcoding, create a corresponding template based on the tables. Then, submit a ticket to request a backend configuration upgrade from Alibaba Cloud.

H.264

| Transcoding template | Codec | Container format | Other parameters |
| --- | --- | --- | --- |
| Video-360P | H.264 | mp4 | Resolution (long edge fixed): 640*; configure other parameters as needed. |
| Video-540P | H.264 | mp4 | Resolution (long edge fixed): 960*; configure other parameters as needed. |
| Video-720P | H.264 | mp4 | Resolution (long edge fixed): 1280*; configure other parameters as needed. |
| Video-1080P | H.264 | mp4 | Resolution (long edge fixed): 1920*; configure other parameters as needed. |

H.265

| Transcoding template | Codec | Container format | Other parameters |
| --- | --- | --- | --- |
| Video-360P | H.265 | mp4 | Resolution (long edge fixed): 640*; configure other parameters as needed. |
| Video-540P | H.265 | mp4 | Resolution (long edge fixed): 960*; configure other parameters as needed. |
| Video-720P | H.265 | mp4 | Resolution (long edge fixed): 1280*; configure other parameters as needed. |
| Video-1080P | H.265 | mp4 | Resolution (long edge fixed): 1920*; configure other parameters as needed. |
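The "long edge fixed" setting can be illustrated with a short calculation. This sketch assumes that the setting scales the longer side of the source to the listed value and derives the other side from the source aspect ratio. Rounding the derived side to an even number is also an assumption, not documented IMS behavior. The function name `scale_long_edge` is hypothetical.

```python
def scale_long_edge(src_w: int, src_h: int, long_edge: int) -> tuple[int, int]:
    """Scale so the longer side equals long_edge, keeping the aspect ratio.

    The shorter side is rounded to the nearest even number (an assumption;
    4:2:0 encoders commonly require even dimensions).
    """
    if src_w >= src_h:
        short = src_h * long_edge / src_w
        return long_edge, max(2, 2 * round(short / 2))
    short = src_w * long_edge / src_h
    return max(2, 2 * round(short / 2)), long_edge


# Long-edge values taken from the template tables.
LONG_EDGES = {"Video-360P": 640, "Video-540P": 960, "Video-720P": 1280, "Video-1080P": 1920}

for name, edge in LONG_EDGES.items():
    print(name, scale_long_edge(1920, 1080, edge))
```

Under these assumptions, a 1920x1080 source processed with the Video-540P template yields 960x540, which matches the resolution in the output file example.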

Submit a transcoding job

Call the SubmitMediaConvertJob API operation to submit a transcoding job.

Audio parameters

| Parameter | Type | Description |
| --- | --- | --- |
| InputRef | String | The name of the input stream for this audio track. This must match a Name defined in the Inputs or AudioSelector array. |
| LanguageControl | String | Controls how the language tag is set for the output stream. Valid values: InputFirst (uses the language tag from the input stream; if the input stream has no language tag, the service uses the tag specified in the Language parameter), Configured (uses the language tag specified in the Language parameter), None (does not add a language tag; this is the default value). |
| Language | String | The ISO 639-2 language code for the audio track. |
| Remove | String | Whether to remove the audio stream. |
| Codec | String | The audio codec. |
| Profile | String | The audio encoding profile. |
| Bitrate | String | The bitrate of the output audio. |
| Samplerate | String | The sample rate. |
| Channels | String | The number of audio channels. |
| Volume | Object | The volume control settings. |
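The three LanguageControl values can be read as a small decision rule. The sketch below restates the rule from the parameter descriptions in Python. It models the documented behavior only and is not IMS code. The function name `resolve_language` is hypothetical.

```python
from typing import Optional


def resolve_language(
    control: str, input_tag: Optional[str], configured: Optional[str]
) -> Optional[str]:
    """Return the language tag the output track would carry, or None for no tag."""
    if control == "InputFirst":
        # Prefer the tag on the input stream; fall back to the Language parameter.
        return input_tag or configured
    if control == "Configured":
        # Always use the Language parameter, ignoring the input tag.
        return configured
    if control == "None":
        # Default: no language tag is written.
        return None
    raise ValueError(f"unknown LanguageControl value: {control!r}")


print(resolve_language("InputFirst", "zho", "eng"))   # zho
print(resolve_language("InputFirst", None, "eng"))    # eng
print(resolve_language("Configured", "zho", "eng"))   # eng
print(resolve_language("None", "zho", "eng"))         # None
```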

Scenario 1: Keep original audio

Note
  1. The Inputs array specifies three sources: a main video file with default audio (video), a separate English audio file (EnglishAudio), and a separate Japanese audio file (JapaneseAudio).

  2. In OutputGroups.GroupConfig, "Type": "File" specifies the output as a single container file.

  3. Each track uses InputRef to specify its source input and LanguageControl to determine its language tagging logic.

{
  "Inputs": [
    {
      "Name": "video",
      "InputFile": {"Type": "OSS", "Media": "https://<your-bucket>.<public-endpoint>/<video-with-default-audio.mp4>"}
    },
    {
      "Name": "EnglishAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<your-bucket>.<public-endpoint>/<english-audio.mp4>"}
    },
    {
      "Name": "JapaneseAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<your-bucket>.<public-endpoint>/<japanese-audio.mp4>"}
    }
  ],
  "OutputGroups": [
    {
      "GroupConfig": {
        "Type": "File",
        "OutputFileBase": {
          "Type": "OSS",
          "Media": "https://<your-bucket>.<public-endpoint>/<output-path>/"
        }
      },
      "Outputs": [
        {
          "Name": "360P",
          "OutputFileName": "video/360p/360p",
          "TemplateId": "Video-360P",
          "OverrideParams": {
            "Audios": [
              {
                "InputRef": "video",
                "LanguageControl": "InputFirst"
              }, {
                "InputRef": "EnglishAudio",
                "LanguageControl": "Configured",
                "Language": "eng"
              }, {
                "InputRef": "JapaneseAudio",
                "LanguageControl": "Configured",
                "Language": "jpn"
              }
            ]
          }
        }
      ]
    }
  ]
}
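Because four templates are created but the example lists only the 360P output, it can be convenient to generate the Outputs array in code. The following is a sketch under stated assumptions: that one job may carry one output per template (the example does not confirm this), that the OutputFileName pattern extends from the 360P example, and that the TemplateId values are placeholders for real template IDs. The helper `build_outputs` is hypothetical.

```python
import copy
import json


def build_outputs(resolutions, audios):
    """Return one Outputs entry per resolution, each with its own copy of the audio list."""
    outputs = []
    for res in resolutions:
        slug = res.lower()
        outputs.append({
            "Name": res,
            # Path pattern extrapolated from the 360P example (assumption).
            "OutputFileName": f"video/{slug}/{slug}",
            # Placeholder; replace with the ID of the template you created.
            "TemplateId": f"Video-{res}",
            "OverrideParams": {"Audios": copy.deepcopy(audios)},
        })
    return outputs


# Audio track definitions copied from Scenario 1.
audios = [
    {"InputRef": "video", "LanguageControl": "InputFirst"},
    {"InputRef": "EnglishAudio", "LanguageControl": "Configured", "Language": "eng"},
    {"InputRef": "JapaneseAudio", "LanguageControl": "Configured", "Language": "jpn"},
]

outputs = build_outputs(["360P", "540P", "720P", "1080P"], audios)
print(json.dumps(outputs[1], indent=2))
```

The resulting list would replace the Outputs array in the request body. How the body is passed to SubmitMediaConvertJob depends on the SDK in use and is not shown here.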

Scenario 2: Remove original audio

Note

This configuration is similar to Scenario 1, but it omits the reference to the original video's audio track from the Audios array. As a result, the output file contains only the English and Japanese audio tracks.

{
  "Inputs": [
    {
      "Name": "video",
      "InputFile": {"Type": "OSS", "Media": "https://<your-bucket>.<public-endpoint>/<video-with-default-audio.mp4>"}
    },
    {
      "Name": "EnglishAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<your-bucket>.<public-endpoint>/<english-audio.mp4>"}
    },
    {
      "Name": "JapaneseAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<your-bucket>.<public-endpoint>/<japanese-audio.mp4>"}
    }
  ],
  "OutputGroups": [
    {
      "GroupConfig": {
        "Type": "File",
        "OutputFileBase": {
          "Type": "OSS",
          "Media": "https://<your-bucket>.<public-endpoint>/<output-path>/"
        }
      },
      "Outputs": [
        {
          "Name": "360P",
          "OutputFileName": "video/360p/360p",
          "TemplateId": "Video-360P",
          "OverrideParams": {
            "Audios": [
              {
                "InputRef": "EnglishAudio",
                "LanguageControl": "Configured",
                "Language": "eng"
              }, {
                "InputRef": "JapaneseAudio",
                "LanguageControl": "Configured",
                "Language": "jpn"
              }
            ]
          }
        }
      ]
    }
  ]
}

Scenario 3: Select audio by language

This example uses the AudioSelector parameter to select the audio track tagged jpn from the JapaneseFile input. The output audio track sets LanguageControl to InputFirst, which inherits the language tag from the input.

{
  "Inputs": [
    {
      "Name": "video",
      "InputFile": {"Type": "OSS", "Media": "https://<your-bucket>.<public-endpoint>/<video-with-default-audio.mp4>"}
    },
    {
      "Name": "EnglishAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<your-bucket>.<public-endpoint>/<english-audio.mp4>"}
    },
    {
      "Name": "JapaneseFile",
      "InputFile": {"Type": "OSS", "Media": "https://<your-bucket>.<public-endpoint>/<multilingual-file.mp4>"},
      "AudioSelector": [{
        "Name": "JapaneseAudio",
        "Rule": "tag",
        "TagConfig": {"language": "jpn"}
      }]
    }
  ],
  "OutputGroups": [
    {
      "GroupConfig": {
        "Type": "File",
        "OutputFileBase": {
          "Type": "OSS",
          "Media": "https://<your-bucket>.<public-endpoint>/<output-path>/"
        }
      },
      "Outputs": [
        {
          "Name": "360P",
          "OutputFileName": "video/360p/360p",
          "TemplateId": "Video-360P",
          "OverrideParams": {
            "Audios": [
              {
                "InputRef": "video",
                "LanguageControl": "InputFirst"
              }, {
                "InputRef": "EnglishAudio",
                "LanguageControl": "Configured",
                "Language": "eng"
              }, {
                "InputRef": "JapaneseAudio",
                "LanguageControl": "InputFirst"
              }
            ]
          }
        }
      ]
    }
  ]
}
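Every InputRef must resolve to a Name in Inputs or in an AudioSelector, so a typo in either place is an easy mistake to make. The sketch below checks a job configuration locally before submission. It is an illustrative helper, not an IMS feature, and the function names are hypothetical. The sample job is a trimmed version of the Scenario 3 configuration with the file locations omitted.

```python
def selectable_names(inputs):
    """Collect every name an InputRef may point to: input names and AudioSelector names."""
    names = set()
    for item in inputs:
        names.add(item["Name"])
        for selector in item.get("AudioSelector", []):
            names.add(selector["Name"])
    return names


def unresolved_input_refs(job):
    """Return the InputRef values that match no input or selector name."""
    known = selectable_names(job["Inputs"])
    missing = []
    for group in job["OutputGroups"]:
        for output in group["Outputs"]:
            for audio in output.get("OverrideParams", {}).get("Audios", []):
                if audio["InputRef"] not in known:
                    missing.append(audio["InputRef"])
    return missing


job = {
    "Inputs": [
        {"Name": "video"},
        {"Name": "EnglishAudio"},
        {
            "Name": "JapaneseFile",
            "AudioSelector": [
                {"Name": "JapaneseAudio", "Rule": "tag", "TagConfig": {"language": "jpn"}}
            ],
        },
    ],
    "OutputGroups": [{
        "Outputs": [{
            "Name": "360P",
            "OverrideParams": {"Audios": [
                {"InputRef": "video", "LanguageControl": "InputFirst"},
                {"InputRef": "EnglishAudio", "LanguageControl": "Configured", "Language": "eng"},
                {"InputRef": "JapaneseAudio", "LanguageControl": "InputFirst"},
            ]},
        }],
    }],
}

print(unresolved_input_refs(job))  # []
```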

Query the transcoding job

Call the GetMediaConvertJob API operation to retrieve the details of a transcoding job.
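If callbacks are not available, the job can be polled until it reaches a terminal state. The following is a generic polling sketch, not an SDK example. `fetch_status` is a hypothetical callable that you would implement to call GetMediaConvertJob and return the job status string. Only the Success status appears in this document; the other status names in the usage example are assumed placeholders.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    fetch_status: Callable[[], str],
    terminal: Iterable[str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status until it returns a terminal state or the timeout elapses."""
    terminal_states = set(terminal)
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {status!r} after {timeout} s")
        sleep(interval)


# Usage with a fake status source. "Init", "Processing", and "Failed" are
# assumed placeholder names, not confirmed IMS status values.
fake_states = iter(["Init", "Processing", "Success"])
result = wait_for_job(
    lambda: next(fake_states),
    terminal={"Success", "Failed"},
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(result)  # Success
```

Injecting `sleep` keeps the loop testable without real delays. In production, the default `time.sleep` applies.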

Callback event

Event type: MediaConvertComplete

This event is not configurable in the console. Configure it by calling the SetEventCallback API operation.

Key callback parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Name | String | Yes | The name of the parent job. |
| JobId | String | Yes | The job ID. |
| Status | String | Yes | The job status. A value of Success indicates the job completed successfully. The parent job is considered successful if at least one sub-job succeeds. |
| TriggerSource | String | No | The trigger source. API indicates that the job was submitted through an API call. |
| FinishTime | String | No | The completion time in UTC format: yyyy-MM-ddTHH:mm:ssZ. |
| UserData | String | No | Custom data specified when submitting the job, passed through and returned in the callback. |

Example

{
	"FinishTime": "2025-05-09T08:03:21Z",
	"JobId": "5d37357cb3a44d10ba33c52760c896cd",
	"Status": "Success",
	"TriggerSource": "IceWorkflow",
	"UserData": "{\"ImsSrc\":\"Workflow\",\"TaskId\":\"e89a955d88ca47f0b9b79c562e5c622f\"}"
}
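A callback receiver typically parses the payload, checks Status, and decodes UserData. The following is a minimal sketch, assuming the delivered body is the JSON object shown in the example; HTTP and MNS deliveries may wrap it differently. UserData is a pass-through string, so the sketch decodes it as JSON only when that succeeds. The function name `parse_callback` is hypothetical.

```python
import json
from datetime import datetime, timezone


def parse_callback(body: str) -> dict:
    """Extract the key fields from a MediaConvertComplete callback body."""
    event = json.loads(body)

    finish = event.get("FinishTime")
    finished_at = None
    if finish:
        # FinishTime is UTC in the form yyyy-MM-ddTHH:mm:ssZ.
        finished_at = datetime.strptime(finish, "%Y-%m-%dT%H:%M:%SZ").replace(
            tzinfo=timezone.utc
        )

    user_data = event.get("UserData")
    if user_data:
        try:
            user_data = json.loads(user_data)
        except ValueError:
            pass  # not JSON; keep the raw string

    return {
        "job_id": event["JobId"],
        "succeeded": event["Status"] == "Success",
        "finished_at": finished_at,
        "user_data": user_data,
    }


# Body rebuilt from the example payload.
sample_body = json.dumps({
    "FinishTime": "2025-05-09T08:03:21Z",
    "JobId": "5d37357cb3a44d10ba33c52760c896cd",
    "Status": "Success",
    "TriggerSource": "IceWorkflow",
    "UserData": json.dumps({"ImsSrc": "Workflow", "TaskId": "e89a955d88ca47f0b9b79c562e5c622f"}),
})

parsed = parse_callback(sample_body)
print(parsed["job_id"], parsed["succeeded"], parsed["finished_at"].isoformat())
```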