Automatically align materials and their lengths across tracks

Use ClipId and ReferenceClipId to automatically align audio and video materials across tracks without manually specifying start and end times.

Why use inter-track alignment

In a standard timeline, aligning materials across tracks requires setting TimelineIn and TimelineOut on every clip so that the clips start and end in sync. This is error-prone when the timeline includes dynamic content, such as AI-generated speech whose length is not known in advance.

Inter-track material alignment removes this requirement. Assign a ClipId to a reference clip and set ReferenceClipId on any dependent clip—the system calculates the start time, end time, and duration of the dependent clip automatically. This lets you align audio with video, audio with audio, video with audio, and video with video across different tracks.

How it works

The alignment protocol

Use ClipId to label a clip and ReferenceClipId on another clip to link it to the labeled clip. The linked clip inherits the timeline position and duration of the reference clip.

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "In": 0,
          "Out": 5,
          "MediaURL": "https://your-bucket.oss-cn-shanghai.aliyuncs.com/head.mp4"
        },
        {
          "ReferenceClipId": "audio_1",
          "MediaURL": "https://your-bucket.oss-cn-shanghai.aliyuncs.com/video1.mp4"
        },
        {
          "MediaURL": "https://your-bucket.oss-cn-shanghai.aliyuncs.com/end.mp4",
          "In": 0,
          "Out": 5
        }
      ]
    }
  ],
  "AudioTracks": [
    {
      "AudioTrackClips": [
        {
          "TimelineIn": 5,
          "ClipId": "audio_1",
          "MediaId": "7980d8f************e6f7e5696301",
          "In": 0,
          "Out": 10
        }
      ]
    }
  ]
}

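In this example, the second video clip has no In, Out, or TimelineIn values. Its start time, end time, and length are derived automatically from audio_1, which starts at 5 seconds on the timeline (TimelineIn: 5) and lasts 10 seconds (In: 0, Out: 10), so the clip occupies 5 to 15 seconds of the timeline.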
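
The interval that a dependent clip inherits can be worked out by hand from the reference clip's track. The following Python sketch does that arithmetic. It is an illustration only, not the service's implementation: the function name reference_interval is hypothetical, and it assumes that every clip up to and including the reference has explicit In and Out values and that clips without TimelineIn are placed back to back.

```python
def reference_interval(track_clips, clip_id):
    """Return (start, end) in timeline seconds for the clip labeled clip_id.

    Assumption: clips without TimelineIn follow the previous clip directly,
    and each clip's length is Out - In.
    """
    cursor = 0
    for clip in track_clips:
        start = clip.get("TimelineIn", cursor)
        end = start + (clip["Out"] - clip["In"])
        if clip.get("ClipId") == clip_id:
            return start, end
        cursor = end
    raise KeyError(clip_id)


# The audio track from the example above.
audio_track = [{"TimelineIn": 5, "ClipId": "audio_1", "In": 0, "Out": 10}]
print(reference_interval(audio_track, "audio_1"))  # (5, 15)
```

A clip that sets ReferenceClipId to audio_1 would therefore be expected to occupy 5 to 15 seconds of the timeline.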

Limits

  1. ClipId and ReferenceClipId are supported only on audio and video tracks. Effect tracks and image tracks do not support these parameters.

  2. Alignment works only between clips on different tracks. Referencing a clip on the same track invalidates the timeline and causes production to fail.

  3. If a clip has both ReferenceClipId and TimelineIn/TimelineOut configured, TimelineIn and TimelineOut take precedence and alignment does not take effect.

  4. If the dependent clip is shorter than the reference clip, its playback speed decreases to match the reference duration. For example, a 10-second clip aligned with a 20-second clip plays at 0.5x speed.

  5. If the dependent clip is longer than the reference clip, it is automatically truncated to match. For example, a 20-second clip aligned with a 10-second clip retains only its first 10 seconds.
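
The limits above can be sketched as a pre-flight check (limits 2 and 3) and a duration calculation (limits 4 and 5). This is a rough Python illustration under stated assumptions, not the service's validation logic. The helper names check_alignment_limits and fit_to_reference are hypothetical, the check for a ReferenceClipId with no matching ClipId is an addition that the limits do not mention, and the sketch flags a clip as soon as either TimelineIn or TimelineOut is set.

```python
def check_alignment_limits(timeline):
    """Return a list of problems related to limits 2 and 3."""
    owner = {}   # ClipId -> (track kind, track index)
    tracks = []  # [((track kind, track index), clips)]
    for kind, clips_key in (("VideoTracks", "VideoTrackClips"),
                            ("AudioTracks", "AudioTrackClips")):
        for index, track in enumerate(timeline.get(kind, [])):
            clips = track.get(clips_key, [])
            tracks.append(((kind, index), clips))
            for clip in clips:
                if "ClipId" in clip:
                    owner[clip["ClipId"]] = (kind, index)

    problems = []
    for track_id, clips in tracks:
        for clip in clips:
            ref = clip.get("ReferenceClipId")
            if ref is None:
                continue
            if ref not in owner:
                # Extra check, not one of the documented limits.
                problems.append(f"{ref}: no clip declares this ClipId")
            elif owner[ref] == track_id:
                problems.append(f"{ref}: referenced from the same track (limit 2)")
            if "TimelineIn" in clip or "TimelineOut" in clip:
                problems.append(f"{ref}: explicit timeline position overrides alignment (limit 3)")
    return problems


def fit_to_reference(dependent_seconds, reference_seconds):
    """Apply limits 4 and 5: slow down a shorter clip, truncate a longer one."""
    if dependent_seconds <= 0 or reference_seconds <= 0:
        raise ValueError("durations must be positive")
    if dependent_seconds < reference_seconds:
        # Limit 4: the whole clip is kept and played slower.
        return {"speed": dependent_seconds / reference_seconds,
                "kept_seconds": dependent_seconds}
    # Limit 5: normal speed, only the first reference_seconds are kept.
    return {"speed": 1.0, "kept_seconds": reference_seconds}


print(fit_to_reference(10, 20))  # {'speed': 0.5, 'kept_seconds': 10}
print(fit_to_reference(20, 10))  # {'speed': 1.0, 'kept_seconds': 10}
```

The two printed results correspond to the examples given in limits 4 and 5.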

Common scenarios

Align an audio clip to a video clip

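The audio clip references the second video clip (video_1) and inherits its position and length. Because video_1 is 8 seconds long (In: 2, Out: 10) and follows a 4-second clip, the audio occupies 4 to 12 seconds of the timeline. A Volume effect reduces the audio gain to 0.2.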

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "MediaId": "e6f7e57980************d8f696301",
          "In": 0,
          "Out": 4
        },
        {
          "ClipId":"video_1",
          "MediaId": "e6f7e57980************d8f696301",
          "In": 2,
          "Out": 10
        }
      ]
    }
  ],
  "AudioTracks": [
    {
      "AudioTrackClips": [
        {
          "ReferenceClipId": "video_1",
          "MediaId": "7980d8f************e6f7e5696301",
          "Effects": [
            {
              "Type": "Volume",
              "Gain": 0.2
            }
          ]
        }
      ]
    }
  ]
}

Align a video clip to an audio clip

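The second video clip references the audio clip (audio_1), which starts at 5 seconds on the timeline, right after the 5-second opening clip. The video clip plays for exactly the duration of that audio.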

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "MediaId": "e6f7e57980************d8f696301",
          "In": 0,
          "Out": 5
        },
        {
          "ReferenceClipId":"audio_1",
          "MediaId": "e6f7e57980************d8f696301"
        }
      ]
    }
  ],
  "AudioTracks": [
    {
      "AudioTrackClips": [
        {
          "TimelineIn": 5,
          "ClipId": "audio_1",
          "MediaId": "7980d8f************e6f7e5696301"
        }
      ]
    }
  ]
}

Align a video track with transitions to multiple AI_TTS speeches

This scenario uses the following timeline structure:

  1. The audio track contains three AI_TTS-generated speeches.

  2. The video track contains five clips. A 2-second waterdrop transition joins the second and third clips, and another joins the third and fourth clips.

  3. The second, third, and fourth video clips are aligned with the three speeches. Each speech starts and ends at the midpoint of its adjacent transitions.
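
The midpoint rule in item 3 can be illustrated with a little arithmetic. The Python sketch below assumes that each transition is centered on the boundary between two consecutive speeches, which is one reading of the rule rather than a documented formula. The function name transition_windows and the speech durations (12, 10, and 11 seconds) are hypothetical, because the real durations are known only after the speech is synthesized.

```python
def transition_windows(opening_seconds, speech_seconds, transition_seconds):
    """Return the (start, end) timeline window of each transition.

    Assumption: speeches play back to back after the opening clip, and each
    transition is centered on the boundary between two consecutive speeches.
    """
    half = transition_seconds / 2
    windows = []
    boundary = opening_seconds
    # No transition follows the last speech, so skip its boundary.
    for duration in speech_seconds[:-1]:
        boundary += duration
        windows.append((boundary - half, boundary + half))
    return windows


# 5-second opening, three speeches of assumed length, 2-second transitions.
print(transition_windows(5, [12, 10, 11], 2))  # [(16.0, 18.0), (26.0, 28.0)]
```

With these assumed durations, the first speech ends at 17 seconds, halfway through the first transition, and the second speech ends at 27 seconds, halfway through the second.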

Each video clip in the middle uses ReferenceClipId to lock its duration to the corresponding speech. The first and last clips use fixed Out values to form the opening and closing segments.

{
  "VideoTracks": [{
    "VideoTrackClips": [{
      "Out": 5,
      "MediaId": "e6f7e57980************d8f696301"
    },{
      "ReferenceClipId":"speech_1",
      "MediaId": "e6f7e57980************d8f696301",
      "Effects": [{
        "Type": "Transition",
        "SubType": "waterdrop",
        "Duration": 2
      }]
    }, {
      "ReferenceClipId":"speech_2",
      "MediaId": "e6f7e57980************d8f696301",
      "Effects": [{
        "Type": "Transition",
        "SubType": "waterdrop",
        "Duration": 2
      }]
    }, {
      "ReferenceClipId":"speech_3",
      "MediaId": "e6f7e57980************d8f696301"
    }, {
      "Out": 10,
      "MediaId": "e6f7e57980************d8f696301"
    }]
  }],
  "AudioTracks": [{
    "AudioTrackClips": [{
      "TimelineIn":5,
      "Type": "AI_TTS",
      "Content": "Speech 1 Speech 1 Speech 1. Speech 1 Speech 1 Speech 1 Speech 1. Speech 1 Speech 1 Speech 1. Speech 1 Speech 1 Speech 1. Speech 1 Speech 1. Speech 1. Speech 1 Speech 1 Speech 1 Speech 1.",
      "Voice": "sicheng",
      "ClipId":"speech_1",
      "Effects": [{
        "Type": "AI_ASR",
        "Font": "AlibabaPuHuiTi",
        "Alignment": "TopCenter",
        "Y": 90,
        "FontSize": 56,
        "FontColor": "#ffffff"
      }]
    }, {
      "Type": "AI_TTS",
      "Content": "Speech 2 Speech 2 Speech 2 Speech 2 Speech 2. Speech 2 Speech 2 Speech 2 Speech 2. Speech 2 Speech 2 Speech 2 Speech 2 Speech 2 Speech 2 Speech 2. Speech 2 Speech 2 Speech 2 Speech 2.",
      "Voice": "sicheng",
      "ClipId":"speech_2",
      "Effects": [{
        "Type": "AI_ASR",
        "Font": "AlibabaPuHuiTi",
        "Alignment": "TopCenter",
        "Y": 90,
        "FontSize": 56,
        "FontColor": "#ffffff"
      }]
    }, {
      "Type": "AI_TTS",
      "Content": "Speech 3 Speech 3 Speech 3 Speech 3 Speech 3. Speech 3 Speech 3 Speech 3. Speech 3 Speech 3 Speech 3 Speech 3 Speech 3. Speech 3 Speech 3 Speech 3 Speech 3 Speech 3. Speech 3 Speech 3.",
      "Voice": "sicheng",
      "ClipId":"speech_3",
      "Effects": [{
        "Type": "AI_ASR",
        "Font": "AlibabaPuHuiTi",
        "Alignment": "TopCenter",
        "Y": 90,
        "FontSize": 56,
        "FontColor": "#ffffff"
      }]
    }]
  }]
}
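
Because the middle clips all follow the same pattern, a timeline of this shape could be generated for any number of speeches. The Python sketch below is one possible way to do that and is not part of any SDK. The function name build_speech_timeline and the media ID placeholder are hypothetical, the field names are taken from the example above, and the AI_ASR subtitle effects are omitted for brevity.

```python
import json


def build_speech_timeline(speeches, media_id, opening_out=5, closing_out=10,
                          transition_seconds=2, voice="sicheng"):
    """Build a timeline with one aligned video clip per AI_TTS speech."""
    middle_clips = []
    audio_clips = []
    for index, text in enumerate(speeches, start=1):
        clip_id = f"speech_{index}"
        video_clip = {"ReferenceClipId": clip_id, "MediaId": media_id}
        # Every aligned clip except the last one transitions into the next.
        if index < len(speeches):
            video_clip["Effects"] = [{
                "Type": "Transition",
                "SubType": "waterdrop",
                "Duration": transition_seconds,
            }]
        middle_clips.append(video_clip)

        audio_clip = {"Type": "AI_TTS", "Content": text,
                      "Voice": voice, "ClipId": clip_id}
        # Only the first speech is positioned explicitly, after the opening.
        if index == 1:
            audio_clip["TimelineIn"] = opening_out
        audio_clips.append(audio_clip)

    video_clips = [
        {"Out": opening_out, "MediaId": media_id},
        *middle_clips,
        {"Out": closing_out, "MediaId": media_id},
    ]
    return {
        "VideoTracks": [{"VideoTrackClips": video_clips}],
        "AudioTracks": [{"AudioTrackClips": audio_clips}],
    }


timeline = build_speech_timeline(
    ["Speech 1.", "Speech 2.", "Speech 3."], media_id="<your-media-id>")
print(json.dumps(timeline, indent=2))
```

Called with three speeches, the sketch produces five video clips and three audio clips in the same arrangement as the example above.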

Truncate audio to match a video clip

This scenario uses the following timeline structure:

  1. The video track contains three clips. The second clip is 8 seconds long (In: 10, Out: 18).

  2. The audio track contains a single AI_TTS speech that is longer than 8 seconds.

  3. The audio clip references the second video clip (video_1). It plays for only 8 seconds, and the rest of the speech is truncated automatically.

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "MediaId": "e6f7e57980************d8f696301",
          "In": 0,
          "Out": 5
        },
        {
          "ClipId":"video_1",
          "MediaId": "e6f7e57980************d8f696301",
          "In": 10,
          "Out": 18
        },
        {
          "MediaId": "e6f7e57980************d8f696301",
          "In": 3,
          "Out": 10
        }
      ]
    }
  ],
  "AudioTracks": [
    {
      "AudioTrackClips": [
        {
          "ReferenceClipId": "video_1",
          "Type": "AI_TTS",
          "Content": "Hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone.",
          "Voice": "Siqi",
          "SpeechRate": 0,
          "PitchRate": 0,
          "Effects": [
            {
              "Type": "AI_ASR",
              "Font": "WenQuanYi Zen Hei Mono",
              "FontSize": 26,
              "FontColorOpacity": 1,
              "FontColor": "#000000",
              "FontFace": {
                "Bold": true,
                "Italic": true,
                "Underline": false
              }
            }
          ]
        }
      ]
    }
  ]
}

Align a background video to an avatar video

This scenario uses the following timeline structure:

  1. The timeline has two video tracks. The first track contains a regular video used as the background. The second track contains an AI avatar clip with subtitles and speech.

  2. The background video is muted and aligned to the avatar clip, so it plays for exactly as long as the avatar.

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "ReferenceClipId": "avatar2",
          "MediaId": "e6f7e57980************d8f696301",
          "Effects": [
            {
              "Type": "Volume",
              "Gain": 0
            }
          ]
        }
      ]
    },
    {
      "VideoTrackClips": [
        {
          "ClipId": "avatar2",
          "Type": "AI_Avatar",
          "AvatarId": "yunxin",
          "Content": "This shopping method stores goods in warehouses, which improves logistics efficiency and the safety of goods. Many e-commerce companies have already started experimenting with this model.",
          "X": 50,
          "Y": 0,
          "Effects": [
            {
              "Type": "AI_ASR",
              "Font": "AlibabaPuHuiTi",
              "Alignment": "BottomCenter",
              "Y": 50,
              "FontSize": 40,
              "FontColor": "#ffffff",
              "FontFace": {
                "Bold": true,
                "Italic": false,
                "Underline": false
              }
            }
          ]
        }
      ]
    }
  ]
}

Overlay an image on an avatar video

This scenario uses the following timeline structure:

  1. The first video track contains three clips: a 5-second opening, an AI avatar segment with subtitles and speech, and a 5-second closing.

  2. The second video track contains a single image clip that is aligned to the avatar clip and overlaid on top of it.

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "MediaURL": "http://your-bucket.oss-cn-shanghai.aliyuncs.com/opening.mp4",
          "Out": 5
        },
        {
          "ClipId": "avatar2",
          "Type": "AI_Avatar",
          "AvatarId": "yunxin",
          "Content": "This shopping method stores goods in warehouses, which improves logistics efficiency and the safety of goods. Many e-commerce companies have already started experimenting with this model.",
          "X": 50,
          "Y": 0,
          "Effects": [
            {
              "Type": "AI_ASR",
              "Font": "AlibabaPuHuiTi",
              "Alignment": "BottomCenter",
              "Y": 50,
              "FontSize": 40,
              "FontColor": "#ffffff",
              "FontFace": {
                "Bold": true,
                "Italic": false,
                "Underline": false
              }
            }
          ]
        },
        {
          "MediaURL": "http://your-bucket.oss-cn-shanghai.aliyuncs.com/ending.mp4",
          "Out": 5
        }
      ]
    },
    {
      "VideoTrackClips": [
        {
          "ReferenceClipId": "avatar2",
          "Type": "Image",
          "MediaId": "e6f7e57980************d8f696301",
          "Width": 0.2,
          "Height": 0.2,
          "X": 0.1,
          "Y": 0.1
        }
      ]
    }
  ]
}