Audio labeling templates

iTAG provides labeling templates for audio classification, audio segmentation, and automatic speech recognition (ASR). When you create a labeling job, you select a template based on your use case. This topic describes the use cases for these audio templates and their input and output data structures.

Background information

This topic describes the data structures of the following audio labeling templates:

  • Audio classification
  • Audio segmentation
  • Automatic speech recognition (ASR)

Audio classification

Audio classification assigns one or more predefined labels to an audio clip. This labeling template supports both single-label and multi-label classification.

  • Use cases

    A common use case is ambient sound classification.

  • Data structure

    • Input data

      Each line in the input .manifest file is a JSON object that represents one audio file to be labeled. Each object must contain a data object whose source field specifies the OSS path of the audio file.

      {"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
      ...
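      The following minimal Python sketch shows one way to generate an input .manifest from a list of OSS paths. It assumes only the layout shown above (one compact {"data":{"source":...}} object per line); the function name build_input_manifest and the second file path are hypothetical.

```python
import json


def build_input_manifest(oss_paths):
    """Return .manifest text with one {"data": {"source": ...}} object per line."""
    lines = []
    for path in oss_paths:
        # Compact separators reproduce the single-line layout of the example above.
        obj = {"data": {"source": path}}
        lines.append(json.dumps(obj, separators=(",", ":"), ensure_ascii=False))
    # Terminate every line with a newline; an empty input yields an empty string.
    return "".join(line + "\n" for line in lines)


# Hypothetical paths: 2.wav is illustrative only.
paths = [
    "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav",
    "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/2.wav",
]
print(build_input_manifest(paths), end="")
```

      To produce a file, write the returned string to a .manifest file opened with encoding="utf-8".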
    • Output data

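      Each line in the output .manifest file is a JSON object that contains the location of the source audio file and its annotation results. The annotation results are stored under a key that starts with label- followed by a numeric ID. The following example is formatted across multiple lines for readability; in the output file, each object occupies a single line: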

      {
          "data": {
              "source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/6.wav"
          },
          "label-1432993193909231616": {
              "results": [
                  {
                      "questionId": "1",
                      "data": "Label 1",
                      "markTitle": "single-choice",
                      "type": "survey/value"
                  }
              ]
          }
      }
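      As a sketch of how the output might be consumed, the following Python code extracts the source path and the selected labels from each line. It assumes that the annotation key is the only key starting with "label-", and that a multi-label answer stores its data as a list of strings, as the survey/multivalue entry in the ASR example does. The function names are hypothetical.

```python
import json


def find_label_key(record):
    # Assumption: the annotation results sit under the key that starts with "label-".
    for key in record:
        if key.startswith("label-"):
            return key
    raise KeyError("no 'label-' key in record")


def parse_classification_line(line):
    """Return (source, labels) for one line of a classification output .manifest."""
    record = json.loads(line)
    labels = []
    for result in record[find_label_key(record)]["results"]:
        value = result["data"]
        # A single-choice answer ("survey/value") is a string; a multiple-choice
        # answer ("survey/multivalue") is assumed to be a list of strings.
        if isinstance(value, list):
            labels.extend(value)
        else:
            labels.append(value)
    return record["data"]["source"], labels


def read_manifest(lines):
    """Parse every non-blank line of an output .manifest (any iterable of str)."""
    return [parse_classification_line(line) for line in lines if line.strip()]


sample = (
    '{"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/6.wav"},'
    '"label-1432993193909231616":{"results":[{"questionId":"1","data":"Label 1",'
    '"markTitle":"single-choice","type":"survey/value"}]}}'
)
for source, labels in read_manifest([sample]):
    print(source, labels)
```

      Because read_manifest accepts any iterable of lines, an open file object (opened with encoding="utf-8") can be passed to it directly.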

Audio segmentation

Audio segmentation identifies and labels specific time segments within an audio file. You can use the audio waveform to define the start and end times of each segment.

  • Use cases

    A common use case is conversation analysis.

  • Data structure

    • Input data

      Each line in the input .manifest file is a JSON object that represents one audio file to be labeled. Each object must contain a data object whose source field specifies the OSS path of the audio file.

      {"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
      ...
    • Output data

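      Each line in the output .manifest file is a JSON object that contains the location of the source audio file and its annotation results. The annotation results are stored under a key that starts with label- followed by a numeric ID. The following example is formatted across multiple lines for readability; in the output file, each object occupies a single line: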

      {
          "data": {
              "source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/21.wav"
          },
          "label-1435480301706092544": {
              "results": [
                  {
                      "duration": 0,
                      "objects": [
                          {
                              "result": {
                                  "Audio recognition result": "This is the transcribed content for segment 1.",
                                  "single-choice": "Label 1"
                              },
                              "color": null,
                              "id": "wavesurfer_ei0aet9uvp8",
                              "start": 2.3886218302094817,
                              "end": 4.635545755237045
                          },
                          {
                              "result": {
                                  "Audio recognition result": "This is the transcribed content for segment 2.",
                                  "single-choice": "Label 2"
                              },
                              "color": null,
                              "id": "wavesurfer_kl39gnlb2k",
                              "start": 5.698280044101433,
                              "end": 7.348048511576626
                          }
                      ],
                      "empty": false
                  }
              ]
          }
      }
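      The following Python sketch flattens the segmentation output into a list of segments sorted by start time, with the per-segment answers kept as a dictionary. It assumes that start and end are offsets in seconds (this topic does not state the unit) and that the annotation key starts with "label-". The function name parse_segments and the computed length field are hypothetical.

```python
import json


def parse_segments(line):
    """Return (source, segments) for one line of a segmentation output .manifest."""
    record = json.loads(line)
    # Assumption: the annotation results sit under the key that starts with "label-".
    key = next(k for k in record if k.startswith("label-"))
    segments = []
    for result in record[key]["results"]:
        for obj in result.get("objects", []):
            segments.append({
                "id": obj["id"],
                "start": obj["start"],
                "end": obj["end"],
                # Assumption: start and end are in seconds, so this is the segment length.
                "length": obj["end"] - obj["start"],
                "answers": obj["result"],  # e.g. transcription and selected label
            })
    segments.sort(key=lambda seg: seg["start"])
    return record["data"]["source"], segments


sample = json.dumps({
    "data": {"source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/21.wav"},
    "label-1435480301706092544": {"results": [{
        "duration": 0,
        "objects": [
            {"result": {"Audio recognition result": "This is the transcribed content for segment 1.",
                        "single-choice": "Label 1"},
             "color": None, "id": "wavesurfer_ei0aet9uvp8",
             "start": 2.3886218302094817, "end": 4.635545755237045},
            {"result": {"Audio recognition result": "This is the transcribed content for segment 2.",
                        "single-choice": "Label 2"},
             "color": None, "id": "wavesurfer_kl39gnlb2k",
             "start": 5.698280044101433, "end": 7.348048511576626},
        ],
        "empty": False,
    }]},
})

source, segments = parse_segments(sample)
for seg in segments:
    print(f'{seg["start"]:.2f}-{seg["end"]:.2f} {seg["answers"]["single-choice"]}: '
          f'{seg["answers"]["Audio recognition result"]}')
```

      The keys inside each result object ("Audio recognition result", "single-choice") come from the questions configured in the template, so the sketch keeps them as an open dictionary rather than hard-coding them.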

Automatic speech recognition (ASR)

ASR converts spoken audio into written text. This labeling template lets you transcribe audio files and apply relevant labels.

  • Use cases

    A common use case is dialect recognition.

  • Data structure

    • Input data

      Each line in the input .manifest file is a JSON object that represents one audio file to be labeled. Each object must contain a data object whose source field specifies the OSS path of the audio file.

      {"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
      ...
    • Output data

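      Each line in the output .manifest file is a JSON object that contains the location of the source audio file and its annotation results. The annotation results are stored under a key that starts with label- followed by a numeric ID. The following example is formatted across multiple lines for readability; in the output file, each object occupies a single line: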

      {
          "data": {
              "source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/14.wav"
          },
          "label-1435448359497441280": {
              "results": [
                  {
                      "questionId": "1",
                      "data": "This is the transcribed content.",
                      "markTitle": "Audio recognition result",
                      "type": "survey/value"
                  },
                  {
                      "questionId": "3",
                      "data": [
                          "Label 1",
                          "Label 2"
                      ],
                      "markTitle": "multiple-choice",
                      "type": "survey/multivalue"
                  }
              ]
          }
      }
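      A minimal Python sketch for reading the ASR output maps each question's markTitle to its answer, so that the transcription and the selected labels can be looked up by name. It assumes that the annotation key starts with "label-"; the function name parse_asr_line is hypothetical.

```python
import json


def parse_asr_line(line):
    """Return (source, answers) for one line of an ASR output .manifest.

    answers maps each question's markTitle to its data value.
    """
    record = json.loads(line)
    # Assumption: the annotation results sit under the key that starts with "label-".
    key = next(k for k in record if k.startswith("label-"))
    answers = {}
    for result in record[key]["results"]:
        answers[result["markTitle"]] = result["data"]
    return record["data"]["source"], answers


sample = json.dumps({
    "data": {"source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/14.wav"},
    "label-1435448359497441280": {"results": [
        {"questionId": "1", "data": "This is the transcribed content.",
         "markTitle": "Audio recognition result", "type": "survey/value"},
        {"questionId": "3", "data": ["Label 1", "Label 2"],
         "markTitle": "multiple-choice", "type": "survey/multivalue"},
    ]},
})

source, answers = parse_asr_line(sample)
print(answers["Audio recognition result"])  # This is the transcribed content.
print(answers["multiple-choice"])           # ['Label 1', 'Label 2']
```

      Keying by markTitle keeps the result readable. If two questions in a template could share a title, keying by questionId instead avoids one answer overwriting another.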