The synchronous short audio moderation operation uses an HTTP or HTTPS interface to detect content in audio files. It converts audio to text in real time and returns content moderation results and risk tags to help you improve your review efficiency. This topic describes how to call the /green/voice/syncscan operation to moderate audio content.
Usage notes
API operation: /green/voice/syncscan. This operation performs synchronous audio moderation.
You can call this operation to create synchronous audio moderation tasks. For more information about how to construct an HTTP request, see Request structure. You can also use a pre-built HTTP request. For more information, see SDK overview.
By default, audio moderation detects Mandarin Chinese. To detect other languages or dialects, contact your account manager. Other languages include English, Japanese, Spanish, Arabic, French, Indonesian, and Vietnamese. Dialects include Cantonese, Sichuanese, Hubei dialect, Shaanxi dialect, Shanxi dialect, Henan dialect, Northeastern dialect, Tianjin dialect, Gansu dialect, Guizhou dialect, Yunnan dialect, Jiangxi dialect, Guangxi dialect, Hunan dialect, Shandong dialect, Suzhou dialect, Zhejiang dialect, Shanghainese, and Minnan.
- Billing information:
You are charged for calling this operation. For more information about the billing methods, see
- Audio file requirements:
  - The size of an audio file cannot exceed 20 MB.
  - The duration of an audio file cannot exceed 1 minute.
  - Supported audio file formats: MP3, WAV, AAC, WMA, OGG, M4A, and M3U8.
  - Supported video file formats that contain audio: AVI, FLV, MP4, MPG, ASF, WMV, MOV, RMVB, and RM.
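The file requirements above can be checked on the client before a task is submitted. The following is a minimal sketch, not part of the API: the function name check_audio_file is hypothetical, 20 MB is assumed to mean 20 * 1024 * 1024 bytes, the format is guessed from the file extension only, and the caller is assumed to already know the file size and duration.

```python
import os

# Limits and formats taken from the requirements listed above.
MAX_SIZE_BYTES = 20 * 1024 * 1024  # assumption: 20 MB counted in binary units
MAX_DURATION_SECONDS = 60          # 1 minute
AUDIO_FORMATS = {"mp3", "wav", "aac", "wma", "ogg", "m4a", "m3u8"}
VIDEO_FORMATS = {"avi", "flv", "mp4", "mpg", "asf", "wmv", "mov", "rmvb", "rm"}


def check_audio_file(name, size_bytes, duration_seconds):
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    # Assumption: the extension reflects the real container format.
    ext = os.path.splitext(name)[1].lstrip(".").lower()
    if ext not in AUDIO_FORMATS | VIDEO_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 20 MB")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("duration is longer than 1 minute")
    return problems
```

For example, check_audio_file("clip.MP3", 1024, 30) returns an empty list, while a 61-second file is reported as too long. The service remains the final authority on whether a file is accepted.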
QPS limits
You can call this operation up to 50 times per second per account. If the number of calls per second exceeds the limit, throttling is triggered. As a result, your business may be affected. We recommend that you take note of the limit when you call this operation.
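One way to stay under the limit is to throttle calls on the client side. The sketch below is an illustration under stated assumptions rather than an official mechanism: the class name SlidingWindowLimiter is hypothetical, it tracks calls made by a single process only, and it is not thread-safe.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any window of period seconds."""

    def __init__(self, max_calls=50, period=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._calls = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until one more call fits in the current window."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
```

Calling limiter.acquire() immediately before each request keeps one process at or below 50 calls per second. If several processes or hosts share the same account, the limit applies to their combined traffic, so a shared limiter or a lower per-process cap would be needed.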
Request parameters
| Name | Type | Required | Example value | Description |
| --- | --- | --- | --- | --- |
| bizType | String | No | default | The business scenario. You can create a business scenario in the Content Moderation console. For more information, see Customize moderation policies. |
| scenes | StringArray | Yes | antispam | The detection scenario. Set the value to antispam. |
| tasks | JSONArray | Yes | | The detection objects. Each element in the JSON array is a struct for a detection task. You can specify up to 100 elements, which means you can submit up to 100 content entries for detection at a time. To submit 100 elements, you must increase the number of concurrent tasks to more than 100. For more information about the structure of each element, see task. |

task
| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| clientInfo | JSONObject | No | {"userId":"120234234","userNick":"Mike","userType":"others"} | The information about the client. For more information, see the "Common request parameters" section of Common parameters. The server determines whether to use the global clientInfo parameter or the clientInfo parameter that is described in this table. Note The clientInfo parameter in this table takes priority over the global one. |
| dataId | String | No | abc_123 | The ID of the moderation object. The ID can contain letters, digits, underscores (_), hyphens (-), and periods (.). It can be up to 128 characters in length. This ID uniquely identifies your business data. |
| url | String | Yes | http://aliyundoc.com/test.mp3 | The URL of the object to be detected. |
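The request parameters above can be assembled into a JSON body as sketched here. This is an illustrative helper, not part of any SDK: the name build_syncscan_body is hypothetical, the dataId check simply mirrors the character and length rules in the table, and the common request parameters and request signing described in Request structure are not covered.

```python
import json
import re

# dataId rule from the table: letters, digits, underscores, hyphens,
# and periods, up to 128 characters.
DATA_ID_PATTERN = re.compile(r"[A-Za-z0-9_.\-]{1,128}")
MAX_TASKS = 100  # up to 100 elements per request


def build_syncscan_body(tasks, biz_type=None):
    """Build the JSON body from (url, data_id) pairs; data_id may be None."""
    if not 1 <= len(tasks) <= MAX_TASKS:
        raise ValueError("tasks must contain 1 to 100 elements")
    body = {"scenes": ["antispam"], "tasks": []}
    if biz_type is not None:
        body["bizType"] = biz_type
    for url, data_id in tasks:
        task = {"url": url}
        if data_id is not None:
            if not DATA_ID_PATTERN.fullmatch(data_id):
                raise ValueError(f"invalid dataId: {data_id!r}")
            task["dataId"] = data_id
        body["tasks"].append(task)
    return json.dumps(body)
```

Calling build_syncscan_body([("http://aliyundoc.com/test.mp3", "abcd-123")]) produces a body with the same scenes and tasks fields as the sample request in the Examples section.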
Returned data
| Name | Type | Example | Description |
| --- | --- | --- | --- |
| code | Integer | 200 | The returned HTTP status code. For more information, see Common error codes. |
| msg | String | OK | The message returned for the request. |
| dataId | String | abc_123 | The ID of the moderation object. Note If you set the dataId parameter in the moderation request, the value of the dataId request parameter is returned here. |
| taskId | String | vc_f_1OsjIYTukH@4@AXkIQ9xxx-1ov52Y | The ID of the detection task. |
| url | String | http://aliyundoc.com/test.mp3 | The URL of the object to be detected. |
| results | JSONArray | | The detection results returned when the call is successful (code=200). The results contain one or more elements. Each element is a struct. For more information about the structure of each element, see result. |

result
| Name | Type | Example | Description |
| --- | --- | --- | --- |
| scene | String | antispam | The detection scenario. This corresponds to the scenario in the request. The value is fixed as antispam. |
| label | String | customized | The category of the detection result, such as customized. |
| suggestion | String | block | The recommended subsequent operation. Valid values: pass, review, and block. |
| rate | Float | 99.91 | The score of the confidence level. Valid values: 0 to 100. A greater value indicates a higher confidence level. If a value of pass is returned for the suggestion parameter, a higher confidence level indicates a higher probability that the content is normal. If a value of review or block is returned for the suggestion parameter, a higher confidence level indicates a higher probability that the content contains violations. Important We recommend that you use the values that are returned for the suggestion, label, and sublabel parameters to determine whether the content contains violations. The sublabel parameter is returned by specific operations. |
| details | JSONArray | | The details of the text that corresponds to the audio. This can contain one or more elements. Each element corresponds to a sentence. For more information about the structure of each element, see detail. |

detail
| Name | Type | Example | Description |
| --- | --- | --- | --- |
| startTime | Integer | 0 | The start timestamp of the sentence, in seconds. |
| endTime | Integer | 4065 | The end timestamp of the sentence, in seconds. |
| text | String | Disgusting | The text converted from the audio. |
| label | String | politics | The category of the detection result for the sentence, such as normal, customized, ad, and politics. |
| persons | JSONArray | [{"name":"Sensitive Person A"}] | The voiceprint recognition result. This field is returned if the voiceprint of a sensitive person is hit. Each element contains a name field, as shown in the example value. Note: This field is not returned by default. If you need this feature, contact your account manager. |
| keyword | String | Disgusting | If a user-defined keyword is hit, the keyword is returned. |
| libName | String | test | If a user-defined keyword is hit, the name of the keyword library that contains the keyword is returned. |
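The result and detail structures above can be reduced to the fields a reviewer typically needs. The sketch below is an assumption-laden illustration: the function name summarize_result is hypothetical, and it treats a sentence as flagged whenever its label is anything other than normal, which is inferred from the sample response rather than stated by the API.

```python
def summarize_result(result):
    """Summarize one element of results into a plain dict."""
    # Assumption: a sentence labeled "normal" (or with no label) is clean.
    flagged = [
        d for d in result.get("details", [])
        if d.get("label", "normal") != "normal"
    ]
    return {
        "suggestion": result["suggestion"],
        "label": result["label"],
        # Text of every sentence that carries a non-normal label.
        "flagged_texts": [d.get("text", "") for d in flagged],
        # keyword is present only when a user-defined keyword is hit.
        "keywords": [d["keyword"] for d in flagged if "keyword" in d],
    }
```

As the note on the rate parameter recommends, the summary is built from suggestion and label rather than from the confidence score.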
Examples
Sample request

http(s)://[Endpoint]/green/voice/syncscan
&<Common request parameters>
{
    "scenes": [
        "antispam"
    ],
    "tasks": [
        {
            "dataId": "abcd-123",
            "url": "http://aliyundoc.com/test.mp3"
        }
    ]
}

Sample response

{
    "msg": "OK",
    "code": 200,
    "data": [
        {
            "code": 200,
            "dataId": "abcd-123",
            "results": [
                {
                    "rate": 99.91,
                    "suggestion": "block",
                    "details": [
                        {
                            "libName": "test",
                            "startTime": 0,
                            "endTime": 4065,
                            "label": "customized",
                            "text": "Disgusting",
                            "keyword": "Disgusting"
                        },
                        {
                            "startTime": 4430,
                            "endTime": 10065,
                            "label": "normal",
                            "persons": [
                                {
                                    "name": "Sensitive Person A"
                                }
                            ],
                            "text": "Hahaha"
                        },
                        {
                            "libName": "Audio",
                            "startTime": 11670,
                            "endTime": 14685,
                            "label": "customized",
                            "text": "Clearance sale",
                            "keyword": "Sale"
                        },
                        {
                            "startTime": 14685,
                            "endTime": 16065,
                            "label": "ad",
                            "text": "12345"
                        }
                    ],
                    "label": "customized"
                }
            ],
            "taskId": "vc_f_1OsjIYTukH@4@AXkIQ9xxx-1ov52Y"
        }
    ],
    "requestId": "5A7A6198-6960-4DDC-B67E-58A111A4B20F"
}
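
A response of this shape can be unpacked task by task. The following sketch is one possible way to do it, not an official client: the function name suggestions_by_data_id is hypothetical, it assumes that both the top-level code and each task-level code equal 200 on success, and it falls back to taskId when a task has no dataId.

```python
import json


def suggestions_by_data_id(response_text):
    """Map each task's dataId (or taskId as a fallback) to its suggestions."""
    response = json.loads(response_text)
    if response.get("code") != 200:
        raise RuntimeError(
            f"request failed: {response.get('code')} {response.get('msg')}"
        )
    outcome = {}
    for task in response.get("data", []):
        key = task.get("dataId") or task.get("taskId")
        if task.get("code") != 200:
            # This task failed, so there are no results to read.
            outcome[key] = None
            continue
        outcome[key] = [r["suggestion"] for r in task.get("results", [])]
    return outcome
```

Applied to the sample response, the function maps abcd-123 to a list that contains the single suggestion block. A task mapped to None failed on its own and can be retried or logged separately from the rest of the batch.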