Synchronous short speech detection (AI Guardrails)

The synchronous short audio moderation operation uses an HTTP or HTTPS interface to detect content in audio files. It converts audio to text in real time and returns content moderation results and risk tags to help you improve your review efficiency. This topic describes how to call the /green/voice/syncscan operation to moderate audio content.

Usage notes

API operation: /green/voice/syncscan. This operation performs synchronous audio moderation.

You can call this operation to create synchronous audio moderation tasks. For more information about how to construct an HTTP request, see Request structure. You can also use an SDK, which constructs the HTTP request for you. For more information, see SDK overview.

Note

By default, audio moderation detects Mandarin Chinese. To detect other languages or dialects, contact your account manager. Other languages include English, Japanese, Spanish, Arabic, French, Indonesian, and Vietnamese. Dialects include Cantonese, Sichuanese, Hubei dialect, Shaanxi dialect, Shanxi dialect, Henan dialect, Northeastern dialect, Tianjin dialect, Gansu dialect, Guizhou dialect, Yunnan dialect, Jiangxi dialect, Guangxi dialect, Hunan dialect, Shandong dialect, Suzhou dialect, Zhejiang dialect, Shanghainese, and Minnan.

Billing information:
You are charged for calling this operation. For more information about the billing methods, see
Audio file requirements:
- The size of an audio file cannot exceed 20 MB.
- The duration of an audio file cannot exceed 1 minute.
- Supported audio file formats: MP3, WAV, AAC, WMA, OGG, M4A, and M3U8.
- Supported video file formats that contain audio: AVI, FLV, MP4, MPG, ASF, WMV, MOV, RMVB, and RM.
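
The size and format limits above can be checked on the client before a file is submitted. The following is a minimal sketch, assuming the file is available locally. The helper name check_audio_file is hypothetical and is not part of any SDK. The sketch does not check the 1-minute duration limit, which would require an audio decoder.

```python
import os

# Limits taken from the requirements above (assumes 1 MB = 1024 * 1024 bytes).
MAX_SIZE_BYTES = 20 * 1024 * 1024
AUDIO_FORMATS = {"mp3", "wav", "aac", "wma", "ogg", "m4a", "m3u8"}
VIDEO_FORMATS = {"avi", "flv", "mp4", "mpg", "asf", "wmv", "mov", "rmvb", "rm"}


def check_audio_file(path, size_bytes=None):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    # Compare the file extension case-insensitively against the supported formats.
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in AUDIO_FORMATS | VIDEO_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    # Read the size from disk unless the caller already knows it.
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

An extension check is only a heuristic. The service inspects the actual content, so a file that passes this check can still be rejected.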

QPS limits

You can call this operation up to 50 times per second per account. Calls that exceed this limit are throttled, which may affect your business. Plan your request rate so that it stays within the limit.
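
One way to stay under the limit is to throttle calls on the client. The following is a minimal sketch of a sliding-window limiter. The class name SlidingWindowLimiter and its parameters are hypothetical, and the clock and sleep arguments exist only so that the limiter can be tested without waiting. The sketch is not thread-safe. It also limits only one process, while the quota applies to the whole account.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of `period` seconds."""

    def __init__(self, max_calls=50, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until one more call fits in the current window."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest call leaves the window, then re-check.
            self._sleep(self.period - (now - self._calls[0]))
```

Call acquire() immediately before each request is sent. If several processes share one account, each needs a smaller max_calls value, or a shared limiter.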

Request parameters

Name	Type	Required	Example value	Description
bizType	String	No	default	The business scenario. You can create a business scenario in the Content Moderation console. For more information, see Customize moderation policies.
scenes	StringArray	Yes	antispam	The detection scenario. Set the value to antispam.
tasks	JSONArray	Yes		The detection objects. Each element in the JSON array is a struct for a detection task. You can specify up to 100 elements, which means you can submit up to 100 content entries for detection at a time. To submit 100 elements, you must increase the number of concurrent tasks to more than 100. For more information about the structure of each element, see task.

Table 1. task
Name	Type	Required	Example	Description
clientInfo	JSONObject	No	{"userId":"120234234","userNick":"Mike","userType":"others"}	The information about the client. For more information, see the "Common request parameters" section of Common parameters. The server determines whether to use the global clientInfo parameter or the clientInfo parameter that is described in this table. Note: The clientInfo parameter in this table takes priority over the global one.
dataId	String	No	abc_123	The ID of the moderation object. The ID can contain letters, digits, underscores (_), hyphens (-), and periods (.). It can be up to 128 characters in length. This ID uniquely identifies your business data.
url	String	Yes	http://aliyundoc.com/test.mp3	The URL of the object to be detected. Two sources are supported: a public HTTP or HTTPS URL that is at most 2,048 characters long, or a file path in Alibaba Cloud OSS. To use an OSS file path, you must first authorize Content Moderation to access the OSS bucket. Only OSS buckets in the same region are supported. For more information, see Authorize Content Moderation to access OSS buckets. File path format: oss://<bucket-name>.<endpoint>/<object-name>
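
The constraints in the two tables above can be enforced while the request body is built. The following is a minimal sketch that produces only the JSON body. It does not sign or send the request, which is covered by Request structure and the SDKs. The function name build_request_body and its arguments are hypothetical.

```python
import json
import re

# dataId: letters, digits, underscores, hyphens, and periods, up to 128 characters.
DATA_ID_PATTERN = re.compile(r"[A-Za-z0-9_.\-]{1,128}")
MAX_TASKS = 100
MAX_URL_LENGTH = 2048


def build_request_body(items, biz_type=None):
    """Build the JSON body from (data_id, url) pairs; data_id may be None."""
    # Assumption: at least one task is needed, because tasks is a required parameter.
    if not 1 <= len(items) <= MAX_TASKS:
        raise ValueError("tasks must contain 1 to 100 elements")
    tasks = []
    for data_id, url in items:
        if url.startswith(("http://", "https://")):
            # The length limit is stated for public HTTP/HTTPS URLs.
            if len(url) > MAX_URL_LENGTH:
                raise ValueError("url exceeds 2048 characters")
        elif not url.startswith("oss://"):
            raise ValueError(f"unsupported URL scheme: {url}")
        task = {"url": url}
        if data_id is not None:
            if not DATA_ID_PATTERN.fullmatch(data_id):
                raise ValueError(f"invalid dataId: {data_id}")
            task["dataId"] = data_id
        tasks.append(task)
    body = {"scenes": ["antispam"], "tasks": tasks}
    if biz_type is not None:
        body["bizType"] = biz_type
    return json.dumps(body)
```

For example, build_request_body([("abcd-123", "http://aliyundoc.com/test.mp3")]) yields a body with one task and the antispam scene.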

Returned Data

Name	Type	Example	Description
code	Integer	200	The returned HTTP status code. For more information, see Common error codes.
msg	String	OK	The message returned for the request.
dataId	String	abc_123	The ID of the moderation object. Note If you set the dataId parameter in the moderation request, the value of the dataId request parameter is returned here.
taskId	String	vc_f_1OsjIYTukH@4@AXkIQ9xxx-1ov52Y	The ID of the detection task.
url	String	http://aliyundoc.com/test.mp3	The URL of the object to be detected. Two sources are supported: a public HTTP or HTTPS URL that is at most 2,048 characters long, or a file path in Alibaba Cloud OSS. To use an OSS file path, you must first authorize Content Moderation to access the OSS bucket. Only OSS buckets in the same region are supported. For more information, see Authorize Content Moderation to access OSS buckets. File path format: oss://<bucket-name>.<endpoint>/<object-name>
results	JSONArray		The detection results returned when the call is successful (code=200). The results contain one or more elements. Each element is a struct. For more information about the structure of each element, see result.

Table 2. result
Name	Type	Example	Description
scene	String	antispam	The detection scenario. This corresponds to the scenario in the request. The value is fixed as antispam.
label	String	customized	The category of the detection result. Valid values: normal: normal text; spam: spam messages; ad: ads; politics: political content; terrorism: terrorist content; abuse: offensive or insulting language; porn: pornographic content; flood: excessive junk content; contraband: prohibited content; meaningless: meaningless content; harmful: undesirable scenarios (for protecting minors, including worshiping money, fan culture, negative emotions, and negative guidance); customized: custom content (such as a hit on a custom keyword).
suggestion	String	block	The recommended subsequent operation. Valid values: pass: The content is normal. No further action is required. review: The result is inconclusive and requires manual review. block: The content is non-compliant. We recommend that you delete the content or restrict its visibility.
rate	Float	99.91	The score of the confidence level. Valid values: 0 to 100. A greater value indicates a higher confidence level. If a value of pass is returned for the suggestion parameter, a higher confidence level indicates a higher probability that the content is normal. If a value of review or block is returned for the suggestion parameter, a higher confidence level indicates a higher probability that the content contains violations. Important We recommend that you use the values that are returned for the suggestion, label, and sublabel parameters to determine whether the content contains violations. The sublabel parameter is returned by specific operations.
details	JSONArray		The details of the text that corresponds to the audio. This can contain one or more elements. Each element corresponds to a sentence. For more information about the structure of each element, see detail.
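
The suggestion values above map naturally to actions in your own application. The following is a minimal sketch of such a mapping. The function name decide_action and the action names are hypothetical placeholders for your own workflow. Sending unknown values to manual review is a design choice of this sketch, not a documented behavior.

```python
# Hypothetical application-side actions for each documented suggestion value.
ACTIONS = {
    "pass": "publish",
    "review": "queue_for_manual_review",
    "block": "reject",
}


def decide_action(result):
    """Map one result element to an action, using suggestion rather than rate."""
    suggestion = result.get("suggestion")
    # Unknown or missing values fall back to manual review instead of publishing.
    return ACTIONS.get(suggestion, "queue_for_manual_review")
```

The sketch deliberately ignores rate, in line with the recommendation to decide based on suggestion and label.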

Table 3. detail
Name	Type	Example	Description
startTime	Integer	0	The start timestamp of the sentence, in seconds.
endTime	Integer	4065	The end timestamp of the sentence, in seconds.
text	String	Disgusting	The text converted from the audio.
label	String	politics	The category of the detection result. Valid values: normal: normal text; spam: spam messages; ad: ads; politics: political content; terrorism: terrorist content; abuse: offensive or insulting language; porn: pornographic content; flood: excessive junk content; contraband: prohibited content; meaningless: meaningless content; harmful: undesirable scenarios (for protecting minors, including worshiping money, fan culture, negative emotions, and negative guidance); customized: custom content (such as a hit on a custom keyword).
persons	JSONArray	[{"name":"Sensitive Person A"}]	The voiceprint recognition result. This field is returned if the voiceprint of a sensitive person is hit. The structure is as follows: name: a string that indicates the sensitive person information identified from the audio. Note This field is not returned by default. If you need this feature, contact your account manager.
keyword	String	Disgusting	If a user-defined keyword is hit, the keyword is returned.
libName	String	test	If a user-defined keyword is hit, the name of the keyword library that contains the keyword is returned.
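
Because each detail element describes one sentence, the sentences that triggered a finding can be collected for reviewers. The following is a minimal sketch, assuming that details has already been parsed from JSON into a list of dictionaries. The function name flagged_sentences is hypothetical.

```python
def flagged_sentences(details):
    """Return sentences whose label is not "normal" or that hit a voiceprint."""
    hits = []
    for d in details:
        persons = d.get("persons") or []
        if d.get("label") != "normal" or persons:
            hits.append({
                "text": d.get("text"),
                "label": d.get("label"),
                # keyword and libName are present only on custom keyword hits.
                "keyword": d.get("keyword"),
                "libName": d.get("libName"),
                "persons": [p.get("name") for p in persons],
            })
    return hits
```

Optional fields are read with get() because keyword, libName, and persons are returned only when the corresponding hit occurs.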

Examples

Sample request

http(s)://[Endpoint]/green/voice/syncscan
&<Common request parameters>
{
    "scenes":[
        "antispam"
    ],
    "tasks":[
        {
            "dataId":"abcd-123",
            "url":"http://aliyundoc.com/test.mp3"
        }
    ]
}

Sample response

{
    "msg":"OK",
    "code":200,
    "data":[
        {
            "code":200,
            "dataId":"abcd-123",
            "results":[
                {
                    "rate":99.91,
                    "suggestion":"block",
                    "details":[
                        {
                            "libName":"test",
                            "startTime":0,
                            "endTime":4065,
                            "label":"customized",
                            "text":"Disgusting",
                            "keyword":"Disgusting"
                        },
                        {
                            "startTime":4430,
                            "endTime":10065,
                            "label":"normal",
                            "persons": [
                                {
                                    "name": "Sensitive Person A"
                                }
                            ],
                            "text":"Hahaha"
                        },
                        {
                            "libName":"Audio",
                            "startTime":11670,
                            "endTime":14685,
                            "label":"customized",
                            "text":"Clearance sale",
                            "keyword":"Sale"
                        },
                        {
                            "startTime":14685,
                            "endTime":16065,
                            "label":"ad",
                            "text":"12345"
                        }
                    ],
                    "label":"customized"
                }
            ],
            "taskId":"vc_f_1OsjIYTukH@4@AXkIQ9xxx-1ov52Y"
        }
    ],
    "requestId":"5A7A6198-6960-4DDC-B67E-58A111A4B20F"
}
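
A response with the shape shown above carries a top-level code and a separate code for each element of data, so both need to be checked before results is read. The following is a minimal sketch, assuming the response body is available as a JSON string. The function name summarize_response is hypothetical.

```python
import json


def summarize_response(raw):
    """Return ({key: suggestion}, failed) where key is dataId or, if absent, taskId."""
    resp = json.loads(raw)
    # The top-level code covers the request as a whole.
    if resp.get("code") != 200:
        raise RuntimeError(f"request failed: {resp.get('code')} {resp.get('msg')}")
    ok, failed = {}, []
    for item in resp.get("data", []):
        key = item.get("dataId") or item.get("taskId")
        # Each task has its own code; results is present only when it is 200.
        if item.get("code") != 200:
            failed.append((key, item.get("code"), item.get("msg")))
            continue
        # Assumption: one result per task, because only the antispam scene is requested.
        for result in item.get("results", []):
            ok[key] = result.get("suggestion")
    return ok, failed
```

For the sample response above, this returns {"abcd-123": "block"} and an empty failure list.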