Enhanced audio moderation API

This topic describes how to use the Audio Moderation API for asynchronous audio file moderation and real-time audio stream moderation.

Usage

The service endpoint is https://green-cip.{region}.aliyuncs.com.

Call this API to create audio content moderation tasks. For details on constructing HTTP requests, see Make a request by using HTTP. You can also use pre-configured requests. For more information, see SDKs and best practices for the enhanced audio moderation feature of Content Moderation V2.0.

API operations:
- Submit moderation task: VoiceModeration
- Query moderation task result: VoiceModerationResult
- Cancel moderation task: VoiceModerationCancel

Supported regions and endpoints:

Region	Public endpoint	Internal endpoint	Supported services
China (Shanghai)	green-cip.cn-shanghai.aliyuncs.com	green-cip-vpc.cn-shanghai.aliyuncs.com	audio_media_detection, audio_media_detection_pro, live_stream_detection, live_stream_detection_pro, and voice_aigc_detector
China (Beijing)	green-cip.cn-beijing.aliyuncs.com	green-cip-vpc.cn-beijing.aliyuncs.com	audio_media_detection and live_stream_detection
China (Hangzhou)	green-cip.cn-hangzhou.aliyuncs.com	green-cip-vpc.cn-hangzhou.aliyuncs.com
China (Shenzhen)	green-cip.cn-shenzhen.aliyuncs.com	green-cip-vpc.cn-shenzhen.aliyuncs.com
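
The endpoint pattern in the table above can be sketched as a small helper. This is only an illustration: the function name `build_endpoint` and the `internal` flag are hypothetical, and the region list simply mirrors the table.

```python
# Region IDs copied from the endpoint table above.
SUPPORTED_REGIONS = ("cn-shanghai", "cn-beijing", "cn-hangzhou", "cn-shenzhen")


def build_endpoint(region: str, internal: bool = False) -> str:
    """Return the public or internal (VPC) endpoint host for a region."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"unsupported region: {region}")
    # Internal endpoints use the "green-cip-vpc" prefix, public ones "green-cip".
    prefix = "green-cip-vpc" if internal else "green-cip"
    return f"{prefix}.{region}.aliyuncs.com"
```

For example, `build_endpoint("cn-beijing", internal=True)` yields the internal endpoint listed for China (Beijing).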

Billing:
This is a paid service. You are billed only for requests that return an HTTP 200 status code. Requests that return any other status code are not billed. For billing details, see Billing.

Performance:

Performance metric	Description
Audio file size	The maximum size for an audio file is 500 MB.
Audio file format	Supported audio formats: MP3, WAV, AAC, WMA, OGG, M4A, and AMR. Supported video formats: AVI, FLV, MP4, MPG, ASF, WMV, MOV, RMVB, and RM.
Audio live stream	Supported protocols: RTMP, HLS, HTTP-FLV, and RTSP.
Requests per second (QPS)	The QPS limit for task submission is 100.
Concurrent streams	The default limit for concurrent streams is 50.
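
A client-side pre-check can catch files that would exceed the limits above before a task is submitted. The sketch below is an assumption-laden illustration: `precheck_media` is a hypothetical helper, the format is guessed from the file extension only, and 1 MB is taken to mean 1024 * 1024 bytes, which the table does not specify.

```python
import os
from typing import List

MAX_FILE_BYTES = 500 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
AUDIO_EXTS = {"mp3", "wav", "aac", "wma", "ogg", "m4a", "amr"}
VIDEO_EXTS = {"avi", "flv", "mp4", "mpg", "asf", "wmv", "mov", "rmvb", "rm"}


def precheck_media(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file passed the checks."""
    problems = []
    # Guess the format from the extension; the service inspects the real content.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in AUDIO_EXTS | VIDEO_EXTS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 500 MB")
    return problems
```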

Submit a moderation task

Request parameters

Parameter	Type	Required	Example	Description
Service	String	Yes	live_stream_detection	The type of moderation service. Valid values: audio_media_detection: Audio and video media detection. audio_media_detection_pro: Audio and video media detection (Pro). live_stream_detection: Social entertainment live stream detection. live_stream_detection_pro: Social entertainment live stream detection (Pro). voice_aigc_detector: AI-generated content (AIGC) voice detection.
ServiceParameters	JSONString	Yes		The parameters required by the moderation service. This value must be a JSON string. For more information about the parameters, see ServiceParameters.

Table 1. ServiceParameters

Parameter	Type	Required	Example	Description
url	String	Conditional. Specify the audio file in one of three ways: (1) provide the `url` of the audio file; (2) provide `ossBucketName`, `ossObjectName`, and `ossRegionId` to access a file in OSS; (3) upload a local audio file. The third method does not use your OSS storage and stores the file for only 30 minutes. The SDK supports this upload method. For code examples, see SDKs and best practices for the enhanced audio moderation feature of Content Moderation V2.0.	http://aliyundoc.com/test.flv	The URL of the object to moderate. The URL must be a publicly accessible HTTP or HTTPS URL.
ossBucketName	String		bucket_01	The name of the authorized OSS bucket. Note To use the internal endpoint of an OSS object, you must use your root account to grant permissions on the Cloud Resource Access Authorization page.
ossObjectName	String		20240307/07/28/test.flv	The name of the file in the authorized OSS bucket.
ossRegionId	String		cn-shanghai	The region where the OSS bucket is located.
callback	String	No	http://aliyundoc.com	The HTTP or HTTPS URL to which moderation results are sent. If you do not specify this parameter, you must poll for the results. The callback endpoint must support the POST method, UTF-8 encoded data, and the form parameters `checksum` and `content`. Content Moderation populates these parameters as follows and then calls your endpoint to return the detection results. checksum: A string generated by concatenating `user ID + seed + content` and applying the SHA256 algorithm. The user ID is your Alibaba Cloud account ID, which you can find in the Alibaba Cloud console. To detect tampering, compute the same string when you receive a callback and compare it with the checksum. Note The user ID must be that of your Alibaba Cloud root account, not a RAM user. content: A JSON-formatted string. Parse it to obtain a JSON object. For an example, see the sample response for querying detection results. Note After your callback endpoint receives the result, it must return an HTTP 200 status code to confirm receipt. Any other HTTP status code is treated as a failure, and Content Moderation retries up to 16 times. If all 16 retries fail, the service stops sending the result. We recommend that you check the status of your callback endpoint.
seed	String	Conditional	abc****	A random string used to generate the callback signature. The string can contain letters, digits, and underscores (_), and must not exceed 64 characters. You can define this value to verify that the callback notification originates from Alibaba Cloud Content Moderation. Note This parameter is required when `callback` is specified.
cryptType	String	No	SHA256	When `callback` is used, this parameter specifies the algorithm that Content Moderation uses to calculate the callback signature. Valid values: SHA256 (default): Uses the SHA256 algorithm. SM3: Uses the SM3 (ShangMi) HMAC-SM3 algorithm, which returns a hexadecimal string of lowercase letters and digits. For example, processing `abc` with SM3 returns `66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0`.
liveId	String	No	liveId1****	The ID of the live audio stream. This parameter is used to deduplicate audio live streaming tasks. If this parameter is passed, the system uses `uid+service+liveId` to check if a detection task is already in progress. If so, the system returns the `taskId` of the existing task instead of initiating a new one.
dataId	String	No	voice20240307***	A custom ID for the data that you are moderating. The ID can contain uppercase and lowercase letters, digits, underscores (_), hyphens (-), and periods (.), and must not exceed 64 characters. You can use it to uniquely identify your business data.
referer	String	No	www.aliyun.com	The `Referer` header, used for scenarios such as hotlink protection. The value cannot exceed 256 characters.
extra	String	No	{"VolcAppId":"6fabbd**1a7e", "VolcTokenId": "User123456", "VolcToken": "6fabbd**1a7e"}	A JSON string of extended parameters, used only to specify parameters for third-party Real-Time Communication (RTC) products. For more information, see Integrate enhanced audio moderation with third-party RTC products.
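
Several constraints in Table 1 can be checked locally before a request is sent. The following is a minimal sketch, not an official validator: `validate_service_parameters` is a hypothetical helper that covers only the `seed`, `dataId`, `referer`, and `callback` rules stated in the table.

```python
import re
from typing import List

# Character sets and length limits taken from the parameter descriptions above.
SEED_RE = re.compile(r"[A-Za-z0-9_]{1,64}")
DATA_ID_RE = re.compile(r"[A-Za-z0-9_.\-]{1,64}")


def validate_service_parameters(params: dict) -> List[str]:
    """Return a list of problems; an empty list means no problem was detected."""
    problems = []
    # seed becomes mandatory as soon as a callback URL is supplied.
    if "callback" in params and not params.get("seed"):
        problems.append("seed is required when callback is specified")
    seed = params.get("seed")
    if seed is not None and not SEED_RE.fullmatch(seed):
        problems.append("seed must be 1-64 letters, digits, or underscores")
    data_id = params.get("dataId")
    if data_id is not None and not DATA_ID_RE.fullmatch(data_id):
        problems.append("dataId must be 1-64 letters, digits, '_', '-', or '.'")
    referer = params.get("referer")
    if referer is not None and len(referer) > 256:
        problems.append("referer must not exceed 256 characters")
    return problems
```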

Response parameters

Parameter	Type	Example	Description
Code	Integer	200	The status code returned for the request. For details, see Status codes.
Data	JSONObject	{"TaskId": "AAAAA-BBBBB","DataId": "voice20240307***"}	The data returned for a successful request.
Message	String	SUCCESS	The response message.
RequestId	String	AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****	The unique ID generated for this request.

Examples

Request example

{
    "Service": "audio_media_detection",
    "ServiceParameters": {
        "cryptType": "SHA256",
        "seed": "abc***123",
        "callback": "https://aliyun.com/callback",
        "url": "http://aliyundoc.com/test.flv"
    }
}

Response example

{
    "Code": 200,
    "Data": {
        "TaskId": "AAAAA-BBBBB",
        "DataId": "voice20240307***"
    },
    "Message": "SUCCESS",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
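
Because `ServiceParameters` must be passed as a JSON string rather than a nested object, a client typically serializes it before sending. The sketch below shows only that step. `build_submit_params` is a hypothetical helper, and request signing and transport (handled by the SDK or described in Make a request by using HTTP) are deliberately left out.

```python
import json
from typing import Optional


def build_submit_params(service: str, url: str,
                        callback: Optional[str] = None,
                        seed: Optional[str] = None,
                        data_id: Optional[str] = None) -> dict:
    """Build the VoiceModeration parameters with ServiceParameters serialized."""
    service_parameters = {"url": url}
    if callback is not None:
        if not seed:
            raise ValueError("seed is required when callback is specified")
        service_parameters["callback"] = callback
        service_parameters["seed"] = seed
    if data_id is not None:
        service_parameters["dataId"] = data_id
    # The API expects a JSON string here, not a nested object.
    return {
        "Service": service,
        "ServiceParameters": json.dumps(service_parameters),
    }
```

Calling `build_submit_params("audio_media_detection", "http://aliyundoc.com/test.flv")` returns a dictionary whose `ServiceParameters` value is the string `{"url": "http://aliyundoc.com/test.flv"}`.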

Query task results

When you query a live stream moderation task, the response returns data for the most recent voice slices while the task is in progress. After the task is complete, the response includes data for all voice slices.

To query a moderation task, call the VoiceModerationResult operation.
Billing: This API operation does not incur charges.
Polling for results: We recommend waiting 30 seconds after submitting an asynchronous detection task before you start polling for results. Results are available for 24 hours and are then deleted.
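
The polling recommendation above can be sketched as a loop that waits 30 seconds and then queries until the task leaves the in-progress state. This is a sketch under assumptions: `query` is a hypothetical callable that performs the VoiceModerationResult call and returns the parsed response, status code 280 is treated as "still in progress", and the 10-second interval and attempt limit are illustrative values, not documented ones.

```python
import time
from typing import Callable

IN_PROGRESS = 280  # "Detection is in progress" in the status code table


def poll_result(query: Callable[[str, str], dict], service: str, task_id: str,
                initial_delay: float = 30.0, interval: float = 10.0,
                max_attempts: int = 20,
                sleep: Callable[[float], None] = time.sleep) -> dict:
    """Wait, then query until the response code is no longer 280."""
    sleep(initial_delay)  # recommended wait before the first query
    for attempt in range(max_attempts):
        response = query(service, task_id)
        if response.get("Code") != IN_PROGRESS:
            return response
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"task {task_id} still in progress after {max_attempts} queries")
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Keep the polling interval well below the QPS limit for result queries.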

QPS limit

Each user can send up to 100 queries per second (QPS). API calls that exceed this limit are throttled, which may affect your business. Call this API at a reasonable rate.

Request parameters

Parameter	Type	Required	Example value	Description
Service	String	Yes	live_stream_detection	The moderation service type.
ServiceParameters	JSONString	Yes		The parameters required for the moderation service. This value must be a JSON string. For details on each field, see ServiceParameters.

Table 2. ServiceParameters

Parameter	Type	Required	Example value	Description
taskId	String	Yes	AAAAA-BBBBB	The ID returned when the task is submitted.

Response parameters

Parameter	Type	Example value	Description
Code	Integer	200	The status code. For more information, see Status Codes.
Data	JSONObject		The result of the audio content moderation. For more information, see Data.
Message	String	OK	The response message.
RequestId	String	AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****	The request ID.

Table 3. Data

Parameter	Type	Example value	Description
Url	String	https://aliyundoc.com	The URL of the object being moderated.
LiveId	String	liveId1****	The ID of the live audio stream (optional).
DataId	String	voice20240307***	The data ID of the object being moderated (optional).
RiskLevel	String	high	The overall risk level of the audio, determined from all voice slices. Valid values include: high: High Risk medium: Medium Risk low: Low Risk none: No Risk Detected Note We recommend addressing high-risk content immediately and sending medium-risk content for manual review. Low-risk content can be treated as safe, unless your use case requires high recall.
SliceDetails	JSONArray		The detailed results for each voice slice. For more information, see SliceDetails.
ManualTaskId	String		The manual review task ID. This ID is used to query the results of a manual review. This ID is returned if human-machine review is configured and the conditions for manual review are met. For more information about the configuration, see Human-Machine Review Service Configuration.

Table 4. SliceDetails

Parameter	Type	Example value	Description
StartTime	Integer	0	The start time of the voice slice, in seconds.
EndTime	Integer	4065	The end time of the voice slice, in seconds.
StartTimestamp	Integer	1678854649720	The start timestamp of the voice slice, in milliseconds.
EndTimestamp	Integer	1678854649720	The end timestamp of the voice slice, in milliseconds.
Text	String	disgusting	The speech-to-text transcript.
Url	String	https://aliyundoc.com	If the moderated content is an audio stream, this is the temporary URL for the corresponding audio segment. This URL is valid for 30 minutes. Save the file to a different location before it expires.
Labels	String	political_content,xxxx	The labels for detected content, separated by commas. Audio moderation labels include: ad: Ad-related Promotion violence: Violent and Terrorist Content political_content: Politically Sensitive Content specified_speaking: Specified Speaking specified_lyrics: Specified Lyrics sexual_content: Pornographic Content sexual_sounds: Moaning contraband: Contraband profanity: Profanity religion: Religious Content cyberbullying: Cyberbullying negative_content: Negative Content nontalk: Muted Audio C_customized: Matched in a user-defined library AIGC voice detection labels include: aigc: Suspected AIGC-generated Audio ugc: Non-AIGC-generated Audio Note To determine if audio is AIGC-generated, use the label directly instead of the risk level.
RiskLevel	String	high	The risk level of the voice slice. Valid values include: high: High Risk medium: Medium Risk low: Low Risk none: No Risk Detected
RiskWords	String	AAA,BBB,CCC	The detected risk words, separated by commas.
RiskTips	String	sexual_content_vulgar_words,sexual_content_description	The sub-labels, separated by commas.
Extend	String	{"riskTips":"sexual_content_vulgar_words","riskWords":"sex_services"}	Reserved field.

Examples

Request example

{
    "Service": "live_stream_detection",
    "ServiceParameters": {
        "taskId": "AAAAA-BBBBB"
    }
}

Response example

{
    "Code": 200,
    "Data": {
        "DataId": "voice20240307***",
        "LiveId": "liveId1****",
        "RiskLevel": "high",
        "SliceDetails": [
            {
                "EndTime": 4065,
                "Labels": "political_content,xxxx",
                "RiskLevel": "high",
                "RiskTips": "contraband_prohibited_items",
                "RiskWords": "RiskWordA",
                "StartTime": 0,
                "Text": "Content Moderation product test case",
                "Url": "https://aliyundoc.com"
            }
        ]
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
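
A response like the one above can be reduced to a handling decision plus per-slice labels. The sketch below follows the recommendation in the `RiskLevel` description (act on high risk, send medium risk to manual review). The helper name `summarize_result` and the action names `block`, `manual_review`, and `pass` are hypothetical.

```python
def summarize_result(response: dict) -> dict:
    """Map the overall RiskLevel to an action and split the per-slice labels."""
    data = response.get("Data") or {}
    level = data.get("RiskLevel", "none")
    # high -> block, medium -> manual review, low/none -> pass (illustrative policy).
    action = {"high": "block", "medium": "manual_review"}.get(level, "pass")
    slices = []
    for item in data.get("SliceDetails") or []:
        # Labels is a comma-separated string; drop empty entries.
        labels = [label for label in (item.get("Labels") or "").split(",") if label]
        slices.append({
            "start": item.get("StartTime"),
            "end": item.get("EndTime"),
            "risk": item.get("RiskLevel"),
            "labels": labels,
        })
    return {"action": action, "slices": slices}
```

For AIGC voice detection, the `Labels` value (`aigc` or `ugc`) should drive the decision instead of the risk level, as noted in the `Labels` description.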

Cancel a moderation task

This operation cancels live stream moderation tasks only. File-based tasks cannot be canceled.

To cancel a moderation task, call the VoiceModerationCancel API.
This API operation is free of charge.

Request parameters

Parameter	Type	Required	Example value	Description
Service	String	Yes	live_stream_detection	The moderation service.
ServiceParameters	JSONString	Yes		The parameter set for the moderation service, which must be a JSON string. For details on each parameter, see ServiceParameters.

Table 5. ServiceParameters

Parameter	Type	Required	Example value	Description
taskId	String	Yes	AAAAA-BBBBB	The ID of the moderation task to be canceled.

Response parameters

Parameter	Type	Example value	Description
Code	Integer	200	The status code. See Status Codes for a list of all possible values.
Message	String	OK	The response message.
RequestId	String	AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****	The request ID.

Examples

Request example

{
    "Service": "live_stream_detection",
    "ServiceParameters": {
        "taskId": "AAAAA-BBBBB"
    }
}

Response example

{
    "Code": 200,
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

Callback message format

The callback message is sent in JSON format and contains the following fields:

Parameter	Type	Description
checksum	String	The checksum is a string generated by concatenating `uid + seed + content` and then applying the SHA256 algorithm. Your user ID is your Alibaba Cloud account ID, which can be found in the Alibaba Cloud console. To prevent tampering, you can compute a signature using the same algorithm when you receive the callback message and verify it against the checksum value. Note The user ID must be your Alibaba Cloud account ID, not the ID of a RAM user.
taskId	String	The task ID associated with the callback message.
content	String	The serialized moderation result, provided as a JSON string. You must parse this string to retrieve the JSON object. The content format is the same as the response from a task result query. For more information, see Return Parameters.
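
The checksum rule above can be sketched as follows for the default SHA256 setting. Several details are assumptions rather than documented facts: the concatenated string is encoded as UTF-8, the checksum is a lowercase hexadecimal digest, and `fields` is a dictionary your web framework has already extracted from the callback. The helper names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional


def verify_callback(uid: str, seed: str, content: str, checksum: str) -> bool:
    """Recompute SHA256(uid + seed + content) and compare it with the checksum."""
    # Assumption: UTF-8 input and a lowercase hex digest.
    expected = hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), checksum.encode("utf-8"))


def parse_callback(uid: str, seed: str, fields: dict) -> Optional[dict]:
    """Return the parsed content if the checksum matches, otherwise None."""
    content = fields.get("content", "")
    checksum = fields.get("checksum", "")
    if not verify_callback(uid, seed, content, checksum):
        return None
    return json.loads(content)
```

After a successful check, the endpoint should answer with HTTP 200 so that Content Moderation does not retry the delivery.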

Status codes

This section lists the status codes returned by the API. You are billed only for requests that return a 200 status code.

Code	Description
200	The request was successful.
280	Detection is in progress.
400	A request parameter is empty.
401	A request parameter is invalid.
402	The length of a request parameter is invalid. Check the parameter length and try again.
403	The number of requests exceeds the QPS limit. Reduce your request rate and try again.
404	File download failed. Check the file and try again.
405	The file download timed out, possibly because the file is inaccessible. Check the file and try again.
406	The file exceeds the size limit. Check the file size and try again.
407	The file format is not supported. Check the file format and try again.
408	The account is not authorized to call this API. This error can occur if the service has not been activated, the account has an overdue payment, or the account lacks required permissions.
480	The number of concurrent requests exceeds the limit. Reduce the number of concurrent requests and try again.
500	An internal system error occurred.
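
Client code often groups these codes by the kind of reaction they call for. The grouping below is one possible reading of the table, not an official classification, and the category names are hypothetical.

```python
def classify_status(code: int) -> str:
    """Group a status code from the table above into a coarse category."""
    if code == 200:
        return "success"
    if code == 280:
        return "in_progress"        # query again later
    if code in (403, 480):
        return "throttled"          # reduce request rate or concurrency, then retry
    if code in (400, 401, 402):
        return "invalid_parameter"  # fix the request before resending
    if code in (404, 405, 406, 407):
        return "file_error"         # check the file's availability, size, or format
    if code == 408:
        return "unauthorized"       # check activation, payment status, permissions
    if code == 500:
        return "server_error"
    return "unknown"
```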