How to create a dataset in dataset management - Smart Conversation Analysis (SCA) - Alibaba Cloud Help Center

You can use dataset management to manage text and voice datasets for quality inspection.

Create a dataset

Click the Create Dataset button in the upper-left corner of the dataset list to open a dialog box where you can upload a dataset.

Dataset name: The name of the dataset you are creating.
Dataset type: The system supports voice and text datasets by default. Voice datasets are for call center scenarios, and text datasets are for online support scenarios.
Upload file: You can upload a single file or a folder for batch uploads.

Note:

Voice datasets support only the WAV, MP3, V3, and VOX file formats. You can upload up to 500 audio files at a time. The total size cannot exceed 10 GB, and each file cannot exceed 100 MB. File names cannot contain Chinese characters. Audio files that do not meet the requirements are automatically transcoded during the upload. For example, the sample rate is converted to 8000 Hz.
Text datasets support only the CSV file format. You can upload up to 10,000 conversation files at a time.
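The voice dataset limits in the note can be checked locally before you upload. The following is a minimal sketch and not part of SCA: the function name check_voice_batch is hypothetical, the size limits are assumed to be binary units (1 MB = 1024 * 1024 bytes), and "Chinese characters" is approximated by the CJK Unified Ideographs range.

```python
import os
import re

# Limits taken from the note above.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".v3", ".vox"}
MAX_FILES = 500
MAX_TOTAL_BYTES = 10 * 1024**3  # 10 GB (assumption: binary units)
MAX_FILE_BYTES = 100 * 1024**2  # 100 MB (assumption: binary units)

# Assumption: "Chinese characters" means the CJK Unified Ideographs block.
_CHINESE = re.compile(r"[\u4e00-\u9fff]")


def check_voice_batch(files):
    """Return a list of problems for a batch of (file_name, size_in_bytes) pairs.

    An empty list means the batch satisfies the documented limits.
    """
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")

    total = 0
    for name, size in files:
        total += size
        ext = os.path.splitext(name)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: larger than 100 MB")
        if _CHINESE.search(name):
            problems.append(f"{name}: file name contains Chinese characters")

    if total > MAX_TOTAL_BYTES:
        problems.append("total size exceeds 10 GB")
    return problems
```

To check a real folder, you could build the input with os.listdir and os.path.getsize, then fix or rename any files that are reported before you start the upload.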

After the upload is complete, click Next at the bottom of the page to start configuring speaker roles.

Speaker role configuration

The system randomly selects one of the uploaded files as a sample. For voice files, click Start Audio Transcription to convert the speech to text. After the transcription is complete, you can configure the speaker roles based on the conversation text. The configuration method differs for single-track and dual-track recordings, as described in the following sections.

When converting speech to text, the system automatically separates the recording into two speaker roles. However, the system cannot always accurately identify which speaker is the customer service representative. You must manually set the roles based on the text content. Select which speaker is the customer service representative, and the other speaker is automatically set as the customer. Accurate speaker role configuration is critical. Many quality inspection rules are limited to a specific scope, such as applying only to the customer service representative or the customer. If the speaker roles are configured incorrectly, the accuracy of the inspection results will be significantly affected.

Configure speaker roles for single-track recordings

For single-track recordings, after audio transcription is complete, one speaker is the customer and the other is the customer service representative. You can identify the roles as follows:

Identify the customer service representative by keyword: Based on your business scenario, enter one or more keywords that the customer service representative typically says at the beginning of a conversation. If these keywords are matched, that speaker is identified as the customer service representative, and the other speaker is identified as the customer.
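To make the keyword method concrete, the following sketch shows one way such matching could work. It is an illustration only and not the SCA implementation: the function name assign_roles_by_keyword, the "agent" and "customer" labels, the case-insensitive substring match, and the opening_turns window are all assumptions.

```python
def assign_roles_by_keyword(utterances, keywords, opening_turns=4):
    """Assign roles to two speakers based on opening keywords.

    utterances: list of (speaker_label, text) pairs in conversation order.
    keywords:   phrases the customer service representative typically says.
    Returns {speaker_label: role}, or None if no keyword matches.
    """
    # Collect the distinct speaker labels in order of first appearance.
    speakers = []
    for speaker, _ in utterances:
        if speaker not in speakers:
            speakers.append(speaker)
    if len(speakers) != 2:
        raise ValueError("expected exactly two speakers")

    # Only look at the first few turns (assumed window size).
    for speaker, text in utterances[:opening_turns]:
        lowered = text.lower()
        if any(keyword.lower() in lowered for keyword in keywords):
            other = speakers[0] if speaker == speakers[1] else speakers[1]
            return {speaker: "agent", other: "customer"}
    return None


conversation = [
    ("A", "Hello?"),
    ("B", "Thank you for calling Acme support, how can I help?"),
    ("A", "My order is late."),
]
roles = assign_roles_by_keyword(conversation, ["thank you for calling"])
print(roles)
```

In this example, speaker B says the greeting, so B is treated as the customer service representative and A as the customer. If no keyword matches, the function returns None, which corresponds to the case where you need to adjust the keywords.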

Select the appropriate role identification method and click Authenticate. The text in the dialog box is updated to show the assigned roles. Check if the roles are correct. If not, adjust the keywords. After you finish configuring the speaker roles, click Complete Creation at the bottom of the page. All files in this dataset will now use the same role identification method.

Note that speaker role separation for single-track recordings is not 100% accurate. To ensure accuracy, configure your call center to record calls in dual-track mode. This mode puts the customer on one track and the customer service representative on another, which prevents speaker role separation errors at the source.

Configure speaker roles for dual-track recordings

Two methods are available for determining roles:

Identify the customer service representative by global keyword: Based on your business scenario, enter one or more keywords that the customer service representative typically says at the beginning of a conversation. If these keywords are matched, that speaker is identified as the customer service representative, and the other speaker is identified as the customer.
Identify by specified track: Based on the conversation text, select the correct role for Speaker A. The role for Speaker B changes automatically.

After you verify that your selections are correct, click Complete Creation at the bottom of the page. All files in this dataset will now use the same role determination method.
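The "identify by specified track" option amounts to a fixed mapping from speaker to role that is reused for every file in the dataset. A minimal sketch, assuming hypothetical role labels "agent" and "customer" and a hypothetical helper name roles_from_speaker_a:

```python
ROLES = {"agent", "customer"}  # assumed labels, not SCA identifiers


def roles_from_speaker_a(speaker_a_role):
    """Return the role of both speakers given the role chosen for Speaker A."""
    if speaker_a_role not in ROLES:
        raise ValueError(f"unknown role: {speaker_a_role!r}")
    # Speaker B automatically receives the remaining role.
    (speaker_b_role,) = ROLES - {speaker_a_role}
    return {"Speaker A": speaker_a_role, "Speaker B": speaker_b_role}


mapping = roles_from_speaker_a("customer")
print(mapping)
```

Because each role sits on its own track in a dual-track recording, this mapping does not depend on what is said, which is why it avoids the separation errors that can occur with single-track recordings.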