LLM - Sensitive word filter (DLC)


The LLM - Sensitive word filter (DLC) component filters out samples that contain sensitive words. The input data file in OSS must be in JSONL format: each line must be a valid JSON object, so the file as a whole is not a valid JSON object.
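The JSONL requirement can be illustrated with a short Python sketch. The field name `text` and the sample records are hypothetical, and the helper only shows how a JSONL file differs from a single JSON document; it does not reproduce the component's own parser.

```python
import json
from typing import Dict, List

def read_jsonl(content: str) -> List[Dict]:
    """Parse JSONL text: each non-blank line must be one JSON object."""
    records = []
    for line_no, line in enumerate(content.splitlines(), start=1):
        if not line.strip():
            continue  # this sketch skips blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {line_no} is not a JSON object")
        records.append(record)
    return records

# Two samples with a hypothetical "text" field, one JSON object per line.
sample = '{"text": "first sample"}\n{"text": "second sample"}\n'
print(read_jsonl(sample))  # [{'text': 'first sample'}, {'text': 'second sample'}]

# The file as a whole is not a single JSON document.
try:
    json.loads(sample)
except json.JSONDecodeError as err:
    print("not one JSON document:", err.msg)  # prints: not one JSON document: Extra data
```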

Supported compute resources

DLC

Algorithm description

This component detects and filters out text samples that contain sensitive words, and it can also return the sensitive words found in each text. By default, it uses a preset list of more than 12,000 sensitive words.
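The detect-and-filter behavior can be sketched in Python as follows. This is a minimal illustration, not the component's implementation: it assumes plain, case-sensitive substring matching, and the key `sensitive_words` used to report matches is hypothetical.

```python
from typing import Dict, List, Tuple

def find_sensitive_words(text: str, words: List[str]) -> List[str]:
    """Return every word from `words` that occurs in `text` (plain substring match)."""
    return [w for w in words if w and w in text]

def filter_samples(
    samples: List[Dict], field: str, words: List[str]
) -> Tuple[List[Dict], List[Dict]]:
    """Split samples into (kept, removed) based on the given field.

    Removed samples carry the matched words under the hypothetical
    key "sensitive_words"."""
    kept, removed = [], []
    for sample in samples:
        hits = find_sensitive_words(sample.get(field, ""), words)
        if hits:
            removed.append({**sample, "sensitive_words": hits})
        else:
            kept.append(sample)
    return kept, removed

words = ["badword", "secret"]  # stand-in for the preset word list
samples = [{"text": "hello world"}, {"text": "this has a secret"}]
kept, removed = filter_samples(samples, "text", words)
print(kept)     # [{'text': 'hello world'}]
print(removed)  # [{'text': 'this has a secret', 'sensitive_words': ['secret']}]
```

Checking each word separately costs time proportional to the size of the word list for every sample, so with a list of over 12,000 words a multi-pattern matcher such as Aho-Corasick is the usual choice; this page does not state which matching method the component uses.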

Configure the component

In a Designer workflow, add the LLM - Sensitive word filter (DLC) component and configure its parameters in the panel on the right side of the page:

| Parameter type | Parameter | Required | Description | Default value |
|---|---|---|---|---|
| Fields setting | Target process field | Yes | The name of the field to process. | None |
| Fields setting | Data output OSS folder | No | The OSS folder where the processed data is stored. If left empty, the default workspace path is used. | None |
| Fields setting | Sensitive word file | No | The path of the sensitive word file. If left empty, the preset sensitive word list is used. The file contains one sensitive word per line, separated by line feeds: "sensitive word 1\nsensitive word 2\n...". | Preset sensitive word file |
| Execution tuning | Number of processes | No | The number of processes to use. | 8 |
| Execution tuning | Select resource group: Public resource group | No | Select the node specifications (CPU or GPU-accelerated instance types), the number of nodes, and the virtual private cloud (VPC). | None |
| Execution tuning | Select resource group: Dedicated resource group | No | Select the number of CPU cores, memory, shared memory, number of GPUs, and number of nodes. | None |
| Execution tuning | Maximum runtime | No | The maximum time the component can run. If the runtime exceeds this value, the job is terminated. | None |
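The sensitive word file format (one word per line, separated by line feeds) and the role of the Target process field can be sketched in Python as below. The helper names, the field name `content`, and the choices to strip surrounding whitespace, skip blank lines, and drop duplicate words are assumptions of this sketch, not documented behavior of the component.

```python
import json
from typing import List, Optional

def load_sensitive_words(content: str) -> List[str]:
    """Parse sensitive word file content: one word per line, split on line feeds."""
    words, seen = [], set()
    for line in content.split("\n"):
        word = line.strip()  # assumption: surrounding whitespace (incl. '\r') is not part of a word
        if word and word not in seen:  # assumption: blank lines and duplicates are ignored
            seen.add(word)
            words.append(word)
    return words

def keep_line(line: str, target_field: str, words: List[str]) -> Optional[str]:
    """Return the JSONL line if its target field contains no sensitive word, else None."""
    text = json.loads(line).get(target_field, "")
    return None if any(w in text for w in words) else line

words = load_sensitive_words("spam\neggs\n\nspam\n")
print(words)  # ['spam', 'eggs']

lines = ['{"content": "ham and eggs"}', '{"content": "plain toast"}']
kept = [line for line in lines if keep_line(line, "content", words) is not None]
print(kept)  # ['{"content": "plain toast"}']
```

Only the field named by the target field is inspected; other fields in the same JSON object are passed through untouched in this sketch.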