The LLM - Sensitive word filter (DLC) component filters samples that contain sensitive words. The input OSS data file must be in JSONL format: each line must be a valid JSON object, so the file as a whole is not a single valid JSON document.
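To illustrate the JSONL requirement, the following is a minimal sketch that checks a file's lines before upload. The function name `validate_jsonl` and the sample records are hypothetical and not part of the component. Treating blank lines as errors is an assumption; the component's own handling of blank lines is not documented here.

```python
import json


def validate_jsonl(lines):
    """Return (line_number, problem) pairs for lines that are not JSON objects."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            # Assumption: a blank line does not count as a valid JSON object.
            problems.append((number, "empty line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            # Valid JSON, but an array or scalar rather than an object.
            problems.append((number, "not a JSON object"))
    return problems


# Hypothetical sample data: two valid records and one JSON array.
sample = [
    '{"text": "first sample"}',
    '{"text": "second sample"}',
    '["not", "an", "object"]',
]
print(validate_jsonl(sample))  # [(3, 'not a JSON object')]
```

For a real file, the lines can be read with `open(path, encoding="utf-8")` and passed to the same function.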
Supported compute resources
Algorithm description
This component detects and filters text samples that contain sensitive words. It can also return the sensitive words found in the text. By default, the component provides a list of over 12,000 sensitive words.
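The behavior described above can be sketched in a few lines of Python. This is an illustration only: it assumes plain substring matching, which may differ from how the component matches words internally, and the names `find_sensitive_words` and `filter_samples`, the field name `text`, and the placeholder words are all hypothetical.

```python
def find_sensitive_words(text, sensitive_words):
    """Return the sensitive words that occur in text, sorted alphabetically."""
    # Assumption: simple substring matching, case-sensitive.
    return sorted(word for word in sensitive_words if word and word in text)


def filter_samples(samples, field, sensitive_words):
    """Split samples into those kept and those removed, with the words found."""
    kept, removed = [], []
    for sample in samples:
        hits = find_sensitive_words(str(sample.get(field, "")), sensitive_words)
        if hits:
            removed.append((sample, hits))
        else:
            kept.append(sample)
    return kept, removed


# Placeholder word list and samples, for illustration only.
words = {"badword", "forbidden"}
samples = [
    {"text": "a harmless sentence"},
    {"text": "this contains badword and forbidden"},
]
kept, removed = filter_samples(samples, "text", words)
print(kept)     # [{'text': 'a harmless sentence'}]
print(removed)  # [({'text': 'this contains badword and forbidden'}, ['badword', 'forbidden'])]
```

Checking every word against every sample is slow for a list of more than 12,000 words; a production implementation would more likely use a multi-pattern matcher such as a trie or Aho-Corasick automaton.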
Configure the component
In the Designer workflow, you can add the LLM - Sensitive word filter (DLC) component and configure its parameters on the right side of the page:
Parameter type | Parameter | Required | Description | Default value
Fields setting | Target process field | Yes | The name of the field to process. | None
Fields setting | Data output OSS folder | No | The OSS storage folder for the processed data. If left empty, the default workspace path is used. | None
Fields setting | Sensitive word file | No | The path of the sensitive word file. If left empty, the default sensitive word list is used. The file format is: "sensitive word 1\nsensitive word 2\n...". Sensitive words are separated by line feeds. | Preset sensitive word file
Execution tuning | Number of processes | No | Set the number of processes. | 8
Execution tuning | Select resource group: Public resource group | No | Select node specifications (CPU or GPU-accelerated instance specifications), number of nodes, and virtual private cloud (VPC). | None
Execution tuning | Select resource group: Dedicated resource group | No | Select the number of CPU cores, memory, shared memory, number of GPUs, and number of nodes. | None
Execution tuning | Maximum runtime | No | The maximum runtime of the component. If the runtime exceeds this value, the job is killed. | None
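As an illustration of the Sensitive word file format (one word per line, separated by line feeds), the sketch below writes a small custom list and reads it back. The helper `load_sensitive_words`, the file name, and the placeholder words are hypothetical. UTF-8 encoding and the stripping of surrounding whitespace are assumptions, not documented behavior of the component.

```python
import tempfile
from pathlib import Path


def load_sensitive_words(path):
    """Read a sensitive word file with one word per line into a set."""
    # Assumption: the file is UTF-8 encoded.
    text = Path(path).read_text(encoding="utf-8")
    # Assumption: blank lines and surrounding whitespace are ignored.
    return {line.strip() for line in text.split("\n") if line.strip()}


# Write a placeholder word list in the documented format, then load it.
with tempfile.TemporaryDirectory() as folder:
    word_file = Path(folder) / "sensitive_words.txt"
    word_file.write_text("badword\nforbidden\n", encoding="utf-8")
    loaded = load_sensitive_words(word_file)

print(sorted(loaded))  # ['badword', 'forbidden']
```

The file would then be uploaded to OSS and its path entered in the Sensitive word file parameter.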