LLM - Sensitive word filter component instructions - Platform For AI (PAI) - Alibaba Cloud Help Center

The LLM - Sensitive word filter (DLC) component filters out samples that contain sensitive words. The input data file in OSS must be in JSONL format: each line must be a valid JSON object, so the file as a whole is not a single valid JSON object.
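The following minimal Python sketch illustrates the JSONL requirement. Every line parses as JSON on its own, but parsing the whole file as one JSON document fails. The record contents and the `text` field name are hypothetical examples, not values the component requires.

```python
import json

# Hypothetical sample records; "text" is an illustrative field name only.
records = [
    {"id": 1, "text": "The weather is nice today."},
    {"id": 2, "text": "Another sample sentence."},
]

# JSONL layout: one JSON object per line, each line terminated by "\n".
jsonl_data = "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"

# Each non-empty line is a valid JSON object on its own...
parsed = [json.loads(line) for line in jsonl_data.splitlines() if line.strip()]

# ...but the file as a whole is not one valid JSON document.
try:
    json.loads(jsonl_data)
    whole_file_is_json = True
except json.JSONDecodeError:
    whole_file_is_json = False

print(parsed == records)     # True
print(whole_file_is_json)    # False
```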

Supported compute resources

DLC

Algorithm description

This component detects text samples that contain sensitive words and filters them out. It can also return the sensitive words found in each text. By default, the component uses a built-in list of more than 12,000 sensitive words.
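The detect-filter-report behavior described above can be sketched as follows. This is an illustration only, not the component's implementation: it uses naive substring matching, and the function names `find_sensitive_words` and `filter_samples` are hypothetical. For large word lists, a multi-pattern algorithm such as Aho-Corasick is a common faster alternative.

```python
from typing import Dict, List, Sequence, Tuple

def find_sensitive_words(text: str, words: Sequence[str]) -> List[str]:
    """Return the sensitive words that occur in text, in word-list order.

    Naive substring search (an assumption for illustration): O(len(words) * len(text)).
    """
    return [w for w in words if w and w in text]

def filter_samples(
    samples: Sequence[Dict], field: str, words: Sequence[str]
) -> Tuple[List[Dict], List[Tuple[Dict, List[str]]]]:
    """Split samples into kept ones and removed ones (with the words that matched)."""
    kept, removed = [], []
    for sample in samples:
        hits = find_sensitive_words(str(sample.get(field, "")), words)
        if hits:
            removed.append((sample, hits))  # report which words triggered removal
        else:
            kept.append(sample)
    return kept, removed

# Usage with a hypothetical word list and field name.
words = ["badword", "secret"]
samples = [
    {"text": "a clean sentence"},
    {"text": "this contains a secret"},
    {"text": "badword and secret"},
]
kept, removed = filter_samples(samples, "text", words)
print(kept)                         # [{'text': 'a clean sentence'}]
print([hits for _, hits in removed])  # [['secret'], ['badword', 'secret']]
```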

Configure the component

In a Designer workflow, add the LLM - Sensitive word filter (DLC) component and configure its parameters in the panel on the right side of the page:

Parameter type	Parameter	Option	Required	Description	Default value
Fields setting	Target process field		Yes	The name of the field to process.	None
	Data output OSS folder		No	The OSS storage folder for the processed data. If left empty, the default workspace path is used.	None
	Sensitive word file		No	The path of the sensitive word file. If left empty, the default sensitive word list is used. The file must contain one sensitive word per line, with words separated by line feeds (\n), for example: "sensitive word 1\nsensitive word 2\n...".	Preset sensitive word file
Execution tuning	Number of processes		No	The number of processes used to run the job.	8
	Select resource group	Public resource group	No	Select node specifications (CPU or GPU-accelerated instance specifications), number of nodes, and virtual private cloud (VPC).	None
	Select resource group	Dedicated resource group	No	Select the number of CPU cores, memory, shared memory, number of GPUs, and number of nodes.	None
	Maximum runtime		No	The maximum runtime of the component. If the runtime exceeds this value, the job is killed.	None
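To show how the Fields setting parameters relate to the data, here is a single-process Python sketch. It reads a sensitive word file (one word per line), filters a JSONL input on a target field, and writes the kept records to an output file. Local temporary files stand in for the OSS input file, sensitive word file, and output folder. The field name `content`, the function names, and the choice to ignore blank lines and surrounding whitespace in the word file are all assumptions for illustration.

```python
import json
import os
import tempfile
from typing import List, Sequence

def load_sensitive_words(path: str) -> List[str]:
    # One sensitive word per line; blank lines and surrounding whitespace are ignored (assumption).
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def filter_jsonl(input_path: str, output_path: str, field: str, words: Sequence[str]) -> int:
    """Write records whose `field` contains no sensitive word; return how many were kept."""
    kept = 0
    with open(input_path, encoding="utf-8") as src, open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip empty lines
            record = json.loads(line)
            text = str(record.get(field, ""))
            if not any(w in text for w in words):
                dst.write(json.dumps(record, ensure_ascii=False) + "\n")
                kept += 1
    return kept

with tempfile.TemporaryDirectory() as tmp:
    word_file = os.path.join(tmp, "sensitive_words.txt")    # hypothetical file name
    input_file = os.path.join(tmp, "input.jsonl")           # stands in for the OSS input
    output_file = os.path.join(tmp, "output.jsonl")         # stands in for the output folder

    with open(word_file, "w", encoding="utf-8") as f:
        f.write("sensitive word 1\nsensitive word 2\n")
    with open(input_file, "w", encoding="utf-8") as f:
        f.write(json.dumps({"content": "clean text"}) + "\n")
        f.write(json.dumps({"content": "has sensitive word 2 inside"}) + "\n")

    words = load_sensitive_words(word_file)
    kept = filter_jsonl(input_file, output_file, "content", words)
    with open(output_file, encoding="utf-8") as f:
        output_lines = f.read().splitlines()

print(words)         # ['sensitive word 1', 'sensitive word 2']
print(kept)          # 1
print(output_lines)  # ['{"content": "clean text"}']
```

The sketch processes records sequentially. The component's Number of processes setting suggests that it splits this work across several processes, but that detail is not specified here.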