The LLM - Sensitive Information Masking (DLC) component replaces sensitive information with placeholders such as [EMAIL], [TELEPHONE], [MOBILEPHONE], and [IDNUM]. The input data file in OSS must be in JSON Lines (JSONL) format, where each line is a standalone JSON object.
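To illustrate the JSONL requirement, the following minimal sketch parses a small sample and checks that every line is a standalone JSON object. The helper name `read_jsonl` and the field names `id` and `text` are hypothetical and are not part of the component. Skipping blank lines is also an assumption.

```python
import json


def read_jsonl(text):
    """Parse JSONL text into a list of dicts, one per non-empty line."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are tolerated and skipped
        obj = json.loads(line)  # raises json.JSONDecodeError on invalid JSON
        if not isinstance(obj, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        records.append(obj)
    return records


# Two records, one JSON object per line (field names are illustrative only).
sample = (
    '{"id": 1, "text": "Contact: alice@example.com"}\n'
    '{"id": 2, "text": "No contact details here"}\n'
)
records = read_jsonl(sample)
```

A file that holds a single JSON array, or an object pretty-printed across several lines, fails this check because its lines are not standalone objects.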
Supported computing resources
How it works
The component detects and masks the following types of sensitive information:
- Mobile phone numbers: Strings that match the following regular expressions are replaced with [MOBILEPHONE].
  - r'(?<!\d)(1(3[0-9]|4[579]|5[0-3,5-9]|6[6]|7[0135678]|8[0-9]|9[89])\d{8})(?!\d)'
  - r'(?<!\d)(1[\d]{2}-\d{4}-\d{4}\D|\D1\d{10}\D|\D1[\d]{2} \d{4} \d{4})(?!\d)'
  - r'(?<!\d)(1[3-9]\d{9})(?!\d)'
- Landline phone numbers: Strings that match the following regular expression are replaced with [TELEPHONE].
  - r'(?<!\d)(\(?0\d{2,3}[-\s)]?\d{7,8})(?!\d)'
- Email addresses: Strings that match the following regular expression are replaced with [EMAIL].
  - r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+'
- ID card numbers: Strings that match the following regular expressions are replaced with [IDNUM].
  - r'(?<!\d)([1-6]\d{5}[12]\d{3}(0[1-9]|1[12])(0[1-9]|1[0-9]|2[0-9]|3[01])\d{3}(\d|X|x))(?!\d)'
  - r'(?<!\d)([1-9]\d{5}[12]\d{3}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])\d{3}[0-9xX])(?!\d)'
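The rules above can be approximated locally with Python's `re` module. The following is a minimal sketch, not the component's implementation: the function name `mask_text` is hypothetical, and applying the rules in the order listed (mobile, landline, email, ID card) is an assumption, since the component's internal order is not documented here.

```python
import re

# Patterns copied verbatim from the list above, grouped by placeholder.
# Assumption: rules are applied in the listed order.
RULES = [
    ("[MOBILEPHONE]", [
        r'(?<!\d)(1(3[0-9]|4[579]|5[0-3,5-9]|6[6]|7[0135678]|8[0-9]|9[89])\d{8})(?!\d)',
        r'(?<!\d)(1[\d]{2}-\d{4}-\d{4}\D|\D1\d{10}\D|\D1[\d]{2} \d{4} \d{4})(?!\d)',
        r'(?<!\d)(1[3-9]\d{9})(?!\d)',
    ]),
    ("[TELEPHONE]", [
        r'(?<!\d)(\(?0\d{2,3}[-\s)]?\d{7,8})(?!\d)',
    ]),
    ("[EMAIL]", [
        r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+',
    ]),
    ("[IDNUM]", [
        r'(?<!\d)([1-6]\d{5}[12]\d{3}(0[1-9]|1[12])(0[1-9]|1[0-9]|2[0-9]|3[01])\d{3}(\d|X|x))(?!\d)',
        r'(?<!\d)([1-9]\d{5}[12]\d{3}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])\d{3}[0-9xX])(?!\d)',
    ]),
]

# Compile once so repeated calls do not re-parse the patterns.
COMPILED = [
    (placeholder, [re.compile(p) for p in patterns])
    for placeholder, patterns in RULES
]


def mask_text(text):
    """Replace every match of every rule with that rule's placeholder."""
    for placeholder, patterns in COMPILED:
        for pattern in patterns:
            text = pattern.sub(placeholder, text)
    return text


masked = mask_text(
    "Call 13812345678 or 010-12345678, mail alice@example.com, "
    "ID 110101199003071234"
)
print(masked)
# Call [MOBILEPHONE] or [TELEPHONE], mail [EMAIL], ID [IDNUM]
```

The order matters in edge cases. For instance, an address whose local part is a mobile number would be rewritten by the mobile rule first, after which the email pattern no longer matches it.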
For example, to mask an email address:
| Before | After |
| --- | --- |
| [Figure: field value before masking, containing a JavaScript code snippet for a Select2 Malay translation plugin] | [Figure: field value after masking] |
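As a small text-only illustration of email masking, the sketch below applies the documented email pattern with Python's `re` module. The sample sentences are invented for illustration, and the component's actual output may differ in details.

```python
import re

# Email pattern as documented for the component.
EMAIL = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+')

before = "Please send the report to alice@example.com by Friday"
after = EMAIL.sub("[EMAIL]", before)
print(after)  # Please send the report to [EMAIL] by Friday

# The final character class includes '.', so in Python's re module a period
# that directly follows the address becomes part of the match.
trailing = EMAIL.sub("[EMAIL]", "Mail alice@example.com.")
print(trailing)  # Mail [EMAIL]
```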
Configure the component
On the Designer workflow page, add the LLM - Sensitive Information Masking (DLC) component and configure its parameters in the right pane.
| Parameter type | Parameter | Required | Description | Default |
| --- | --- | --- | --- | --- |
| Field settings | Target field | Yes | The field to process. | None |
| Field settings | Data output OSS directory | No | The OSS directory for the processed data. If empty, the component uses the default workspace path. | None |
| Execution tuning | Number of processes | No | The number of processes to use for parallel execution. | 8 |
| Execution tuning | Select resource group: Public resource group | No | Specify the node specifications (CPU or GPU instance types), the number of nodes, and the VPC. | None |
| Execution tuning | Select resource group: Dedicated resource group | No | Specify the number of CPU cores, amount of memory, shared memory size, number of GPUs, and number of nodes. | None |
| Execution tuning | Maximum runtime | No | The maximum runtime of the component. The system terminates the job if this limit is exceeded. | None |
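To make the Target field and Number of processes parameters concrete, the following sketch masks one field of each JSONL line and distributes the lines over worker processes. It is an illustration under stated assumptions, not the component's code: `mask_line` and `mask_lines` are hypothetical helpers, a single documented mobile pattern stands in for the full rule set, and leaving all other fields untouched is assumed behavior.

```python
import json
import re
from functools import partial
from multiprocessing import Pool

# One of the documented mobile-number patterns, used as a stand-in
# for the component's full rule set.
MOBILE = re.compile(r'(?<!\d)(1[3-9]\d{9})(?!\d)')


def mask_line(line, target_field):
    """Mask one JSONL line; only the target field is modified (assumption)."""
    record = json.loads(line)
    value = record.get(target_field)
    if isinstance(value, str):
        record[target_field] = MOBILE.sub("[MOBILEPHONE]", value)
    return json.dumps(record, ensure_ascii=False)


def mask_lines(lines, target_field, num_processes=8):
    """Spread lines over worker processes; 8 mirrors the documented default."""
    worker = partial(mask_line, target_field=target_field)
    with Pool(processes=num_processes) as pool:
        return pool.map(worker, lines)
```

When run as a script, call `mask_lines(lines, "text")` from inside an `if __name__ == "__main__":` block so that worker processes can import the module safely. Here `"text"` is an example field name.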