Create and manage identification rules
In Dataphin, you can configure identification rules to promptly identify business data that requires a high level of security.
Limitations
By default, identification rules do not automatically scan views. To scan them, you must enable view scanning in the rule execution configuration. Alternatively, you can manually add or batch-import identification results for views.
Permissions
Security administrators and custom global roles with the Identification Rule-Manage permission can create and manage identification rules.
You can manage only the rules that you own. Management tasks include editing, deleting, resetting, testing, transferring, manually running, enabling, and disabling the rules.
Notes
MaxCompute tables use MaxCompute Tunnel by default to accelerate identification, which improves speed and reduces costs. If Tunnel is unsupported in a specific scenario, Dataphin uses standard SQL for data identification.
Create an identification rule
On the Dataphin homepage, choose Governance > Data Security from the top navigation bar.
In the left-side navigation pane, choose Data Identification > Identification Rules. On the Identification Rules page, click Create Identification Rule.
In the Create Identification Rule dialog box, configure the parameters.
Parameter
Description
Basic settings
Rule Name
The name of the identification rule. Naming requirements:
Can contain Chinese characters, letters, digits, and underscores (_).
Must not exceed 12 characters.
Description
A custom description for the rule, up to 128 characters long.
Data classification and sensitivity level
Data category
Select data categories. You can select all categories, all categories under a specified directory, or specific data categories.
All Categories: All active data categories in the current tenant.
All Categories Under a Specified Directory: All active data categories in the specified directory and its subdirectories.
Specified Data Categories: Select a parent directory, then choose from the active data categories in that directory and its subdirectories. To select categories from additional directories, click Add a Category Group.
Scan scope
Data source
Specifies the scope of assets to scan. You can select assets from a Compute Source or a Data Source.
Compute Source: Allows you to select Dataphin tables within a specific data domain or project.
Data Source: You can select only data sources for which a metadata collection task has been configured. For a list of supported data sources, see Supported data sources for Dataphin.
Compute source table scan scope
This option appears only when you select Compute Source.
The logical relationship between conditions can be AND or OR.
You can define the scope by Data Domain, Project, or Data Table.
The matching conditions include All, Belongs to, Does not belong to, Contains, Does not contain, Regex (case-insensitive), and Regular Expression.
All: Selects all assets within the current Dataphin instance.
Belongs to/Does not belong to: Select one or more specific resources.
Contains/Does not contain: Matches by keyword. For example, to match a user information table, you can enter user_info.
Regex (case-insensitive): Enter a regular expression. For example, the expression .*test.* matches all items whose names contain test. The match is case-insensitive.
Regular Expression: Enter a regular expression. For example, the expression .*test.* matches all items whose names contain test.
Note: You can add up to five scope rules with a maximum of two nested levels.
You can select up to 100 data domains or projects.
Data source table scan scope
This option appears only when you select Data Source.
Data Source: Select one or more data sources to scan.
Data Scope: You can choose to scan All tables or Specified tables. If you choose Specified tables, you can add filter conditions based on full table name, asset inventory tag, table description, or db/schema to refine the asset scope. You can add up to 10 filter conditions with a logical relationship of AND or OR.
Full table name/Table description/db/schema: The available filter conditions are prefix match, suffix match, contains (only for table descriptions), and belongs to (only for db/schema).
Prefix match, Suffix match, Contains: You can enter up to 256 characters.
Belongs to: Allows you to select up to 500 assets of the corresponding type from the current source.
Asset Inventory Tag: The available filter conditions are Contains any and Contains all.
Contains any: Matches an asset if it has at least one of the selected inventory tags.
Contains all: Matches an asset only if it has all of the selected inventory tags.
Click OK to create the identification rule.
After you create the rule, it appears in the identification rule list and is enabled by default. The rule runs automatically the next day according to its execution schedule.
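The scan scope conditions described above can be easier to reason about with a small sketch. The following Python snippet is illustrative only and is not Dataphin's implementation: the function names and condition keys (`matches`, `combine`, `tags_match`, `regex_case_insensitive`, and so on) are hypothetical, and it assumes that a regular expression must match the whole name, which would explain why the documented example wraps the keyword in `.*`.

```python
import re


def matches(condition: str, pattern: str, name: str) -> bool:
    """Evaluate one matching condition against an asset name (hypothetical helper)."""
    if condition == "contains":
        return pattern in name
    if condition == "does_not_contain":
        return pattern not in name
    if condition == "regex_case_insensitive":
        # Assumption: the whole name must match, hence .* on both sides of the keyword.
        return re.fullmatch(pattern, name, flags=re.IGNORECASE) is not None
    if condition == "regular_expression":
        return re.fullmatch(pattern, name) is not None
    raise ValueError(f"unknown condition: {condition}")


def combine(logic: str, results: list[bool]) -> bool:
    """Join several condition results with AND or OR."""
    if logic == "AND":
        return all(results)
    if logic == "OR":
        return any(results)
    raise ValueError(f"unknown logic: {logic}")


def tags_match(mode: str, selected: set[str], asset_tags: set[str]) -> bool:
    """Evaluate an asset inventory tag filter (Contains any / Contains all)."""
    if mode == "contains_any":
        return not selected.isdisjoint(asset_tags)
    if mode == "contains_all":
        return selected <= asset_tags
    raise ValueError(f"unknown mode: {mode}")


table = "DWD_TEST_ORDERS"
ci_hit = matches("regex_case_insensitive", ".*test.*", table)
cs_hit = matches("regular_expression", ".*test.*", table)
print(ci_hit, cs_hit)
print(combine("OR", [ci_hit, cs_hit]), combine("AND", [ci_hit, cs_hit]))
print(tags_match("contains_any", {"pii", "finance"}, {"pii"}))
print(tags_match("contains_all", {"pii", "finance"}, {"pii"}))
```

In this sketch, an uppercase table name such as DWD_TEST_ORDERS satisfies only the case-insensitive variant of the expression, so joining the two conditions with OR keeps the table in scope while AND excludes it.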
Identification rule list
The identification rule list displays the name, data category, owner, last updated time, and status of each rule. You can click the Description button to view information about identification rules, data sampling, identification results, and result management.
You can search for rules by name or apply filters for data category, owner, or Owned by me.
You can perform the following actions on a target identification rule.
Actions
Description
Enabled
Turn the switch in the Enabled column on or off. When enabled, the rule runs according to its scheduled and real-time scan settings, generating execution records. When disabled, you can manually trigger the rule for a specific scope as needed.
Note: Disabling a rule does not affect previously generated identification results.
Reset
Click Reset in the Actions column or at the bottom of the page. Resetting a rule first clears all existing tagging results from the data within its scan scope, then runs the identification process again to generate the latest results.
View Details
Click View Details in the Actions column to see the configuration details of the rule.
Edit
Click Edit in the Actions column to modify the rule's information.
Manual Run
In the Actions column, click the More icon and select Manual Run, or click Manual Run at the bottom of the page to run the selected rule. If auto-inheritance based on data lineage is enabled, identification results can be automatically inherited. For more details, see Data lineage-based inheritance.
When you run a batch manual scan, you can run both enabled and disabled rules. The available execution scopes are All rules (including disabled rules) and Enabled rules only.
Copy
Click Copy in the Actions column to quickly create a duplicate of the rule.
Transfer
In the Actions column, click the More icon and select Transfer, or click Transfer at the bottom of the page. Select a new owner for the rule and click OK. You can transfer an identification rule only to a security administrator.
Delete
In the Actions column, click the More icon and select Delete, or click Delete at the bottom of the page. Deleting a rule removes all data classification and sensitivity level tags that were applied by it. This removal takes effect the next day. Previously generated execution records are not affected.
Test
When you test a rule on a specified project, data source, or table, it applies classification, sensitivity level, and rule-based tags to assets within that scope; otherwise, these actions are ignored. A default test extracts sample data from the rule's scan scope and performs these actions on the sample.
Click Test at the bottom of the page and select the projects, data sources, or data tables you want to test. You can select up to 10 projects or 10 tables.
After the test is complete, click View Test Results to see the details.
Note: Test runs only display results for the sample data and do not apply any actual tags.
Test runs consume computing resources because they perform data scans and computations. To minimize resource usage and execution time, define a precise test scope. The execution time depends on the number and complexity of the rules in the selected scope.
A test only determines if a single identification rule can identify sensitive data. In a real scan, multiple matching rules are evaluated, and one is chosen based on priority. Therefore, the test tagging results may differ from the actual rule tagging results.
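The difference between a single-rule test and a real scan can be sketched as follows. This is a minimal illustration, not Dataphin's algorithm: the `Rule` class, its fields, the two sample rules, and the convention that a smaller `priority` number wins are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    pattern: str   # hypothetical: a regex applied to a sampled value
    category: str  # data category assigned on a match
    priority: int  # assumption: a smaller number means a higher priority

    def hits(self, value: str) -> bool:
        return re.fullmatch(self.pattern, value) is not None


def single_rule_test(rule: Rule, value: str) -> Optional[str]:
    """A test checks one rule in isolation."""
    return rule.category if rule.hits(value) else None


def real_scan(rules: list[Rule], value: str) -> Optional[str]:
    """A real scan evaluates every matching rule and keeps one by priority."""
    matching = [r for r in rules if r.hits(value)]
    if not matching:
        return None
    return min(matching, key=lambda r: r.priority).category


phone = Rule("phone_rule", r"1\d{10}", "Phone number", priority=1)
digits = Rule("digits_rule", r"\d+", "Numeric ID", priority=2)

sample = "13800138000"
test_result = single_rule_test(digits, sample)
scan_result = real_scan([digits, phone], sample)
print(test_result)
print(scan_result)
```

Here the test of `digits_rule` reports a match on the sample value, yet a scan that also evaluates `phone_rule` tags the same value with the other category, which is why test results and scan results can differ.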
Manually trigger an identification rule
On the Identification Rules page, click Manual Rule Scan to open the Manual Rule Scan dialog box.
In the Manual Rule Scan dialog box, configure the parameters.
Parameter
Description
Scan scope
Define the scan scope by selecting one of the following options: Full Database Scan, Scan by Project, Scan by Data Source, or Scan by Table.
Full Database Scan: Scans all data within the Dataphin instance.
Scan by Project: Scans all data within the selected projects.
Scan by Data Source: Scans all data within the selected data sources.
Scan by Table: Scans all data within the selected data tables. You can select up to 10 tables from a project or data source.
Rule execution scope
Define which rules to run by selecting either Enabled rules only or All rules (including disabled rules).
Enabled rules only: Includes all identification rules in Dataphin that are currently enabled.
All rules (including disabled rules): Includes all identification rules in Dataphin, regardless of their status.
Note: You must first enable auto-inheritance and select the Triggered by rule execution scenario in the auto-inheritance configuration. For details, see Auto-inheritance configuration.
When enabled, a manual scan also triggers an automatic inheritance process. Downstream fields inherit the sensitivity level of their direct upstream fields based on data lineage, which expands scan coverage and improves the consistency of results across related data.
Enabling auto-inheritance expands the scan scope and consumes additional computing resources. Configure this feature based on your business needs.
Click OK to start the scan on the selected assets.
You can go to the Execution Records page to monitor the progress. The scan duration depends on the amount of data being scanned.
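The inheritance behavior described in the note above, where downstream fields take the sensitivity level of their direct upstream fields, can be sketched as a one-hop propagation over a lineage map. The snippet below is a hedged illustration only: the field names, the numeric levels, and the `inherit_levels` function are hypothetical, and two behaviors are assumptions not stated in this document (a field that already has a result is left unchanged, and a field with several upstream fields takes the highest level).

```python
def inherit_levels(
    levels: dict[str, int],
    lineage: dict[str, list[str]],
) -> dict[str, int]:
    """Propagate sensitivity levels one hop along data lineage.

    levels:  field -> sensitivity level found by rule scans
    lineage: downstream field -> its direct upstream fields
    """
    result = dict(levels)
    for downstream, upstreams in lineage.items():
        if downstream in levels:
            # Assumption: an existing identification result is not overwritten.
            continue
        # Only levels from the original scan are consulted, so inheritance
        # stops at direct upstream fields and does not cascade further.
        upstream_levels = [levels[u] for u in upstreams if u in levels]
        if upstream_levels:
            # Assumption: with several upstream fields, take the highest level.
            result[downstream] = max(upstream_levels)
    return result


scanned = {"ods.user.phone": 3}
lineage = {
    "dwd.user.phone": ["ods.user.phone"],
    "ads.report.phone": ["dwd.user.phone"],
}
inherited = inherit_levels(scanned, lineage)
print(inherited)
```

In this sketch, `dwd.user.phone` inherits level 3 from its direct upstream field, while `ads.report.phone` is untouched in the same pass because its direct upstream field had no scanned result.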
Next steps
After you create an identification rule, you can adjust its scan method based on your business needs. For more information, see Identification rule execution configuration and Manually trigger an identification rule. You can also enable auto-inheritance. For details, see Auto-inheritance configuration.
The execution records list shows the sensitive data found by rule scans. For details, see Add and manage identification results.