Data classification in DMS using DSC


Data Security Center (DSC) allows you to classify data based on compliance requirements, business needs, data value, and sensitivity. This enables enterprises to implement more standardized and fine-grained data protection and risk control. Data Management (DMS) is a one-stop platform that covers the entire data lifecycle. This document describes how to apply DSC's data classification capabilities to DMS for integrated sensitive data protection.

Before you begin

  1. Create an RDS instance on the RDS console. Then, create a database and a table, and populate the table with test data.

    For more information, see Quickly create an RDS MySQL instance and Create a database and an account.

  2. On the Data Security Center console, authorize the instance and perform data classification.

    1. Purchase Data Security Center. For more information, see Purchase Data Security Center.

    2. Log in to the Data Security Center console to authorize the RDS instance and connect to the database.

      We recommend that you connect to the database during off-peak hours. When connecting, select Scan assets and identify sensitive data now. For more information, see Authorize a database.

    3. In the navigation pane, choose Data Classification > Tasks.

    4. On the Identification Tasks tab, click Default Tasks.

    5. In the Rescan column for the target RDS instance, click Rescan.

      To minimize the scan's impact on your database, perform the rescan during off-peak hours.

  3. On the DMS console, enable sensitive data protection for the new instance. For more information, see Manage sensitive data.

Set sensitivity levels by DSC classification

DMS identifies data sensitivity from an access control perspective and classifies data into three security levels: Low, Medium, and High.

DSC classifies data from multiple perspectives, including data value, sensitivity, compliance, and business requirements. It uses five security levels: N/A, S1, S2, S3, and S4. The recommended mapping between DSC and DMS sensitivity levels is as follows:

  • N/A in DSC corresponds to the Low level in DMS.

  • S1 or S2 in DSC corresponds to the Medium level in DMS.

  • S3 or S4 in DSC corresponds to the High level in DMS.
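The recommended mapping above can be expressed as a small lookup. The following is a minimal sketch only; the function name dmsLevelFor is hypothetical and is not part of the DSC or DMS SDKs.

```go
package main

import "fmt"

// dmsLevelFor returns the recommended DMS sensitivity level for a DSC level,
// following the mapping listed above. The name is hypothetical (illustration only).
func dmsLevelFor(dscLevel string) (string, error) {
	switch dscLevel {
	case "N/A":
		return "Low", nil
	case "S1", "S2":
		return "Medium", nil
	case "S3", "S4":
		return "High", nil
	default:
		return "", fmt.Errorf("unknown DSC level %q", dscLevel)
	}
}

func main() {
	for _, level := range []string{"N/A", "S1", "S2", "S3", "S4"} {
		dmsLevel, err := dmsLevelFor(level)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", level, dmsLevel)
	}
}
```

Returning an error for unrecognized levels, instead of defaulting to Low, avoids silently under-protecting a column if DSC introduces a level that the mapping does not cover.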

Step 1: Query DSC data classification results

  • Query using OpenAPI

    The following code sample calls the DescribeColumns operation of Data Security Center to query data classification results:

    Important

    The following code sample retrieves the AccessKey from environment variables for demonstration purposes only. For more information about how to configure environment variables, see Configure environment variables. We recommend that you use a more secure method, such as Security Token Service (STS). For more information, see Manage access credentials.

    func TestDescribeColumns(t *testing.T) {
        // Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables are set in your runtime environment.
        // Leaked source code can expose your AccessKey and threaten the security of your resources. The following code sample uses environment variables to retrieve the AccessKey for demonstration purposes only. We recommend that you use a more secure method, such as STS.
        client, _err := CreateSDDPClient(tea.String(os.Getenv("ALIBABA_CLOUD_ACCESS_KEY_ID")), tea.String(os.Getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET")))
        assert.Nil(t, _err)

        describeColumnsRequest := &sddp20190103.DescribeColumnsRequest{
            InstanceName: tea.String(""),
            TableName:    tea.String(""),
            RiskLevelId:  tea.Int64(4), // Specify the security level of data to query. If you leave this parameter empty, data of all levels is queried.
        }

        response, _err := client.DescribeColumnsWithOptions(describeColumnsRequest, &util.RuntimeOptions{})
        assert.Nil(t, _err)
        // Print the name of each column returned for the requested security level.
        for _, item := range response.Body.Items {
            fmt.Println(*item.Name)
        }
    }

    Output similar to the following is returned after the results for each security level are grouped:

    Columns at the S3 level include:
      	hide3, hide4, hide14, hide18
    Columns at the S2 level include:
      	hide7, hide13, hide15, hide16, hide19
    Columns at the S1 level include:
      	hide12
    Columns at the N/A level include:
      	hide1, hide2, hide5, hide6, hide8, hide9, hide10, hide11, hide17, hide20, hide21, hide22, hide23, hide24, hide25, hide26

  • View results in the console

    On the Data Security Center console, navigate to the Data classification > Asset overview page. Find and expand the target database instance. In the Actions column for the data object name (which is the database name), click Table Details. On the details panel of the data object, click Column Details for the data table to view the detailed identification results.

    The results page displays a table with the Column name, Data tag, Identification result (such as email or MAC address), Sensitivity level (such as S3, S2, or N/A), Correction status, and Data sampling result for each column. You can perform Correct and Restore operations.
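To produce a per-level summary like the sample output in this step, the query results can be grouped by sensitivity level. The following is a minimal sketch under stated assumptions: columnResult is a hypothetical, simplified stand-in for one item of a DescribeColumns response (the real SDK type has more fields), and the level is assumed to be available as a name such as S3.

```go
package main

import (
	"fmt"
	"strings"
)

// columnResult is a hypothetical, simplified stand-in for one query result.
type columnResult struct {
	Name  string
	Level string // "N/A", "S1", "S2", "S3", or "S4"
}

// groupByLevel collects column names per sensitivity level, preserving input order.
func groupByLevel(cols []columnResult) map[string][]string {
	groups := make(map[string][]string)
	for _, c := range cols {
		groups[c.Level] = append(groups[c.Level], c.Name)
	}
	return groups
}

// report renders the groups from most to least sensitive and skips empty levels.
func report(groups map[string][]string) string {
	var b strings.Builder
	for _, level := range []string{"S4", "S3", "S2", "S1", "N/A"} {
		names := groups[level]
		if len(names) == 0 {
			continue
		}
		fmt.Fprintf(&b, "Columns at the %s level include:\n\t%s\n", level, strings.Join(names, ", "))
	}
	return b.String()
}

func main() {
	cols := []columnResult{
		{"hide3", "S3"}, {"hide7", "S2"}, {"hide12", "S1"}, {"hide4", "S3"}, {"hide1", "N/A"},
	}
	fmt.Print(report(groupByLevel(cols)))
}
```

Iterating over a fixed list of levels keeps the output order stable, because Go does not guarantee map iteration order.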

Step 2: Set column security levels by DSC classification

Because DSC classifies data from multiple perspectives (data value, sensitivity, compliance, and business requirements), its results are more detailed and comprehensive than the default DMS levels. Use the DSC results to reconfigure the security levels of columns in DMS and establish classification principles that suit your business needs.

  • Set using OpenAPI

    The following code sample shows how to call the DMS ChangeColumnSecurityLevel operation to set the security level for a column:

    Important

    The following code sample retrieves the AccessKey from environment variables for demonstration purposes only. For more information about how to configure environment variables, see Configure environment variables. We recommend that you use a more secure method, such as Security Token Service (STS). For more information, see Manage access credentials.

    func TestChangeColumnSecurityLevel(t *testing.T) {
        // Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables are set in your runtime environment.
        // Leaked source code can expose your AccessKey and threaten the security of your resources. The following code sample uses environment variables to retrieve the AccessKey for demonstration purposes only. We recommend that you use a more secure method, such as STS.
        client, _err := CreateClient(tea.String(os.Getenv("ALIBABA_CLOUD_ACCESS_KEY_ID")), tea.String(os.Getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET")))
        assert.Nil(t, _err)

        request := &dms_enterprise20181101.ChangeColumnSecurityLevelRequest{
            Tid:                 tea.Int64(101950),
            DbId:                tea.Int64(35119204),
            IsLogic:             tea.Bool(false),
            SchemaName:          tea.String(""),
            TableName:           tea.String(""),
            ColumnName:          tea.String("hide1"), // The column name.
            NewSensitivityLevel: tea.String("S1"),    // The new security level for the column.
        }

        _, _err = client.ChangeColumnSecurityLevelWithOptions(request, &util.RuntimeOptions{})
        assert.Nil(t, _err)
    }

  • Set using the DMS console

    For more information, see Adjust column security levels.

    In the Adjust Security Level dialog box, you can view the Original level and New level for each column, such as hide1 through hide7. Adjust the security level from the drop-down list and then click Submit to Security Department to save the settings.

Select data masking algorithms by DSC classification

The default data masking algorithm in DMS is full masking, which redacts all characters. For example, the four highly sensitive columns (hide3, hide4, hide14, and hide18) would all be displayed as "*******". Fully masked output is difficult to read. You can instead set different masking rules for different types of sensitive data.

Step 1: Query DSC classification results

The following code sample calls the DescribeColumns operation of Data Security Center to query data classification results:

Important

The following code sample retrieves the AccessKey from environment variables for demonstration purposes only. For more information about how to configure environment variables, see Configure environment variables. We recommend that you use a more secure method, such as Security Token Service (STS). For more information, see Manage access credentials.

func TestDescribeColumns(t *testing.T) {
    // Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables are set in your runtime environment.
    // Leaked source code can expose your AccessKey and threaten the security of your resources. The following code sample uses environment variables to retrieve the AccessKey for demonstration purposes only. We recommend that you use a more secure method, such as STS.
    client, _err := CreateSDDPClient(tea.String(os.Getenv("ALIBABA_CLOUD_ACCESS_KEY_ID")), tea.String(os.Getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET")))
    assert.Nil(t, _err)

    describeColumnsRequest := &sddp20190103.DescribeColumnsRequest{
        InstanceName: tea.String(""),
        TableName:    tea.String(""),
        RuleId:       tea.Int64(1542), // Specify the ID of the sensitive data identification rule. If you leave this empty, all rules are queried. You can call the DescribeCategoryTemplateRuleList operation to obtain this parameter.
        PageSize:     tea.Int32(20),
    }

    response, _err := client.DescribeColumnsWithOptions(describeColumnsRequest, &util.RuntimeOptions{})
    assert.Nil(t, _err)
    // Print each column together with the identification rule that it matched.
    for _, item := range response.Body.Items {
        if item.RuleName != nil {
            fmt.Println(*item.Name, " ", *item.RuleName)
        }
    }
}

The query returns the following results:

Results:
	hide3 :  Phone number (Chinese mainland)
	hide4 :  ID card number (Chinese mainland)
	hide14 :  Email
	hide18 :  Private KEY

Step 2: Set masking rules for different data types

  • Set using OpenAPI

    The following code sample shows how to call the ModifyDesensitizationStrategy operation of DMS to set the masking rule for the hide3 column to 23180 (Phone Number Middle Four-Digit Random Replacement):

    Important

    The following code sample retrieves the AccessKey from environment variables for demonstration purposes only. For more information about how to configure environment variables, see Configure environment variables. We recommend that you use a more secure method, such as Security Token Service (STS). For more information about authentication methods, see Manage access credentials.

    func TestModifyDesensitizationStrategy(t *testing.T) {
        // Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables are set in your runtime environment.
        // Leaked source code can expose your AccessKey and threaten the security of your resources. The following code sample uses environment variables to retrieve the AccessKey for demonstration purposes only. We recommend that you use a more secure method, such as STS. For more information about authentication methods, see https://help.aliyun.com/document_detail/378661.html
        client, _err := CreateClient(tea.String(os.Getenv("ALIBABA_CLOUD_ACCESS_KEY_ID")), tea.String(os.Getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET")))
        assert.Nil(t, _err)

        request := &dms_enterprise20181101.ModifyDesensitizationStrategyRequest{
            Tid:        tea.Int64(101950),
            IsLogic:    tea.Bool(false),
            SchemaName: tea.String(""),
            TableName:  tea.String(""),
            ColumnName: tea.String("hide3"),
            RuleId:     tea.Int32(23180), // Specify the ID of the data masking rule. You can call the ListDesensitizationRule operation to query this parameter.
        }

        _, _err = client.ModifyDesensitizationStrategyWithOptions(request, &util.RuntimeOptions{})
        assert.Nil(t, _err)
    }

  • Set using the DMS console

    For more information about how to adjust data masking algorithms on the DMS console, see Manage sensitive data.

    In the Data masking algorithm change dialog box, apply partial data masking algorithms to the highly sensitive columns. For example, select Phone number middle four-digit random replacement for hide3, China ID card birthday masking for hide4, and Email masking (2nd to 11th-to-last char) for hide14. For hide18, select a suitable data masking algorithm based on your needs.
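The choice of masking algorithm per identified data type can be captured in a lookup table. The following is a minimal sketch that uses the rule names and algorithm names from this example. The function algorithmFor, the table algorithmByRule, and the label "Full masking" are hypothetical names for illustration; the actual rule IDs must still be obtained from ListDesensitizationRule.

```go
package main

import "fmt"

// fullMasking is an assumed label for the DMS default (all characters redacted).
const fullMasking = "Full masking"

// algorithmByRule maps a DSC identification rule name to the DMS masking
// algorithm chosen in this example. Hypothetical table for illustration.
var algorithmByRule = map[string]string{
	"Phone number (Chinese mainland)":   "Phone number middle four-digit random replacement",
	"ID card number (Chinese mainland)": "China ID card birthday masking",
	"Email":                             "Email masking (2nd to 11th-to-last char)",
}

// algorithmFor returns the chosen algorithm, falling back to full masking
// for data types without a specific choice (for example, private keys).
func algorithmFor(ruleName string) string {
	if alg, ok := algorithmByRule[ruleName]; ok {
		return alg
	}
	return fullMasking
}

func main() {
	results := []struct{ column, rule string }{
		{"hide3", "Phone number (Chinese mainland)"},
		{"hide4", "ID card number (Chinese mainland)"},
		{"hide14", "Email"},
		{"hide18", "Private KEY"},
	}
	for _, r := range results {
		fmt.Printf("%s -> %s\n", r.column, algorithmFor(r.rule))
	}
}
```

Falling back to full masking keeps any data type without an explicit entry at the most restrictive setting.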

Step 3: View the masking effect

After setting the data masking rules, query the data in DMS to view the effect. The hide3 (phone number) column is partially masked (for example, 13085*****7681), the date of birth in the hide4 (ID card) column is masked (for example, 640121*****7681), the hide14 (email) column is partially masked (for example, p*****@gmail.com), and the hide18 (Private KEY) column is still fully masked (*******).
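To illustrate the difference between full and partial masking, the following minimal sketch masks the middle of a value and the local part of an email address. It is not the DMS implementation: maskMiddle and maskEmail are hypothetical helpers, the sample values are made up, and the DMS phone algorithm named above performs random replacement rather than asterisk substitution.

```go
package main

import (
	"fmt"
	"strings"
)

// maskMiddle keeps the first keepHead and last keepTail characters and replaces
// everything in between with '*'. If nothing would be left to mask, the whole
// value is masked. Hypothetical helper for illustration.
func maskMiddle(s string, keepHead, keepTail int) string {
	r := []rune(s)
	if keepHead < 0 || keepTail < 0 || keepHead+keepTail >= len(r) {
		return strings.Repeat("*", len(r))
	}
	middle := len(r) - keepHead - keepTail
	return string(r[:keepHead]) + strings.Repeat("*", middle) + string(r[len(r)-keepTail:])
}

// maskEmail keeps the first character of the local part and the full domain.
// Values without a usable '@' are fully masked.
func maskEmail(addr string) string {
	at := strings.LastIndex(addr, "@")
	if at <= 0 {
		return maskMiddle(addr, 0, 0)
	}
	return maskMiddle(addr[:at], 1, 0) + addr[at:]
}

func main() {
	fmt.Println(maskMiddle("13812345678", 3, 4)) // partial masking of a phone-like value
	fmt.Println(maskEmail("peter1@gmail.com"))   // partial masking of an email address
	fmt.Println(maskMiddle("secret-key", 0, 0))  // full masking
}
```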

In addition to manually configuring sensitivity levels and masking algorithms, DSC supports automatically synchronizing identification results to DMS labels. This includes both data classification tags and data security risk levels, enabling unified data security management across products.

Synchronize data identification results to DMS labels

Data Security Center (DSC) can automatically synchronize data identification results to column labels in Data Management (DMS). This enables DMS to directly leverage DSC's identification results for data security management, including sensitive data classification labels (data tags) and sensitivity levels (data security risk levels).

When synchronization is enabled, DSC automatically syncs the classification and grading results from sensitive data identification task scans to the corresponding data table column labels in DMS, achieving cross-product data security metadata interoperability.

Configure synchronization to DMS labels

  1. Log on to the Data Security Center console.

  2. In the left-side navigation pane, choose Classification and Grading > Tasks.

  3. On the Identification Tasks tab, find the target identification task and click Edit in the Actions column (for custom tasks), or select the target default task and click Scan Settings.

  4. In the configuration panel, find the Identification Result Synchronization settings and select Synchronize to DMS Labels.

    Note

    Before enabling this feature, ensure that the target data asset is connected to DMS with the sensitive data protection capability activated.

  5. Click OK to save. The next time the identification task runs, the results are automatically synchronized to DMS.

  6. After the next identification task completes, log on to the DMS console and check the column labels of the target table to verify that the classification tags and security levels are correctly synchronized.

Synchronize data security risk levels to DMS labels

DSC sensitivity levels (N/A, S1, S2, S3, and S4) can be synchronized to field security levels in DMS, enabling unified management of data security risk levels on the DMS side.

Level mapping

The default mapping between DSC data security risk levels and DMS field security levels is as follows:

DSC sensitivity level    DMS security level
N/A                      Low
S1                       Low
S2                       Medium
S3                       High
S4                       High

Note

This mapping is the system default configuration for automatic label synchronization. To adjust it, manually modify the security level of the corresponding field in DMS.

Note

This mapping applies specifically to the automatic DMS label synchronization feature. For manual sensitivity level configuration, see the "Set sensitivity levels by DSC classification" section above, which uses a different grouping approach (S1 and S2 both correspond to Medium).
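The two mappings in this document can be compared programmatically to see which DSC levels end up at a different DMS level depending on whether they are set manually or synchronized automatically. The following is a minimal sketch; the variable and function names are hypothetical, and the values are copied from the two mappings described in this document.

```go
package main

import "fmt"

// dscLevels lists the DSC sensitivity levels in ascending order.
var dscLevels = []string{"N/A", "S1", "S2", "S3", "S4"}

// recommendedMapping is the mapping recommended for manual configuration.
var recommendedMapping = map[string]string{
	"N/A": "Low", "S1": "Medium", "S2": "Medium", "S3": "High", "S4": "High",
}

// syncDefaultMapping is the default mapping used by automatic label synchronization.
var syncDefaultMapping = map[string]string{
	"N/A": "Low", "S1": "Low", "S2": "Medium", "S3": "High", "S4": "High",
}

// differingLevels returns the DSC levels that the two mappings treat differently.
func differingLevels() []string {
	var diff []string
	for _, level := range dscLevels {
		if recommendedMapping[level] != syncDefaultMapping[level] {
			diff = append(diff, level)
		}
	}
	return diff
}

func main() {
	for _, level := range differingLevels() {
		fmt.Printf("%s: manual recommendation %s, sync default %s\n",
			level, recommendedMapping[level], syncDefaultMapping[level])
	}
}
```

Columns at a level reported here are candidates for manual review in DMS after synchronization.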