Configure and execute data masking


Data Security Center (DSC) supports both static and dynamic data masking. With static data masking, you create a masking task that specifies the target data assets, identifies sensitive fields based on masking rules, and applies masking algorithms (such as redaction, encryption, or substitution) to those fields. The masked data is then saved to a destination you choose. With dynamic data masking, you call the ExecDatamask API to apply masking rules to specified fields in JSON-formatted data.

Choose a masking method

Masking method

Supported data sources

Scenarios

How to use

Static data masking

  • RDS tables, PolarDB-X tables, MaxCompute tables, PolarDB tables, OceanBase tables, AnalyticDB-MySQL tables, and self-managed database tables on ECS.

  • Structured TXT, CSV, XLSX, and XLS files in OSS buckets.

  • Structured TXT, CSV, XLSX, and XLS files saved on your local computer.

Use this method when you need to share data with other users but must not expose certain sensitive fields.

After masking specific tables or files, save the masked data to another table or file for secure sharing without affecting the original data.

In the DSC console, create a masking task and configure the source data, masking rules, destination for masked data, and task execution schedule.

Dynamic data masking

Construct your own JSON-formatted data that meets the following requirements: dataHeaderList defines column names, dataList contains the data to mask, and the order of columns in dataHeaderList must match the order of data in dataList. Use ruleList to specify masking rules. For details, see Perform dynamic data masking.

{
    "dataHeaderList": ["name", "age"],
    "dataList": [
        ["lily", 18],
        ["lucy", 17]
    ],
    "ruleList": [1002, null]
}

This method offers more flexibility because you can construct your own data source for masking.

Call the ExecDatamask (Perform dynamic data masking) API using OpenAPI online debugging, Alibaba Cloud SDKs, or custom API integrations.

For details, see Integration overview.
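The structural requirements on the JSON data can be checked locally before a request is sent. The following is a minimal sketch in Python. The helper name `build_mask_payload` is hypothetical, and the rule that ruleList carries one entry per column (with `None` where no rule applies) is an assumption inferred from the sample above, not something the API reference is quoted as stating. The sketch validates shape only and does not call DSC.

```python
import json


def build_mask_payload(headers, rows, rules):
    """Assemble the Data structure and check that columns line up.

    Hypothetical helper: it only validates the shape of the payload.
    """
    if len(rules) != len(headers):
        # Assumption: one ruleList entry per column, None where no rule applies.
        raise ValueError("ruleList must have one entry per column")
    for index, row in enumerate(rows):
        if len(row) != len(headers):
            raise ValueError(
                f"row {index} has {len(row)} values, expected {len(headers)}"
            )
    return {
        "dataHeaderList": list(headers),
        "dataList": [list(row) for row in rows],
        "ruleList": list(rules),
    }


payload = build_mask_payload(
    ["name", "age"],
    [["lily", 18], ["lucy", 17]],
    [1002, None],
)
data_json = json.dumps(payload)  # None is serialized as JSON null
print(data_json)
```

Checking the column order and count up front avoids sending a request in which a value would be masked with the rule intended for a different column.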

Masking result examples

DSC provides masking algorithms including hash masking, redaction masking, substitution masking, rounding masking, encryption masking, and shuffling masking. The following tables show example results for each algorithm.

Hash masking

Applicable types and typical scenarios

Algorithm description

Algorithm configuration example

Original data example

Masked result example

Irreversible algorithm.

Supports common hash algorithms and lets you configure an offset (salt value).

Useful for passwords or scenarios where you need to verify sensitive data by comparison.

  • Sensitive type: Key-related.

  • Scenario: Data storage.

MD5

Salt value: test

123456

d6f82c64df3dc34921d79e5f22e5d43a

SHA-1

59056c7c6faa5eeb7151d30a01c17b25f35b021c

SHA-256

84ca63076a5966e9b726490c8b6a5c9c6d6bdc018bb0a05df754c0c2770aca72

HMAC

ed029027322fedb0ac40b7759ac1521f0121cb018cf0f6f078e61764d810e00f
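To illustrate what salted hashing means in general, here is a minimal sketch using Python's standard `hashlib` and `hmac` modules. How DSC combines the salt with the value is not described here, so appending the salt, and using SHA-256 with the salt as the HMAC key, are assumptions. The digests this sketch prints are therefore not expected to reproduce the values in the table above.

```python
import hashlib
import hmac


def salted_hash(value: str, salt: str, algorithm: str = "sha256") -> str:
    """Hash value + salt with the named algorithm (md5, sha1, sha256)."""
    # Assumption: the salt is appended to the value before hashing.
    data = (value + salt).encode("utf-8")
    return hashlib.new(algorithm, data).hexdigest()


def keyed_hash(value: str, key: str) -> str:
    """HMAC of the value, keyed with the salt."""
    # Assumption: HMAC-SHA256 with the salt used as the key.
    return hmac.new(key.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).hexdigest()


for name in ("md5", "sha1", "sha256"):
    print(name, salted_hash("123456", "test", name))
print("hmac", keyed_hash("123456", "test"))
```

Because the same input and salt always give the same digest, two masked values can still be compared for equality without exposing the original data.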

Redaction masking

Applicable types and typical scenarios

Algorithm description

Algorithm configuration example

Original data example

Masked result example

Irreversible algorithm.

Uses special characters like asterisks (*) or number signs (#) to redact parts of text and mask sensitive data.

Useful for frontend display or sharing sensitive data.

  • Sensitive type: Personally sensitive.

  • Scenarios:

    • Data usage.

    • Data sharing.

Keep first n and last m characters

Redaction character: *, n=1, m=1

123456

1****6

Keep characters from position x to y

Redaction character: *, x=3, y=4

**34**

Redact first n and last m characters

Redaction character: *, n=2, m=2

**34**

Redact characters from position x to y

Redaction character: *, x=2, y=5

1****6

Redact before a special character (@, &, .) at its first occurrence

&

1@34&6

****&6

Redact after a special character (@, &, .) at its first occurrence

@

1@****
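The six redaction modes can be sketched as plain string operations. The function names below are hypothetical. The sketch assumes 1-based, inclusive positions (consistent with the x=2, y=5 example) and assumes that text without the special character is left unchanged.

```python
def keep_ends(text, n, m, ch="*"):
    """Keep the first n and last m characters; redact everything between."""
    if n + m >= len(text):
        return text  # nothing left to redact
    return text[:n] + ch * (len(text) - n - m) + text[len(text) - m:]


def keep_range(text, x, y, ch="*"):
    """Keep positions x..y (1-based, inclusive); redact the rest."""
    x, y = max(x, 1), min(y, len(text))
    if x > y:
        return ch * len(text)
    return ch * (x - 1) + text[x - 1:y] + ch * (len(text) - y)


def redact_ends(text, n, m, ch="*"):
    """Redact the first n and last m characters."""
    if n + m >= len(text):
        return ch * len(text)
    return ch * n + text[n:len(text) - m] + ch * m


def redact_range(text, x, y, ch="*"):
    """Redact positions x..y (1-based, inclusive)."""
    x, y = max(x, 1), min(y, len(text))
    if x > y:
        return text
    return text[:x - 1] + ch * (y - x + 1) + text[y:]


def redact_before(text, special, ch="*"):
    """Redact everything before the first occurrence of special."""
    index = text.find(special)
    return text if index < 0 else ch * index + text[index:]


def redact_after(text, special, ch="*"):
    """Redact everything after the first occurrence of special."""
    index = text.find(special)
    if index < 0:
        return text
    end = index + len(special)
    return text[:end] + ch * (len(text) - end)


print(keep_ends("123456", 1, 1))
print(redact_range("123456", 2, 5))
print(redact_before("1@34&6", "&"))
print(redact_after("1@34&6", "@"))
```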

Substitution masking

The data to mask is replaced based on a random lookup table defined in the algorithm configuration, so each masking operation produces a different result. For example, masking the phone number “1390000****” could produce “1327156****”, “1835537****”, or “1885654****”.

Applicable types and typical scenarios

Algorithm description

Algorithm configuration

Partially reversible algorithm.

Uses a substitution lookup table for mapping (reversible) or a random range for substitution (irreversible) to mask entire fields or parts of them.

DSC provides multiple built-in lookup tables and supports custom substitution algorithms.

Useful for masking fields with fixed formats, such as ID numbers.

  • Sensitive types:

    • Personally sensitive.

    • Enterprise sensitive.

    • Device sensitive.

  • Scenarios:

    • Data storage.

    • Data sharing.

ID card mapping substitution

Administrative region random lookup table

ID card random substitution

Administrative region random lookup table

Military ID random substitution

Administrative region random lookup table

Passport random substitution

Purpose field random code

Hong Kong/Macau travel permit random substitution

Purpose field random code

Bank card random substitution

Bin code random lookup table

Landline phone number random substitution

Administrative region random lookup table

Mobile phone number random substitution

Network number

Unified social credit code random substitution

Registration department random lookup table, category code random lookup table, administrative region random lookup table

General Bogue mapping substitution

Uppercase letter mapping code, lowercase letter mapping code, digit mapping code, special character mapping code

General Bogue random substitution

Uppercase letter random code, lowercase letter random code, digit random code, special character random code
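The difference between mapping substitution (reversible) and random substitution (irreversible) can be shown with a small sketch. The digit table below is a made-up permutation for illustration and is not one of DSC's built-in lookup tables. The function names and the `keep_prefix` parameter are likewise hypothetical.

```python
import random
import string

# Hypothetical mapping table: a fixed permutation of the digits 0-9.
DIGIT_MAP = str.maketrans("0123456789", "5820479163")
DIGIT_UNMAP = str.maketrans("5820479163", "0123456789")


def map_digits(value: str) -> str:
    """Mapping substitution: the same digit always maps to the same digit."""
    return value.translate(DIGIT_MAP)


def unmap_digits(masked: str) -> str:
    """Reverse the mapping with the inverse table."""
    return masked.translate(DIGIT_UNMAP)


def random_digits(value: str, keep_prefix: int = 0, rng=random) -> str:
    """Random substitution: replace each digit after keep_prefix at random."""
    head, tail = value[:keep_prefix], value[keep_prefix:]
    replaced = (rng.choice(string.digits) if c.isdigit() else c for c in tail)
    return head + "".join(replaced)


masked = map_digits("13900001234")
print(masked, unmap_digits(masked))
print(random_digits("13900001234", keep_prefix=3))
```

A fixed table keeps the format and lets an authorized party restore the value. Random replacement keeps only the format, so the result cannot be mapped back.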

Rounding masking

Applicable types and typical scenarios

Algorithm description

Algorithm configuration example

Original data example

Masked result example

Partially reversible algorithm.

Provides two types of rounding masking: rounding numbers or dates (irreversible), and shifting text characters (reversible).

Useful for analysis and statistical scenarios involving sensitive datasets.

  • Sensitive type: General sensitive.

  • Scenarios:

    • Data storage.

    • Data usage.

Number rounding:

Keep digits up to the Nth place before the decimal point. N ranges from 1 to 19.

N=4

12345.6789

12000

Date rounding: Round dates to year, month, day, hour, or minute.

Hour

2023-04-15 14:30:45

2023-04-15 14:00:00

Bit shift: a global cyclic shift of the text's characters, either left or right.

Shift left by 3 bits

test

ttes
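The three rounding variants can be reproduced with a few lines of Python. This sketch assumes that rounding truncates rather than rounds to nearest, which is consistent with 14:30:45 becoming 14:00:00 above. It also assumes non-negative numbers. The function names are hypothetical.

```python
from datetime import datetime
from decimal import Decimal

# Fields to reset for each date-rounding unit.
_RESET = {
    "year": dict(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
    "month": dict(day=1, hour=0, minute=0, second=0, microsecond=0),
    "day": dict(hour=0, minute=0, second=0, microsecond=0),
    "hour": dict(minute=0, second=0, microsecond=0),
    "minute": dict(second=0, microsecond=0),
}


def round_number(value: str, n: int) -> int:
    """Zero out every digit below the Nth place left of the decimal point."""
    unit = 10 ** (n - 1)
    # Assumption: value is non-negative, so floor division truncates.
    return int(Decimal(value)) // unit * unit


def round_date(value: datetime, unit: str) -> datetime:
    """Truncate a timestamp to the given unit."""
    return value.replace(**_RESET[unit])


def rotate_left(text: str, k: int) -> str:
    """Cyclically shift characters left by k (negative k shifts right)."""
    if not text:
        return text
    k %= len(text)
    return text[k:] + text[:k]


print(round_number("12345.6789", 4))
print(round_date(datetime(2023, 4, 15, 14, 30, 45), "hour"))
print(rotate_left("test", 3))
```

Shifting is reversible because rotating by the same amount in the opposite direction restores the original text, whereas the discarded digits and time fields cannot be recovered.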

Encryption masking

Applicable types and typical scenarios

Algorithm description

Algorithm configuration example

Original data example

Masked result example

Reversible algorithm.

Supports common symmetric encryption algorithms.

Useful for encrypting sensitive fields that must later be restored to their original values.

  • Sensitive types:

    • Personally sensitive.

    • Enterprise sensitive.

  • Scenario: Data storage.

DES algorithm

Encryption key: 121212

123456

c2TwheTI+rw=

3DES algorithm

Encryption keys: 123, 1232131, 123123

XUwzslGadsk=

AES algorithm

Encryption key: 123131

YueDcm92UuqvKpVbeS+0Ng==

Shuffling masking

Applicable types and typical scenarios

Algorithm description

Algorithm configuration

Irreversible algorithm.

After extracting data from the source table and confirming the value range, this method performs column-level shuffling and random selection within the range to obfuscate the data.

Useful for column-level masking of structured data.

  • Sensitive types:

    • Device sensitive.

    • Location sensitive.

  • Scenario: Data storage.

Random shuffling

  • Shuffle and reorder

  • Random selection

For example, shuffling city information for a group of devices:

Original data

Masked data

Device ID

City

Device ID

City

D001

Shanghai

D001

Xi'an

D002

Hangzhou

D002

Shanghai

D003

Xi'an

D003

Chengdu

D004

Chengdu

D004

Hangzhou
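Column-level shuffling can be sketched as follows. The function names are hypothetical, and the rows reuse the device and city values from the example. A plain shuffle does not guarantee that every row receives a different value than before, so individual rows may keep their original city.

```python
import random


def shuffle_column(rows, column, rng=None):
    """Shuffle and reorder: permute one column's values across all rows."""
    rng = rng or random.Random()
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def random_select_column(rows, column, rng=None):
    """Random selection: draw each value from the column's value range."""
    rng = rng or random.Random()
    pool = sorted({row[column] for row in rows})
    return [{**row, column: rng.choice(pool)} for row in rows]


devices = [
    {"Device ID": "D001", "City": "Shanghai"},
    {"Device ID": "D002", "City": "Hangzhou"},
    {"Device ID": "D003", "City": "Xi'an"},
    {"Device ID": "D004", "City": "Chengdu"},
]

for row in shuffle_column(devices, "City"):
    print(row["Device ID"], row["City"])
```

Shuffling preserves the overall distribution of the column, which keeps aggregate statistics usable. Random selection preserves only the set of possible values.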

Billing

Data masking is available only on Enterprise Edition DSC instances, which use subscription billing. For details, see Billing. Static data masking may incur additional charges.

Masking method

Data source

DSC billing

Additional charges

Static data masking

  • RDS tables, PolarDB-X tables, MaxCompute tables, PolarDB tables, OceanBase tables, AnalyticDB-MySQL tables, and self-managed database tables on ECS.

  • Structured TXT, CSV, XLSX, and XLS files in OSS buckets.

Data assets to mask must be authorized and connected to DSC. This consumes your purchased database protection instances and storage protection capacity.

If the cloud services you mask use pay-as-you-go billing, those services charge fees based on data access or write volume.

Structured TXT, CSV, XLSX, and XLS files saved on your local computer.

Does not consume instance resources.

No additional charges.

Dynamic data masking

Your own constructed data.

Does not consume instance resources.

No additional charges.

Activate the service

Important

If you use static data masking, authorize and connect the relevant data assets to DSC first. Ensure you have enough available database protection instances and OSS protection capacity.

Static data masking

Feature overview

When creating a static masking task, you can either select an existing masking template as the task's masking rule or directly set the masking algorithm for the target sensitive fields. For masking template configuration, see Configure masking templates and algorithms.

Prerequisites

If you use static masking for databases or OSS files, complete DSC authorization and connect the data assets to mask. For steps, see:

Important

To store masked data in RDS, PolarDB-X, MaxCompute, PolarDB, OceanBase, AnalyticDB-MySQL, self-managed database tables, or OSS buckets, the destination data asset must also be authorized and connected to DSC. Connect with an account that has read and write permissions. For RDS, PolarDB-X, PolarDB, OceanBase, or AnalyticDB-MySQL databases, connect DSC using account-password authentication.

Create a masking task

Warning

If you mask data directly in a production environment, database performance may decrease.

Create a masking task to define the scope and rules for data masking.

  1. Log on to the Data Security Center console.

  2. In the navigation pane on the left, select Risk Governance > Data Desensitization.

  3. On the Static Desensitization tab, go to the Task Configurations tab and click Add Desensitization Task.

  4. Follow the page prompts to configure the masking task.

    1. Enter Basic Task Information and click Next.

      Note

      Task names have no restrictions.

    2. Configure the masking source and click Next.

      RDS tables / PolarDB-X tables / MaxCompute tables / PolarDB tables / OceanBase tables / AnalyticDB-MySQL tables / Self-managed database tables

      Masking Source Configuration Item

      Required

      Description

      Types of data storage

      Yes

      Select RDS tables / PolarDB-X tables / MaxCompute tables / PolarDB tables / OceanBase tables / AnalyticDB-MySQL tables / Self-managed database tables.

      Source Service

      Yes

      Select the product that contains the data to mask. Supported products include the following: RDS, PolarDB-X, OceanBase, MaxCompute, AnalyticDB-MySQL, PolarDB, or Self-managed database.

      Source Database/Project

      Yes

      Select the database or project that contains the table with the data to mask.

      Source table name

      Yes

      Select the table that contains the data to mask.

      Source Partition

      No

      You can configure the Source Partition only when the Source Service is set to MaxCompute.

      Enter the partition name for the data to mask. Leaving this blank masks all sensitive data in the table.

      A partition is a division that you specify when creating a MaxCompute table. Partitioning splits the table's data into separate regions so that queries can scan only the relevant data, which makes them faster. For more information, see Partition.

      Sample SQL

      No

      When the Source Service is set to RDS, PolarDB-X, OceanBase, or self-managed database, you can configure Sample SQL.

      Enter an SQL statement to define the data scope for masking. Leaving this blank masks the entire table.

      OSS files

      Important

      Only structured files in TXT, CSV, XLSX, and XLS formats can be masked.

      Masking Source Configuration Item

      Required

      Description

      Types of data storage

      Yes

      Select OSS files.

      File source

      Yes

      Select the OSS file source. Options are Uploaded Local File and OSS bucket.

      Upload file

      Yes

      When File source is Uploaded Local File, click Select a local file to upload the file to mask.

      OSS Bucket where the source file is located

      Yes

      When File source is OSS bucket, select the bucket containing the source file from the dropdown list. You can also enter keywords to search and select the bucket.

      Source file name

      Yes

      When File source is OSS bucket, enter the source file name including its extension.

      • Single-file masking: Enter the exact file name. Example: test.csv.

      • Batch file masking: Enable wildcard mode (click the Open the pass switch). The system applies the same rule to all target files, which must share the same format and column structure.

        Use an asterisk (*) to specify a batch of files for masking. Currently, only filename prefixes are supported. Example: test*.xls matches all XLS files starting with test.

      Source file description (optional)

      No

      When File source is Uploaded Local File, you can enter a description for the OSS source file.

      Separator selection

      No

      For CSV and TXT files, specify the column delimiter based on your source file. Supported delimiters include the following:

      • Semicolon “;” (default for macOS and Linux).

      • Comma “,” (default for Windows).

      • Vertical bar “|”.

      Table contains header rows

      No

      Select based on whether your source file includes a header row.

    3. Configure the masking algorithm and click Next.

      Available masking algorithms include redaction masking (such as keep first n and last m), shuffling masking (such as random shuffling), and substitution masking (such as ID card mapping substitution). Choose an algorithm based on the field type (name, phone number, ID number, etc.).

      • Select an existing masking template above the data list. The source field list then automatically turns on Desensitization and sets Select Algorithm for the matching fields based on the template.

        The rule list in the masking template must match the source fields of the data to mask. Otherwise, the template does not take effect. For masking template configuration, see Configure masking templates and algorithms.

      • In the source field list, locate the field to mask, turn on Desensitization, and choose an algorithm under Select Algorithm.

      Click View and Modify Parameters next to the masking algorithm to view and edit its rules. For partition syntax, refer to the Partition syntax reference table.

      Note

      If you enable Forcefully Enable Template, you cannot modify algorithm parameters on this page. Modify the template rules instead.

      Partition syntax reference table

      Partition type

      Partition syntax

      Partition example

      Next N weeks

      Custom partition field name=$[yyyymmdd+7*N]

      time=$[20190710+7*1] masks data from one week after July 10, 2019.

      Previous N weeks

      Custom partition field name=$[yyyymmdd-7*N]

      time=$[20190710-7*3] masks data from three weeks before July 10, 2019.

      Next N days

      Custom partition field name=$[yyyymmdd+N]

      time=$[20190710+2] masks data from two days after July 10, 2019.

      Previous N days

      Custom partition field name=$[yyyymmdd-N]

      time=$[20190710-5] masks data from five days before July 10, 2019.

      Next N hours

      Custom partition field name=$[hh24mi:ss+N/24]

      time=$[0924mi:ss+2/24] masks data from two hours after 9:00 AM.

      Previous N hours

      Custom partition field name=$[hh24mi:ss-N/24]

      time=$[0924mi:ss-1/24] masks data from one hour before 9:00 AM.

      Next N minutes

      Custom partition field name=$[hh24mi:ss+N/24/60]

      time=$[0924mi:ss+2/24/60] masks data from two minutes after 9:00 AM.

      Previous N minutes

      Custom partition field name=$[hh24mi:ss-N/24/60]

      time=$[0924mi:ss-2/24/60] masks data from two minutes before 9:00 AM.

  5. Set the destination for storing masked data, click Test to confirm write permissions, then click Next.

    Important

    The account DSC uses to connect to the destination data asset must have write permission.

    Configuration items include Data storage type (RDS tables / PolarDB-X tables / MaxCompute tables / PolarDB tables / OceanBase tables / AnalyticDB-MySQL tables / Self-managed database tables, or OSS files), Destination (such as RDS), Destination database/project name, and Destination table name. If you cannot find a database or table, check the asset authorization to make sure that masking is enabled for the asset.

  6. Confirm the processing logic.

    Destination configuration item

    Required

    Description

    Trigger method

    Yes

    The trigger method determines how the masking task runs. Options include the following:

    • Manual Only: Start the masking task manually.

    • Scheduled Only: Run the masking task automatically at scheduled times (hourly, daily, monthly, or weekly).

    • Manual + Scheduled: Start the task manually by clicking Start, or let the system run it automatically at scheduled times (hourly, daily, monthly, or weekly).

    Enable incremental masking

    No

    Enable incremental masking if needed. Incremental masking processes only new data added since the last masking task. Select a field that increments over time as the increment column, such as creation time or auto-increment ID (the database’s built-in auto-increment column).

    Important

    Only RDS data supports incremental masking.

    Shard field

    No

    DSC shards source data by field during static masking to improve efficiency through concurrent processing. Select shard fields as needed. Multiple fields are supported.

    • Incremental masking is supported only for RDS databases. For such tasks, use the primary key or a unique index as the shard field.

    • If you do not select a shard field, DSC uses the primary key as the default shard field to mask the source data.

      Important

      If your source data has no primary key, you must select a shard field. Otherwise, the masking task fails.

    • Too many shard fields can affect query performance and data accuracy. Choose carefully.

    Table name conflict resolution

    Yes

    How to handle table name conflicts. Options:

    • Delete target table and create a new table with the same name.

    • Add new data to the target table. We recommend this option.

    Row conflict resolution

    Yes

    How to handle row conflicts in the table. Options:

    • Keep conflicting rows in the target table and discard new data. We recommend this option.

    • Delete conflicting rows from the target table and insert new data.

  7. Click Submit.
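The day-based and week-based expressions in the partition syntax reference table amount to simple date arithmetic. The sketch below evaluates the literal example expressions from that table. It assumes that the offset is counted in calendar days and that the resolved value is written back as `field=yyyymmdd`. Neither assumption is confirmed by the table, the hour and minute forms are not handled, and the function name is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches e.g. time=$[20190710+7*1] or time=$[20190710-5].
_PATTERN = re.compile(r"^(\w+)=\$\[(\d{8})([+-])(?:(7)\*)?(\d+)\]$")


def resolve_day_partition(expression: str) -> str:
    """Resolve a day- or week-offset partition expression to a date."""
    match = _PATTERN.match(expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression}")
    field, base, sign, week, count = match.groups()
    days = int(count) * (7 if week else 1)  # "7*N" means N weeks
    if sign == "-":
        days = -days
    target = datetime.strptime(base, "%Y%m%d") + timedelta(days=days)
    # Assumption: the resolved partition is expressed as field=yyyymmdd.
    return f"{field}={target:%Y%m%d}"


for example in (
    "time=$[20190710+7*1]",
    "time=$[20190710-7*3]",
    "time=$[20190710+2]",
    "time=$[20190710-5]",
):
    print(example, "->", resolve_day_partition(example))
```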

Run and view masking tasks

If the trigger method is Manual Only, you must start the task manually. If it is Scheduled Only, the task runs automatically at the scheduled times. If it is Manual + Scheduled, you can start the task manually, and it also runs automatically at the scheduled times.

  1. On the Static Desensitization tab, go to the Task Configurations tab and click Start in the Actions column to execute the task.

    The task list shows task ID, task name, creation time, destination product, source product, executions, and more. The Actions column includes toggle, Delete, Edit, and Run buttons. These buttons are disabled while a task is running.

  2. On the Static Desensitization tab, click the Status sub-tab to view execution progress and status.

    The task status list includes columns for Task ID, Running time, End time, Execution count, Execution method, Source product type, Destination product type, and Execution progress. The Actions column provides links to Stop task, View subtasks, and Download masked file.

Troubleshoot failed masking tasks

If a masking task fails, refer to the following to identify the cause.

Error message

Cause

Masking task not found. It may have been deleted or disabled.

The masking task was deleted or disabled (the toggle in the Actions column is off).

Scheduled task scheduling cycle is invalid.

Daily execution time is invalid.

Source instance for masking not found.

The instance containing the source table does not exist.

Destination instance for masking not found.

Possible causes include canceled instance authorization or deleted destination instance.

Source table for masking not found.

Possible causes include canceled instance authorization or deleted source table.

Masking algorithm parameters are invalid.

Algorithm parameters are incorrect.

Source table column is empty.

The source partition field column has no data.

Failed to write to destination table.

Writing to the destination table failed during destination configuration.

Failed to query source table.

The data was not found in the source table.

Failed to create destination table.

The table may not exist in the destination.

Primary key not found.

The RDS source table lacks a primary key.

MaxCompute partition field in task configuration is invalid.

The source partition in source configuration or destination partition in destination configuration is invalid.

Edit or delete masking tasks

You cannot edit or delete masking tasks that are waiting to run or currently running.

  • Edit a masking task

    To adjust task settings, click Modify in the Actions column for the target task.

  • Delete a masking task

    Important

    Deleted masking tasks cannot be restored. Proceed with caution.

    To delete a task you no longer need, click Delete in the Actions column for the target task, then click OK in the confirmation dialog box.

Dynamic data masking

Feature overview

Dynamic masking requires an existing masking template that serves as the masking rule. Call the ExecDatamask API and pass the data to mask (Data) and the masking template ID (TemplateId). DSC then masks the dataList data in Data according to the template's Matching mode (Field name or Sensitive type).

In the Data Security Center console, go to Risk Governance > Data Desensitization, then the Masking Configurations tab to get the masking template ID. Custom masking templates are supported. For steps, see Configure masking templates.

Masking template matching modes for the ExecDatamask API:

Matching mode

Masking description

Field name

Match column names in dataHeaderList with field names and algorithms in the masking template's Rule list to mask corresponding columns in dataList.

Sensitive type

Match rule IDs for sensitive types in ruleList with rule identifier IDs in the masking template's Rule list. Apply the rule's field name and algorithm to mask corresponding columns in dataList.

Rule fields for sensitive types come from the Data features page in Data Insight, which includes built-in and custom identification features. These feature names also serve as rule names in the Rule list. To get a rule's ID (Id), call the DescribeRules API and pass CustomType (the rule type, that is, whether the data feature source is built-in or custom) and Name (the rule name, that is, the data feature name).

Masking example diagram:

Limits

When calling the ExecDatamask API for dynamic masking, each request's data (Data) must be smaller than 2 MB.
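One way to stay under this limit is to measure the serialized payload and split large row sets into several requests. The sketch below assumes that "2 MB" means 2 × 1024 × 1024 bytes of UTF-8 encoded JSON. The exact way the service measures the size is not stated here, and the helper names are hypothetical. For simplicity, each candidate batch is re-serialized, which is slow for very large inputs.

```python
import json

# Assumption: the limit applies to the UTF-8 encoded JSON string.
MAX_DATA_BYTES = 2 * 1024 * 1024


def data_size(payload) -> int:
    """Size in bytes of the payload once serialized as JSON."""
    return len(json.dumps(payload, ensure_ascii=False).encode("utf-8"))


def split_rows(headers, rows, rules, limit=MAX_DATA_BYTES):
    """Yield payloads whose serialized size stays below limit.

    A single row that alone reaches the limit is still yielded as is.
    """
    def payload(batch):
        return {"dataHeaderList": headers, "dataList": batch, "ruleList": rules}

    batch = []
    for row in rows:
        if batch and data_size(payload(batch + [row])) >= limit:
            yield payload(batch)
            batch = []
        batch.append(row)
    if batch:
        yield payload(batch)


rows = [["user%03d" % i, 20 + i % 50] for i in range(1000)]
parts = list(split_rows(["name", "age"], rows, [1002, None], limit=4096))
print(len(parts), max(data_size(p) for p in parts))
```

Each yielded payload repeats dataHeaderList and ruleList, so every request remains valid independently of the others.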

View dynamic masking API call records

  1. Log on to the Data Security Center console.

  2. In the navigation pane on the left, select Risk Governance > Data Desensitization.

  3. On the Data Desensitization page, click the Dynamic desensitization tab.

  4. On the Dynamic desensitization page, view ExecDatamask API call records.

    Note

    If you call the API multiple times using the same account and IP address, the operation log shows only one entry and records the Cumulative number of calls.

Example: Use static masking for data sharing

Use static masking to mask sensitive data in a structured CSV file in a source OSS bucket under your account, then save the masked file to a destination OSS bucket under the same account. Share the destination OSS bucket with specific users to achieve secure data sharing. For steps, see Mask sensitive data in OSS table files.