Dynamic data masking

更新时间:
复制 MD 格式

You can enable dynamic data masking in MaxCompute to show users sensitive data with key information masked. This feature hides or replaces sensitive data on the fly during access, which prevents its exposure.

Introduction

MaxCompute provides dynamic data masking to protect sensitive data, such as personally identifiable information (PII), in scenarios like business development and testing, data sharing, and O&M. Unlike access control with column-level permissions, which requires users to modify their queries to exclude inaccessible columns, dynamic data masking works automatically without changes to existing queries. When a user accesses data, the system automatically applies the appropriate masking policy based on the user and their role. This ensures that data is masked during queries, downloads, joins, and user-defined function (UDF) computations, which reduces the risk of sensitive data exposure.

Masking policies support various methods, including masking, hashing, character replacement, rounding, and date truncation, for various types of data, such as identification numbers, bank card numbers, addresses, and phone numbers. MaxCompute applies data masking at the earliest stage of data retrieval from storage, ensuring high performance and security.

image

Scope

  • Supported regions

    This feature is in public preview and is available only in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).

  • Supported driver versions

    Connection method

    Driver version

    Masking support

    Java SDK

    0.48.0-public or later

    Supported

    odpscmd

    0.47.1 or later

    Supported

    JDBC

    3.4.3 or later

    Supported

    MaxFrame

    All versions

    Supported

    PyODPS

    All versions

    Supported

    Go SDK

    All versions

    Supported

  • Internal and external tables

    • Data masking policies are supported for MaxCompute internal and external tables.

    • Dynamic data masking and row-level permissions are mutually exclusive. You cannot configure row-level permissions for a table that already has a masking policy, and vice versa.

    • When you apply a masking policy to Chinese characters, the character encoding must be UTF-8.

  • Views

    • Standard views support data masking policies. The policies on a view are synchronized with those on the source table. When a masking policy is applied to or removed from the source table, the change takes effect on the view immediately.

    • A materialized view inherits the masking policies from its source table at the time of creation. Subsequent changes to the source table's policies do not affect the materialized view.

  • Masking policies

    If multiple masking policies apply to a user's access, the policy with the highest priority is applied. For more information, see Predefined masking policy priorities.

How it works

A project owner or users with the Super_Administrator or Admin role can manage masking policies. When a user accesses a table that contains sensitive data, the system checks for masking policies associated with the user or their roles and returns either masked data or plaintext data accordingly.

image

Commands

Enable or disable data masking

Theodps.data.masking.policy.enable property is a project-level setting that controls the dynamic data masking feature. Only the project owner or users with the project-level Super_Administrator or Admin role can configure this property. For more information, see Assign built-in management roles to users.

After you enable or disable data masking, the change takes approximately 15 minutes to take effect due to cache refresh latency.

  • Enable dynamic data masking for the project.

    setproject odps.data.masking.policy.enable=true;
  • Disable dynamic data masking for the project.

    setproject odps.data.masking.policy.enable=false;

Create and drop masking policies

A project can have up to 1,000 masking policies.

  • Syntax

    • Create a masking policy.

      CREATE DATA MASKING POLICY [IF NOT EXISTS] <policy_name> 
      TO { USER <user_list> | ROLE <role_list> | default } 
      USING <Predefined Masking Policy>;
    • Drop a masking policy.

      DROP DATA MASKING POLICY <policy_name>;
  • Parameters

    Parameter

    Required

    Description

    policy_name

    Yes

    The name of the masking policy. The policy name is case-insensitive and can contain only letters, digits, and underscores (_). Start the name with a letter. The name can be up to 128 bytes in length.

    USER | ROLE | default

    Yes

    The scope of the policy. You must select one of the following options:

    • USER: Applies the policy to one or more users. For <user_list>, specify the target usernames. You can run the list users; command in MaxCompute to view user information.

    • ROLE: Applies the policy to one or more roles. For <role_list>, specify the names of the target roles. You can run the list roles; command in MaxCompute to view role information.

    • default: Applies a default policy. If no specific masking policy matches a user or role when they access a sensitive column, the default policy is applied.

    Predefined Masking Policy

    Yes

    A predefined masking policy. For more information, see Predefined masking policies.

  • Examples

    • Example 1: Create an MD5 hashing policy for users userA, userB, and userC.

      CREATE data masking policy IF NOT EXISTS masking_test_001
      TO USER (userA, userB, userC)
      USING MASKED_MD5(0);
    • Example 2: Create an MD5 hashing policy for the project development and deployment roles.

      CREATE data masking policy IF NOT EXISTS masking_test_001
      TO ROLE (role_project_deploy, role_project_dev)
      USING MASKED_MD5(0);

Apply a masking policy

  • Syntax

    --Apply a masking policy to a table column.
    APPLY DATA MASKING POLICY <policy_name> BIND TO TABLE <table_name> COLUMN <column_name>;
    
    --Unbind a masking policy from a table column.
    APPLY DATA MASKING POLICY <policy_name> UNBIND FROM TABLE <table_name> COLUMN <column_name>;
    
    --Unbind all masking policies from a table column.
    APPLY DATA MASKING POLICY UNBIND ALL FROM TABLE <table_name> COLUMN <column_name>;
    
    --Unbind all masking policies from a table.
    APPLY DATA MASKING POLICY UNBIND ALL FROM TABLE <table_name>;
  • Parameters

    Parameter

    Required

    Description

    policy_name

    Yes

    The name of the masking policy.

    table_name

    Yes

    The name of the table that contains sensitive data.

    column_name

    Yes

    The name of the column that contains sensitive data.

View masking policies

  • Syntax

    --View the details of a masking policy.
    DESC DATA MASKING POLICY <policy_name>;
    
    --View extended table information, including applied masking policies.
    DESC EXTENDED <table_name>;
    
    --Display the names of all masking policies in the current project.
    LIST DATA MASKING POLICY;
    
    --Display the names of masking policies bound to a specific user.
    LIST DATA MASKING POLICY TO USER <user_name>;
    
    --Display the names of masking policies bound to a specific role.
    LIST DATA MASKING POLICY TO ROLE <role_name>;
    
    --Display the names of all masking policies bound to a specific table.
    LIST DATA MASKING POLICY ON <table_name>;
    
    --Display the names of all masking policies bound to a specific column of a table.
    LIST DATA MASKING POLICY ON <table_name> TO COLUMN <column_name>;
  • Parameters

    Parameter

    Required

    Description

    policy_name

    Yes

    The name of the masking policy.

    table_name

    Yes

    The name of the table that contains sensitive data.

    column_name

    Yes

    The name of the column that contains sensitive data.

    user_name

    Yes

    The username.

    role_name

    Yes

    The role name.

Predefined masking policies

Predefined masking policies include methods such as masking, hashing, character replacement, and rounding. You can select the appropriate policy based on the data type and your protection requirements.

Category

Policy

Syntax

Description

General

No masking

UNMASKED

Returns data in plaintext.

Supported data types: All.

Nullify

MASKED_NULLIFY

Replaces the data with NULL.

  • Supported data types: All.

  • Example

    -- Before masking (data type: STRING)
    +-----------------+
    | col_string      |
    +-----------------+
    | Michael Johnson |
    +-----------------+
    
    -- Masking policy
    MASKED_NULLIFY
    
    -- After masking
    +------------+
    | col_string | 
    +------------+
    | NULL       | 
    +------------+

Default value

MASKED_DV

Replaces the value with the default value of the corresponding data type. For more information, see Default values for the MASKED_DV masking policy.

  • Supported data types: All.

  • Example

    -- Before masking (data type: TIMESTAMP)
    +---------------------+
    | col_timestamp       |
    +---------------------+
    | 2024-05-01 11:12:13 |
    +---------------------+
    -- Masking policy
    MASKED_DV
    -- After masking (The default project time zone is China Standard Time, which is UTC+8.)
    +---------------------+
    | col_timestamp       | 
    +---------------------+
    | 1970-01-01 08:00:00 | 
    +---------------------+

Date truncation

MASKED_DATE_YEAR

Retains only the year part of a time value and resets the month, day, and time to the beginning of that year (January 1, 00:00:00 UTC).

  • Supported data types: DATE, DATETIME, TIMESTAMP_NTZ, and TIMESTAMP.

  • Example

    -- Before masking (data type: TIMESTAMP)
    +---------------------+
    | col_timestamp       |
    +---------------------+
    | 2024-05-01 11:12:13 |
    +---------------------+
    -- Masking policy
    MASKED_DATE_YEAR
    -- After masking (The default project time zone is China Standard Time, which is UTC+8.)
    +---------------------+
    | col_timestamp       | 
    +---------------------+
    | 2024-01-01 08:00:00 | 
    +---------------------+

Rounding

MASKED_POINT_RESERVE(<num>)

Rounds a value to a specified number of decimal places.

  • num: The number of decimal places to retain. The value must be an integer in the range of [0, 5].

  • Supported data types: DECIMAL, FLOAT, and DOUBLE.

  • Example

    -- Before masking (data type: FLOAT)
    +-----------+
    | col_float |
    +-----------+
    | 1.12345   |
    +-----------+
    -- Masking policy: retain 2 decimal places.
    MASKED_POINT_RESERVE(2)
    -- After masking
    +-----------+
    | col_float | 
    +-----------+
    | 1.12      | 
    +-----------+

Masking

Mask start and end

MASKED_STRING_MASKED_BA(<before>, <after>)

Replaces the start and end of the string with asterisks (*), leaving the middle part in plaintext.

  • before: The number of characters to replace with * at the beginning of a string. The value must be an integer that is greater than or equal to 0. If you set this parameter to 0, the beginning of the string is not masked.

  • after: The end of the string is replaced with *. The after parameter specifies the number of characters to mask and must be an integer greater than or equal to 0.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Example

    -- Before masking (data type: STRING)
    +-----------------+
    | col_string      |
    +-----------------+
    | Michael Johnson |
    +-----------------+
    -- Masking policy: replace the first 3 and last 3 characters with asterisks.
    MASKED_STRING_MASKED_BA(3, 3)
    -- After masking
    +-----------------+
    | col_string      | 
    +-----------------+
    | ***hael John*** | 
    +-----------------+

Mask middle

MASKED_STRING_UNMASKED_BA(<before>, <after>)

Displays the beginning and end of the string in plaintext and masks the middle part with asterisks (*).

  • before: The number of characters to display in plaintext at the start of the string. Must be an integer of 0 or greater.

  • after: The number of characters to display in plaintext at the end of the string. Must be an integer of 0 or greater.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Example

    -- Before masking (data type: STRING)
    +-----------------+
    | col_string      |
    +-----------------+
    | Michael Johnson |
    +-----------------+
    -- Masking policy: display the first and last characters in plaintext.
    MASKED_STRING_UNMASKED_BA(1, 1)
    -- After masking
    +-----------------+
    | col_string      | 
    +-----------------+
    | M*************n | 
    +-----------------+

Hashing

SHA256 hashing

MASKED_SHA256(<salt>)

Masks data by using the SHA256 hashing algorithm.

  • salt: The salt for hashing. The value must be an integer from 0 to 9.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Example

    -- Before masking (data type: STRING)
    +---------------------+
    | col_string          | 
    +---------------------+
    | 4562-1234-5678-9123 | 
    +---------------------+
    -- Masking policy
    MASKED_SHA256(0)
    -- After masking
    +----------------------------------------------+
    | col_string                                   | 
    +----------------------------------------------+
    | zwGMB1aCF1t705EfcwdDorql4MZb46XBqQJw/2RVx8U= | 
    +----------------------------------------------+

SHA512 hashing

MASKED_SHA512(<salt>)

Masks data by using the SHA512 hashing algorithm.

  • salt: The salt for hashing. The value must be an integer from 0 to 9.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Example

    -- Before masking (data type: STRING)
    +---------------------+
    | col_string          | 
    +---------------------+
    | 4562-1234-5678-9123 | 
    +---------------------+
    -- Masking policy
    MASKED_SHA512(0)
    -- After masking
    +------------------------------------------------------------------------------------------+
    | col_string                                                                               | 
    +------------------------------------------------------------------------------------------+
    | 3PPywfEIp08WuTUI8FZCCfdVuRu68wZTVwWWVAf4pboACUnH6w9kFMLpl2AARaGW/mvWvg26p0EIqmE0fAEiuA== | 
    +------------------------------------------------------------------------------------------+

MD5 hashing

MASKED_MD5(<salt>)

Masks data by using the MD5 hashing algorithm.

  • salt: The salt for hashing. The value must be an integer from 0 to 9.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Example

    -- Before masking (data type: STRING)
    +---------------------+
    | col_string          | 
    +---------------------+
    | 4562-1234-5678-9123 | 
    +---------------------+
    -- Masking policy
    MASKED_MD5(0)
    -- After masking
    +--------------------------+
    | col_string               | 
    +--------------------------+
    | mK/o08tew5g7S3XV/BkFfw== | 
    +--------------------------+

SM3 hashing

MASKED_SM3(<salt>)

Masks data by using the SM3 hashing algorithm.

  • salt: The salt for hashing. The value must be an integer from 0 to 9.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Example

    -- Before masking (data type: STRING)
    +---------------------+
    | col_string          | 
    +---------------------+
    | 4562-1234-5678-9123 | 
    +---------------------+
    -- Masking policy
    MASKED_SM3(0)
    -- After masking
    +----------------------------------------------+
    | col_string                                   | 
    +----------------------------------------------+
    | Q2TfwUh4B8QQH8jPL6DfdoGysx/CXBxn2T14dDwQtQw= | 
    +----------------------------------------------+

Character replacement

Random replacement

MASKED_REPLACE_RANDOM(<position>)

Replaces data with random characters that consist of digits and letters. The length of the string remains unchanged.

  • position: An integer that specifies the position where characters are replaced.

    • position = 0: The entire string is replaced with random characters.

    • position > 0: The first 'position' characters are replaced with random characters.

    • position < 0: The last N characters are replaced with random characters, where N is the absolute value of 'position'.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Example

    -- Before masking (data type: STRING)
    +-----------------+
    | col_string      |
    +-----------------+
    | Michael Johnson |
    +-----------------+
    -- Masking policy: replace the first seven characters with random characters.
    MASKED_REPLACE_RANDOM(7)
    -- After masking
    +-----------------+
    | col_string      | 
    +-----------------+
    | 4DlJQxi Johnson | 
    +-----------------+

Random replacement at start and end

MASKED_REPLACE_RANDOM_BA(<before>, <after>)

Replaces the start and end of a string with random alphanumeric characters. The string length remains unchanged.

  • before: The number of characters to replace at the start of the string. Must be an integer of 0 or greater. If this value is 0, no characters are replaced at the start.

  • after: The number of characters to replace at the end of the string. Must be an integer of 0 or greater.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Example

    -- Before masking (data type: STRING)
    +-----------------+
    | col_string      |
    +-----------------+
    | Michael Johnson |
    +-----------------+
    -- Masking policy: replace the first 4 and last 4 characters with random characters.
    MASKED_REPLACE_RANDOM_BA(4, 4)
    -- After masking
    +-----------------+
    | col_string      | 
    +-----------------+
    | r0xEael JohnWNr | 
    +-----------------+

Fixed replacement

MASKED_REPLACE_FIXED(<position>, <fixed_string>)

  • Replaces data with a fixed string.

  • position: An integer that specifies the position where characters are replaced.

    • position = 0: The entire string is replaced with the fixed_string.

    • position > 0: The first 'position' characters are replaced with the fixed_string.

    • position < 0: The last N characters are replaced with the fixed_string, where N is the absolute value of 'position'.

  • fixed_string: The replacement string. It can be up to 100 characters long and cannot contain spaces.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Example

    -- Before masking (data type: STRING)
    +-----------------+
    | col_string      |
    +-----------------+
    | Michael Johnson |
    +-----------------+
    -- Masking policy: replace the first 7 characters with the fixed string "Oli".
    MASKED_REPLACE_FIXED(7, "Oli")
    -- After masking
    +-------------+
    | col_string  | 
    +-------------+
    | Oli Johnson | 
    +-------------+

Usage examples

Mask sensitive personal information

This example shows how to configure masking policies to mask sensitive personal information.

  1. Prepare the data.

    Create a table to store personal information and insert sensitive data.

    -- Create a table for sensitive information.
    CREATE TABLE if NOT EXISTS personal_info (
     id bigint COMMENT 'The unique ID of the user.',
     name string COMMENT 'The name of the user.',
     age int COMMENT 'The age of the user.',
     gender string COMMENT 'The gender of the user.',
     height float COMMENT 'The height of the user.',
     birthday date COMMENT 'The birth date of the user.',
     phone_number string COMMENT 'The phone number of the user.',
     email string COMMENT 'The email address of the user.',
     address string COMMENT 'The address of the user.',
     salary decimal(18, 2) COMMENT 'The salary of the user.',
     create_time timestamp COMMENT 'The time when the user information was created.',
     update_time timestamp COMMENT 'The time when the user information was updated.',
     is_deleted boolean COMMENT 'Indicates whether the user information is deleted.'
    );
    -- Insert sensitive data.
    INSERT INTO personal_info VALUES
     (1, 'Zhang San', 18, 'Male', 178.56, '1990-01-01', '13800000000', 'zhangsan@example.com', 'Haidian District, Beijing', 5000.00, '2023-04-19 11:32:00', '2023-04-19 11:32:00', false),
     (2, 'Li Si', 20, 'Female', 162.70, '1992-02-02', '13900000000', 'lisi@example.com', 'Pudong New Area, Shanghai', 6000.00, '2023-04-19 11:32:00', '2023-04-19 11:32:00',false),
     (3, 'Wang Wu', 22, 'Male', 185.21, '1994-03-03', '14000000000', 'wangwu@example.com', 'Nanshan District, Shenzhen', 7000.00, '2023-04-19 11:32:00', '2023-04-19 11:32:00', false);
  2. Configure masking policies.

    • For the name, keep only the first character and replace the rest with asterisks (*).

      CREATE data masking policy IF NOT EXISTS masking_name
      TO USER (RAM$xxx@test.aliyunid.com:xxx)
      USING MASKED_STRING_UNMASKED_BA(1, 0);
      
      apply data masking policy masking_name bind TO
      TABLE personal_info COLUMN name;
    • Round the height value to the nearest integer.

      CREATE data masking policy IF NOT EXISTS masking_height
      TO USER (RAM$xxx@test.aliyunid.com:xxx)
      USING MASKED_POINT_RESERVE(0);
      
      apply data masking policy masking_height bind TO
      TABLE personal_info COLUMN height;
    • For the birth date, the value is reset to the beginning of the year.

      CREATE data masking policy IF NOT EXISTS masking_birthday
      TO USER (RAM$xxx@test.aliyunid.com:xxx)
      USING MASKED_DATE_YEAR;
      
      apply data masking policy masking_birthday bind TO
      TABLE personal_info COLUMN birthday;
    • For default users, hash phone numbers by using the SM3 algorithm.

      CREATE DATA MASKING POLICY default_sm3
      TO DEFAULT
      USING MASKED_SM3(0);
      
      apply data masking policy default_sm3 bind TO
      TABLE personal_info COLUMN phone_number;
  3. Use an account for which masking policies are configured to query the data.

    SELECT id, name, height, birthday, phone_number FROM personal_info;
    
    -- Before masking
    +----+-----------+--------+------------+--------------+
    | id | name      | height | birthday   | phone_number |
    +----+-----------+--------+------------+--------------+
    | 1  | Zhang San | 178.56 | 1990-01-01 | 13800000000  |
    | 2  | Li Si     | 162.7  | 1992-02-02 | 13900000000  |
    | 3  | Wang Wu   | 185.21 | 1994-03-03 | 14000000000  |
    +----+-----------+--------+------------+--------------+
    
    -- After masking
    +----+---------+--------+------------+----------------------------------------------+
    | id | name    | height | birthday   | phone_number                                 | 
    +----+---------+--------+------------+----------------------------------------------+
    | 1  | Z******** | 179    | 1990-01-01 | lvYJaH4ElL2ilpQx/8tfMUw7xP22yblIgmfWp0/msUQ= | 
    | 2  | L****     | 163    | 1992-01-01 | 9fFWacNSwCRZLAjMHqunlfwkqhTbP2ubuDOeOSh4N1c= | 
    | 3  | W******   | 185    | 1994-01-01 | k/0JoQCSarJg9ATJ5tyVnhQf1jIBxHXRbB+cvUm4OmE= | 
    +----+---------+--------+------------+----------------------------------------------+

Default masking for all users and roles

This example shows how policy priority works when a user or role matches multiple masking policies.

Apply the MASKED_SHA256(5) policy to default users.

CREATE DATA MASKING POLICY default_hash_policy
TO DEFAULT
USING MASKED_SHA256(5);

Apply the UNMASKED policy to specific users A and B.

CREATE DATA MASKING POLICY ab_unmask_policy
TO USER (A, B)
USING UNMASKED;

Result: Users A and B can access plaintext data. Other users can access only the SHA256-hashed data.

Note

Users A and B match both the MASKED_SHA256(5) and UNMASKED policies. The system applies UNMASKED because it has a higher priority. For more information, see Predefined masking policy priorities. Other users match only the MASKED_SHA256(5) policy.

Appendix

Predefined masking policy priorities

When multiple masking policies apply to a user's data access, the one with the highest priority is used.

For example, a user named A accesses thecol_string column, and their request matches two masking policies: MASKED_REPLACE_RANDOM(3) with priority level 3 and MASKED_SM3 with priority level 4. Because a lower number indicates higher priority, the MASKED_REPLACE_RANDOM(3) policy is applied. User A sees the data masked with random characters.

Priority

Predefined masking policy

0 (highest)

UNMASKED

1

MASKED_POINT_RESERVE(num)

2

MASKED_DATE_YEAR

3

MASKED_STRING_MASKED_BA(before, after)

MASKED_STRING_UNMASKED_BA(before, after)

MASKED_REPLACE_RANDOM(position)

MASKED_REPLACE_RANDOM_BA(before, after)

MASKED_REPLACE_FIXED(position, fixed_string)

4

MASKED_SHA256

MASKED_SHA512

MASKED_MD5

MASKED_SM3

5

MASKED_DV

6 (lowest)

MASKED_NULLIFY

Default values for the MASKED_DV masking policy

Type

Default

bigint

0

double

0.0

decimal

0

string

"" (empty string)

datetime

DATETIME'1970-01-01 00:00:00' (UTC)

boolean

false

tinyint

0

smallint

0

int

0

binary

'' (empty)

float

0.0

varchar(n)

"" (empty string)

char(n)

" " (n spaces)

date

DATE'1970-01-01'

timestamp

TIMESTAMP'1970-01-01 00:00:00' (UTC)

timestamp_ntz

TIMESTAMP'1970-01-01 00:00:00' (UTC)

array

Empty array

map

Empty map

json

"" (empty string)

struct

A struct with default values for each field