Mask sensitive data in OSS tabular files


Use the data masking feature of Data Security Center (DSC) to mask sensitive data in structured files, such as TXT, CSV, XLSX, and XLS files, that are stored in a source OSS bucket. The masked files are then saved to a destination OSS bucket in the same account to enable secure data sharing.

Overview

The following table shows an example of masked data.

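| Name (raw)   | Mobile phone number (raw) | ID card number (raw) | Name (masked) | Mobile phone number (masked) | ID card number (masked) |
|--------------|---------------------------|----------------------|---------------|------------------------------|-------------------------|
| Zhang Sansan | 13900001234               | 111222190002309000   | Zhang**       | 139****1234                  | 111###########9000      |
| Li Sisi      | 13900001111               | 150802202207214000   | Li**          | 139****1111                  | 150###########4000      |
| Wang Wuwu    | 13900002222               | 120105195001066000   | Wang**        | 139****2222                  | 120###########6000      |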

To achieve this result, follow these four steps:

  1. Create OSS buckets and upload a file: Create a source bucket and a destination bucket, and then upload a tabular file containing sensitive data to the source bucket.

  2. Authorize DSC to access the OSS buckets: Grant DSC permissions to read data from and write data to the OSS buckets.

  3. Create a data masking task: Create a task to configure the masking algorithm and rules for sensitive fields in the source file and specify the storage location for the masked file.

  4. Start the data masking task: Run the task to mask sensitive data in the tabular file stored in the source bucket and save the masked file to the destination bucket.


Prerequisites

Step 1: Create OSS buckets and upload a file

1.1 Create the source and destination OSS buckets

  1. In the Object Storage Service (OSS) console, go to the Buckets page and click Create Bucket.

  2. In the Create Bucket panel, configure the following parameters, leave the others at their defaults, and then click Create. This bucket will be used as the source bucket.

    • Region: China (Hangzhou)

    • Storage class: Standard

    • Data redundancy type: locally redundant storage (LRS) (recommended)

    • Block Public Access: keep Enabled

    • Access Control List (ACL): Private

    • Resource group: default resource group

  3. Repeat the preceding steps to create another bucket to use as the destination bucket.

1.2 Upload a tabular file to the source OSS bucket

  1. In the OSS console, go to the Buckets page and click the name of the source bucket.

  2. On the Files page, click Upload Object.

  3. Click Select Files and choose a local file. This tutorial uses the sample file userdata.csv, which contains sensitive information such as names, mobile phone numbers, and ID card numbers. Then click Upload Object and wait for the upload to complete.
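If you do not have a test file at hand, the following minimal sketch generates one locally before you upload it through the console. The header names and the function name write_sample_csv are assumptions for illustration; the rows are copied from the raw-data example in the Overview.

```python
import csv
from pathlib import Path

# Assumption: header names mirror the column names of the example table.
HEADER = ["Name", "Mobile phone number", "ID card number"]

# Rows copied from the raw-data example in the Overview.
ROWS = [
    ["Zhang Sansan", "13900001234", "111222190002309000"],
    ["Li Sisi", "13900001111", "150802202207214000"],
    ["Wang Wuwu", "13900002222", "120105195001066000"],
]


def write_sample_csv(path: str = "userdata.csv") -> Path:
    """Write a comma-separated file with a header row and return its path."""
    target = Path(path)
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)  # the default delimiter is a comma
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return target


if __name__ == "__main__":
    print(write_sample_csv())
```

The file uses a comma separator and a header row, which matches the settings described later for the masking source.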

Step 2: Authorize DSC to access the OSS buckets

  1. Log on to the Data Security Center (DSC) console.

  2. In the left-side navigation pane, choose Asset Center.

  3. On the Asset Center page, click OSS in the Unstructured Data section on the left, and then click Asset Authorization Management.

  4. On the Asset Authorization Management page, click Asset synchronization.

  5. After the asset synchronization is complete, find the newly created source and destination OSS buckets and click Authorization in the Actions column for each bucket.

Step 3: Create a data masking task

In the DSC console, go to the Data Masking page and click Add Desensitization Task. Follow the on-screen instructions to configure the task.

3.1 Configure the source file for masking

Enter a task name, and then set the masking source to the sensitive file userdata.csv in the source OSS bucket. For CSV files, you must specify a comma as the column separator. The sample file in this tutorial contains a header row.
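The separator and header-row settings determine how the file is split into fields. The sketch below is only a local illustration using Python's csv module, not DSC's own parser; read_header is a hypothetical helper name. It shows that a wrong separator collapses the header into a single field.

```python
import csv
import io


def read_header(csv_text: str, separator: str = ",") -> list:
    """Return the fields of the first row, treated as the header row."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=separator)
    return next(reader, [])


sample = (
    "Name,Mobile phone number,ID card number\n"
    "Zhang Sansan,13900001234,111222190002309000\n"
)

print(read_header(sample))        # three separate field names
print(read_header(sample, ";"))   # one field: the whole line, unsplit
```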

3.2 Configure masking rules for sensitive fields

On the Data Masking Algorithm tab, the header row fields from userdata.csv are automatically displayed. This tutorial demonstrates how to apply Masking to the Name, Mobile phone number, and ID card number fields.

  1. Enable the data masking switch for each field and select Masking.

    • For the Name field, select redaction > retain the first n and last m characters.

    • For the Mobile phone number field, select redaction > redact characters x to y.

    • For the ID card number field, select redaction > retain the first n and last m characters.

  2. Click View and Modify Parameters for Masking, configure the algorithm rules, and then click Save. This tutorial uses the following masking rules:

    • Name: Retain the first character and mask the rest with an asterisk (*).

    • Mobile phone number: Mask with an asterisk (*) from the 4th to the 7th character.

    • ID card number: Mask with a number sign (#), retaining the first 3 and last 4 characters.
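The three rules above can be sketched in a few lines of Python. This is an illustration of the rule semantics only, not DSC's implementation; the function names keep_first_last and redact_range are hypothetical. The name rule is shown on a placeholder three-character string, because the example table uses romanized names in which the retained first character appears as a whole surname.

```python
def keep_first_last(value: str, first: int, last: int, mask_char: str = "*") -> str:
    """Retain the first `first` and last `last` characters and mask the rest."""
    if first + last >= len(value):
        return value  # nothing left to mask
    masked = mask_char * (len(value) - first - last)
    return value[:first] + masked + value[len(value) - last:]


def redact_range(value: str, start: int, end: int, mask_char: str = "*") -> str:
    """Mask the characters at 1-based positions start..end (inclusive)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    if start > len(value):
        return value
    end = min(end, len(value))
    return value[:start - 1] + mask_char * (end - start + 1) + value[end:]


# Name: retain the first character, mask the rest with '*'.
print(keep_first_last("ABC", 1, 0, "*"))              # A**

# Mobile phone number: mask the 4th to the 7th character with '*'.
print(redact_range("13900001234", 4, 7, "*"))         # 139****1234

# ID card number: retain the first 3 and last 4 characters, mask with '#'.
print(keep_first_last("111222190002309000", 3, 4, "#"))
```

Both functions preserve the length of the input, which is why the masked ID card numbers in the example table are still 18 characters long.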

3.3 Configure the masked file location

Watermarks are not supported for OSS data sources. Directly configure the destination bucket to store the masked file. In this tutorial, the file is saved as a Result set. You can customize the filename, but the file type must be csv, xls, or txt.

3.4 Configure the task trigger

For OSS file masking tasks, only the How the task is triggered (Required) setting applies.

  1. Set How the task is triggered (Required) to Manual Only.

  2. Click Submit.

Step 4: Start the data masking task

4.1 Run the task

  1. On the Task Configurations tab of the Static Desensitization page, find the newly created data masking task and click Start in the Actions column.

  2. On the Static Desensitization page, click the Status sub-tab. Wait until the task progress reaches 100% and the status changes to Successful.

4.2 Verify the results

  1. Go to the Buckets page in the OSS console, click the name of the destination bucket, and find the masked file in the file list. The filename is in the format <DestinationFilename>_<TaskExecutionTime>.<FileType>. For example, in usernews_20240808150643.csv, 20240808150643 indicates that the task was executed at 15:06:43 on August 8, 2024. You can click Download to obtain the file.

  2. After the download is complete, open the file to verify that the name, mobile phone number, and ID card number data is masked.

    The data is masked as follows: names retain only the first character (e.g., "Zhang**"), the middle four digits of mobile phone numbers are hidden (e.g., "139****1234"), and the middle digits of ID card numbers are replaced with number signs (e.g., "111###########9000").
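For larger files, the manual check can be complemented by a small script. The sketch below is a minimal example that assumes the masked file keeps the header row and the column names used in this tutorial; parse_masked_filename and unmasked_cells are hypothetical helper names. It parses the execution time from the filename and flags cells that do not match the expected masked patterns.

```python
import csv
import io
import re
from datetime import datetime

# <DestinationFilename>_<TaskExecutionTime>.<FileType>
FILENAME_RE = re.compile(r"^(?P<name>.+)_(?P<ts>\d{14})\.(?P<ext>[A-Za-z0-9]+)$")

# Assumption: column names match the header row of the sample file.
PATTERNS = {
    "Name": re.compile(r"^[^*]+\*+$"),
    "Mobile phone number": re.compile(r"^\d{3}\*{4}\d{4}$"),
    "ID card number": re.compile(r"^\d{3}#+\d{4}$"),
}


def parse_masked_filename(filename: str):
    """Split a masked filename into (name, execution time, file type)."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename format: {filename}")
    executed_at = datetime.strptime(match["ts"], "%Y%m%d%H%M%S")
    return match["name"], executed_at, match["ext"]


def unmasked_cells(csv_text: str) -> list:
    """Return (row number, column, value) for cells that do not look masked."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, pattern in PATTERNS.items():
            value = row.get(column) or ""
            if not pattern.match(value):
                problems.append((row_number, column, value))
    return problems


print(parse_masked_filename("usernews_20240808150643.csv"))
masked = (
    "Name,Mobile phone number,ID card number\n"
    "Zhang**,139****1234,111###########9000\n"
)
print(unmasked_cells(masked))  # an empty list means every cell looks masked
```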

Summary

You can mask raw data stored in an OSS bucket and then save it to a destination bucket for secure sharing. After data masking, sensitive content is not directly exposed, reducing the risk of data abuse and privacy violations even if the shared data is leaked. The masked data can be used for scenarios such as data analysis, model training, and business report sharing without compromising personal privacy.

Flexible masking algorithms

Data masking relies on algorithms and their rules. DSC supports various masking algorithms, including hashing, redaction, substitution, transformation, encryption, decryption, and shuffling. Each algorithm offers multiple configuration options, so you can choose the most suitable one for your business scenario.

On the Data Masking page, select the Masking Configuration > Masking Algorithms tab to view descriptions and configure each algorithm. For example, for hashing, DSC supports four rules: MD5, SHA1, SHA256, and HMAC. You can enter a salt value and click Test to verify the masking effect, then click Submit to save the configuration.
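To get a feel for what the four hashing rules produce, the following sketch uses Python's hashlib and hmac modules. How DSC combines the salt with the value is not described here, so two assumptions are made and marked in the code: the salt is appended to the value for MD5, SHA1, and SHA256, and HMAC is computed as HMAC-SHA256 keyed with the salt. The function name hash_mask is hypothetical.

```python
import hashlib
import hmac

_DIGESTS = {"MD5": hashlib.md5, "SHA1": hashlib.sha1, "SHA256": hashlib.sha256}


def hash_mask(value: str, rule: str, salt: str = "") -> str:
    """Return a hex digest of `value` for the given rule and salt."""
    data = value.encode("utf-8")
    key = salt.encode("utf-8")
    if rule == "HMAC":
        # Assumption: HMAC-SHA256 with the salt as the key.
        return hmac.new(key, data, hashlib.sha256).hexdigest()
    if rule not in _DIGESTS:
        raise ValueError(f"unsupported rule: {rule}")
    # Assumption: the salt is appended to the value before hashing.
    return _DIGESTS[rule](data + key).hexdigest()


for rule in ("MD5", "SHA1", "SHA256", "HMAC"):
    print(rule, hash_mask("13900001234", rule, salt="example-salt"))
```

Hashing is one-way and deterministic: the same value and salt always yield the same digest, so masked columns can still be joined or grouped, but the original value cannot be read back from the output.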

Efficient rule configuration

DSC also supports masking templates. To improve efficiency, you can group frequently used masking algorithms into a template and apply it when configuring data masking rules.

For more information, see Configure masking templates and algorithms.

Schedule data masking tasks

While the task in this tutorial is run manually, data masking tasks can also be scheduled to run at specific times, such as hourly, daily, weekly, or monthly. This ensures that updated data is promptly masked and ready for use.

In the How the task is triggered section, you can select Manual Only, Scheduled Only, or Manual + Scheduled. If you choose a scheduled option, you can set the trigger frequency in the Task Schedule Configuration section: for Hourly, set the minute; for Daily and Weekly, set the specific time and day; for Monthly, set the date and hour.
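The scheduling options can be modeled as a small configuration object. The sketch below is illustrative only and does not reflect how DSC schedules tasks internally; the Schedule class and next_run function are hypothetical. It steps forward one minute at a time until the configured frequency matches, which keeps the logic simple at the cost of efficiency.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Schedule:
    frequency: str                 # "hourly", "daily", "weekly", or "monthly"
    minute: int = 0                # used by every frequency
    hour: int = 0                  # used by daily, weekly, and monthly
    weekday: Optional[int] = None  # weekly only; 0 = Monday
    day: Optional[int] = None      # monthly only; day of the month


def matches(schedule: Schedule, moment: datetime) -> bool:
    """Return True if `moment` is a trigger time for `schedule`."""
    if moment.minute != schedule.minute:
        return False
    if schedule.frequency == "hourly":
        return True
    if moment.hour != schedule.hour:
        return False
    if schedule.frequency == "daily":
        return True
    if schedule.frequency == "weekly":
        return moment.weekday() == schedule.weekday
    if schedule.frequency == "monthly":
        return moment.day == schedule.day
    raise ValueError(f"unknown frequency: {schedule.frequency}")


def next_run(schedule: Schedule, now: datetime) -> datetime:
    """Find the first trigger time strictly after `now`, within one year."""
    moment = now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):
        if matches(schedule, moment):
            return moment
        moment += timedelta(minutes=1)
    raise ValueError("no trigger time found within one year")


now = datetime(2024, 8, 8, 15, 6, 43)
print(next_run(Schedule("hourly", minute=30), now))
print(next_run(Schedule("daily", hour=2), now))
print(next_run(Schedule("weekly", hour=9, weekday=0), now))
print(next_run(Schedule("monthly", hour=0, day=1), now))
```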