Mask sensitive data

When you transform, ship, or use data, you can configure data masking rules to reduce the exposure of sensitive data and lower the risk of data leaks. This topic describes how to use data transformation functions to mask sensitive data in common scenarios.

Background information

You can mask sensitive data such as mobile phone numbers, bank card numbers, email addresses, IP addresses, AccessKey pairs, ID card numbers, URLs, order numbers, and other specific strings. In Simple Log Service data transformation, common data masking methods include regular expression replacement (the regex_replace function), Base64 encoding (the base64_encoding function), MD5 encoding (the md5_encoding function), character mapping (the str_translate function), and GROK capture (the grok function). For more information, see Regular expression functions, GROK functions, and Encoding and decoding functions.

Scenario 1: Mask mobile phone numbers

  • Method

    To mask the mobile phone numbers in log entries, you can use the regex_replace function.

  • Example

    • Raw log

      iphone: 13900001234
    • Transformation rule

      e_set(
          "sec_iphone",
          regex_replace(v("iphone"), r"(\d{0,3})\d{4}(\d{4})", replace=r"\1****\2"),
      )
    • Result

      iphone: 13900001234
      sec_iphone: 139****1234
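
To preview how this pattern behaves before deploying the rule, the same substitution can be tried locally. The following is a minimal sketch that assumes Python's standard re module is a close enough stand-in for regex_replace; the helper name mask_phone is hypothetical and not part of the Simple Log Service DSL.

```python
import re

def mask_phone(value: str) -> str:
    # Same pattern as the rule above: keep up to three leading digits,
    # replace the next four digits with asterisks, keep the last four digits.
    return re.sub(r"(\d{0,3})\d{4}(\d{4})", r"\1****\2", value)

print(mask_phone("13900001234"))  # 139****1234
```

The pattern is not anchored, so it also rewrites any other run of eight or more digits in the field. If a field can hold other numeric data, a stricter pattern may be needed.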

Scenario 2: Mask bank card information

  • Method

    If a log contains bank card or credit card information, you can use the regex_replace function to mask it.

  • Example

    • Raw log

      content: bank number is 491648411333978312 and credit card number is 4916484113339780
    • Transformation rule

      e_set(
          "bank_number",
          regex_replace(
              v("content"), r"([1-9]{1})(\d{14}|\d{13}|\d{11})(\d{4})", replace=r"****\3"
          ),
      )
    • Result

      content: bank number is 491648411333978312 and credit card number is 4916484113339780
      bank_number: bank number is ****8312 and credit card number is ****9780
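
The alternation in this pattern accepts numbers of 19, 18, or 16 digits (one leading non-zero digit, then 14, 13, or 11 digits, then the last four digits that are kept). A minimal local sketch, assuming Python's re module behaves like regex_replace for this pattern; CARD_PATTERN and mask_cards are hypothetical names:

```python
import re

# Pattern copied from the transformation rule above.
CARD_PATTERN = r"([1-9]{1})(\d{14}|\d{13}|\d{11})(\d{4})"

def mask_cards(value: str) -> str:
    # Replace everything except the last four digits (group 3) with asterisks.
    return re.sub(CARD_PATTERN, r"****\3", value)

text = "bank number is 491648411333978312 and credit card number is 4916484113339780"
print(mask_cards(text))
# bank number is ****8312 and credit card number is ****9780
```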

Scenario 3: Mask email addresses

  • Method

    To mask the email addresses in log entries, you can use the regex_replace function.

  • Example

    • Raw log

      content: email is username@example.com
    • Transformation rule

      e_set(
          "email_encrypt",
          regex_replace(
              v("content"),
              r"[A-Za-z\d]+([-_.][A-Za-z\d]+)*(@([A-Za-z\d]+[-.])+[A-Za-z\d]{2,4})",
              replace=r"****\2",
          ),
      )
    • Result

      content: email is username@example.com
      email_encrypt: email is ****@example.com
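
In this rule, the second capture group holds the @ sign and the domain, so the replacement keeps the domain and hides only the local part. This can be checked locally with a short sketch, again assuming Python's re module as a stand-in for regex_replace; EMAIL_PATTERN and mask_email are hypothetical names:

```python
import re

# Pattern copied from the transformation rule above.
EMAIL_PATTERN = r"[A-Za-z\d]+([-_.][A-Za-z\d]+)*(@([A-Za-z\d]+[-.])+[A-Za-z\d]{2,4})"

def mask_email(value: str) -> str:
    # Group 2 is "@" plus the domain; the local part is replaced with asterisks.
    return re.sub(EMAIL_PATTERN, r"****\2", value)

print(mask_email("email is username@example.com"))
# email is ****@example.com
```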

Scenario 4: Mask AccessKey pairs

  • Method

    To mask the AccessKey pairs in log entries, you can use the regex_replace function.

  • Example

    • Raw log

      content: ak id is <testAccessKey ID> and ak key is <testAccessKey Secret>
    • Transformation rule

      e_set(
          "akid_encrypt",
          regex_replace(
              v("content"),
              r"([a-zA-Z0-9]{4})(([a-zA-Z0-9]{26})|([a-zA-Z0-9]{12}))",
              replace=r"\1****",
          ),
      )
    • Result

      content: ak id is <testAccessKey ID> and ak key is <testAccessKey Secret>
      akid_encrypt: ak id is rDhc**** and ak key is XQr1****
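
The pattern in this rule matches alphanumeric runs of 30 or 16 characters (four kept characters followed by 26 or 12 masked characters). The sketch below illustrates this with made-up placeholder values, not real credentials; fake_id, fake_secret, and mask_access_keys are hypothetical, and Python's re module is assumed to behave like regex_replace here.

```python
import re

# Pattern copied from the transformation rule above.
AK_PATTERN = r"([a-zA-Z0-9]{4})(([a-zA-Z0-9]{26})|([a-zA-Z0-9]{12}))"

def mask_access_keys(value: str) -> str:
    # Keep the first four characters of each match and mask the rest.
    return re.sub(AK_PATTERN, r"\1****", value)

# Made-up placeholders of the lengths the pattern targets (16 and 30 characters).
fake_id = "rDhc" + "A" * 12
fake_secret = "XQr1" + "B" * 26

text = f"ak id is {fake_id} and ak key is {fake_secret}"
print(mask_access_keys(text))
# ak id is rDhc**** and ak key is XQr1****
```

Because the pattern is purely length-based, any other alphanumeric token of at least 16 characters in the field is masked as well.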

Scenario 5: Mask IP addresses

  • Method

    To extract and mask the IP addresses in log entries, you can use the regex_replace function and the grok function.

  • Example

    • Raw log

      content: ip is 192.0.2.10
    • Transformation rule

      e_set(
          "ip_encrypt",
          regex_replace(v("content"), grok("(%{IP})"), replace=r"****"),
      )
    • Result

      content: ip is 192.0.2.10
      ip_encrypt: ip is ****
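
Python has no built-in grok function, so the following sketch substitutes a simplified IPv4-only regular expression for the %{IP} pattern. This is an assumption made for illustration: the actual grok pattern may differ and may also cover IPv6 addresses. IPV4_PATTERN and mask_ip are hypothetical names.

```python
import re

# Simplified IPv4 pattern: four dot-separated octets in the range 0-255.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_PATTERN = rf"\b(?:{OCTET}\.){{3}}{OCTET}\b"

def mask_ip(value: str) -> str:
    # Replace every IPv4 address in the text with asterisks.
    return re.sub(IPV4_PATTERN, "****", value)

print(mask_ip("ip is 192.0.2.10"))  # ip is ****
```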

Scenario 6: Mask ID card numbers

  • Method

    To mask the ID card numbers in log entries, you can use the regex_replace function and the grok function.

  • Example

    • Raw log

      content: Id card is 111222190002309999
    • Transformation rule

      e_set(
          "id_encrypt", regex_replace(v("content"), grok("(%{CHINAID})"), replace=r"\1****")
      )
    • Result

      content: Id card is 111222190002309999
      id_encrypt: Id card is 111222****

Scenario 7: Mask URLs

  • Method

    To mask the URLs in log entries, you can use the base64_encoding function to Base64-encode them. Base64 is reversible, so this method hides URLs from casual reading but does not prevent them from being decoded.

  • Example

    • Raw log

      url: https://www.aliyun.com/sls?logstore
    • Transformation rule

      e_set("base64_url", base64_encoding(v("url")))
    • Result

      url: https://www.aliyun.com/sls?logstore
      base64_url: aHR0cHM6Ly93d3cuYWxpeXVuLmNvbS9zbHM/bG9nc3RvcmU=
      Note

      To decode the value of the base64_url field, use the Domain-Specific Language (DSL) expression base64_decoding(v("base64_url")).
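
The encode and decode round trip can be reproduced locally with Python's standard base64 module. This is a minimal sketch that assumes the field value is treated as UTF-8 text; encode_url and decode_url are hypothetical helper names.

```python
import base64

def encode_url(url: str) -> str:
    # Base64-encode the UTF-8 bytes of the URL.
    return base64.b64encode(url.encode("utf-8")).decode("ascii")

def decode_url(encoded: str) -> str:
    # Reverse the encoding to recover the original URL.
    return base64.b64decode(encoded).decode("utf-8")

encoded = encode_url("https://www.aliyun.com/sls?logstore")
print(encoded)              # aHR0cHM6Ly93d3cuYWxpeXVuLmNvbS9zbHM/bG9nc3RvcmU=
print(decode_url(encoded))  # https://www.aliyun.com/sls?logstore
```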

Scenario 8: Mask order numbers

  • Method

    To mask the order numbers in log entries so that other users cannot decode them, you can use the md5_encoding function to compute an MD5 hash of each order number.

  • Example

    • Raw log

      orderId: 20210101123456
    • Transformation rule

      e_set("md5_orderId", md5_encoding(v("orderId")))
    • Result

      orderId: 20210101123456
      md5_orderId: 9c0ab8e4d9f4eb6fbd5c508bbca05951
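
Unlike Base64, an MD5 digest cannot be decoded, but the same input always yields the same digest, so masked order numbers can still be grouped or joined. The sketch below uses Python's standard hashlib module and assumes that the value is hashed as UTF-8 bytes; md5_hex is a hypothetical helper name.

```python
import hashlib

def md5_hex(value: str) -> str:
    # Hash the UTF-8 bytes and return the digest as 32 lowercase hex characters.
    return hashlib.md5(value.encode("utf-8")).hexdigest()

digest = md5_hex("20210101123456")
print(digest)
print(digest == md5_hex("20210101123456"))  # True: identical inputs give identical digests
```

Order numbers that follow a predictable format can be recovered by hashing candidate values and comparing digests, so this method suits identifiers that are hard to guess.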

Scenario 9: Mask strings

  • Method

    To prevent specific strings in a log from being exposed, you can use the str_translate function to define a mapping rule that masks specific characters or strings.

  • Example

    • Raw log

      data: message level is info
    • Transformation rule

      e_set("data_translate", str_translate(v("data"), "aeiou", "12345"))
    • Result

      data: message level is info
      data_translate: m2ss1g2 l2v2l 3s 3nf4
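
The mapping is positional: each character in the second argument is replaced by the character at the same position in the third argument. Python's built-in str.translate works the same way and can be used to check a mapping locally. This is a minimal sketch; translate_chars is a hypothetical helper name.

```python
def translate_chars(value: str, source: str, target: str) -> str:
    # Build a positional mapping (a->1, e->2, i->3, o->4, u->5 for the call below)
    # and apply it to every character of the value.
    return value.translate(str.maketrans(source, target))

print(translate_chars("message level is info", "aeiou", "12345"))
# m2ss1g2 l2v2l 3s 3nf4
```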