Processing text data in specific formats

Converting a non-standard JSON object to a JSON object and expanding it

Collected dict-style data often needs its nested levels expanded. First convert the dict text into JSON, then expand it with the parse-json instruction.

  • Raw log

    content: {
      'referer': '-',
      'request': 'GET /phpMyAdmin',
      'status': 404,
      'data-1': {
        'aaa': 'Mozilla',
        'bbb': 'asde'
      },
      'data-2': {
        'up_adde': '-',
        'up_host': '-'
      }
    }
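  • SPL statement: replace the single quotes in content with double quotes so that the value becomes valid JSON. In the statement, '''' is the SQL literal for one single-quote character and chr(34) returns a double quote.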

    * | extend content_json = replace(content, '''', chr(34))
  • Result

    content: {
      'referer': '-',
      'request': 'GET /phpMyAdmin',
      'status': 404,
      'data-1': {
        'aaa': 'Mozilla',
        'bbb': 'asde'
      },
      'data-2': {
        'up_adde': '-',
        'up_host': '-'
      }
    }
    content_json:  {
      "referer": "-",
      "request": "GET /phpMyAdmin",
      "status": 404,
      "data-1": {
        "aaa": "Mozilla",
        "bbb": "asde"
      },
      "data-2": {
        "up_adde": "-",
        "up_host": "-"
      }
    }
  • Expand the first level of the normalized content_json field.

    * | parse-json content_json
  • Expanded log

    data-1:{"aaa":"Mozilla","bbb":"asde"}
    data-2:{"up_adde":"-","up_host":"-"}
    referer:-
    request:GET /phpMyAdmin
    status:404
  • To expand data-1 and data-2 as well

    * | parse-json content_json
      | parse-json "data-1"
      | parse-json "data-2"
  • Log after expansion

    aaa:Mozilla
    bbb:asde
    referer:-
    request:GET /phpMyAdmin
    status:404
    up_adde:-
    up_host:-
  • Putting the steps together, the complete SPL statement can be written as:

    * | extend content_json = replace(content, '''', chr(34)) 
    | parse-json content_json
    | parse-json "data-1"
    | parse-json "data-2"
  • Result

    content:{'referer': '-', 'request': 'GET /phpMyAdmin', 'status': 404, 'data-1': {'aaa': 'Mozilla', 'bbb': 'asde'}, 'data-2': {'up_adde': '-', 'up_host': '-'}}
    content_json:{"referer": "-", "request": "GET /phpMyAdmin", "status": 404, "data-1": {"aaa": "Mozilla", "bbb": "asde"}, "data-2": {"up_adde": "-", "up_host": "-"}}
    data-1:{"aaa":"Mozilla","bbb":"asde"}
    data-2:{"up_adde":"-","up_host":"-"}
    aaa:Mozilla
    bbb:asde
    referer:-
    request:GET /phpMyAdmin
    status:404
    up_adde:-
    up_host:-
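
The quote replacement and the two levels of expansion can be mimicked outside SLS for a quick sanity check. The following is only a Python sketch of the same idea, not SPL itself; the helper name expand_dict_text and its nested_keys parameter are hypothetical. It also makes one limitation visible: a blanket quote replacement assumes that no key or value contains an apostrophe or a double quote.

```python
import json


def expand_dict_text(content, nested_keys=()):
    """Hypothetical helper: mimic replace() followed by repeated parse-json."""
    # Same idea as replace(content, '''', chr(34)): every ' becomes "
    content_json = content.replace("'", '"')
    fields = {"content": content, "content_json": content_json}

    def expand(json_text):
        # Promote each first-level key to a field; nested objects stay
        # as compact JSON text so they can be expanded in a later pass.
        for key, value in json.loads(json_text).items():
            if isinstance(value, (dict, list)):
                fields[key] = json.dumps(value, separators=(",", ":"))
            else:
                fields[key] = str(value)

    expand(content_json)
    for key in nested_keys:
        expand(fields[key])
    return fields


sample = (
    "{'referer': '-', 'request': 'GET /phpMyAdmin', 'status': 404, "
    "'data-1': {'aaa': 'Mozilla', 'bbb': 'asde'}, "
    "'data-2': {'up_adde': '-', 'up_host': '-'}}"
)
fields = expand_dict_text(sample, nested_keys=("data-1", "data-2"))
for name, value in fields.items():
    print(f"{name}:{value}")
```

If a value such as a user agent may contain an apostrophe, the replaced text is no longer valid JSON, so it is worth checking the data before relying on this approach.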

Converting text in other formats to JSON and expanding it

Data that is close to JSON but not valid JSON can be expanded by combining several processing steps.

  • Raw log

    content : {
      "pod" => {
        "name" => "crm-learning-follow-7bc48f8b6b-m6kgb"
      }, "node" => {
        "name" => "tw5"
      }, "labels" => {
        "pod-template-hash" => "7bc48f8b6b", "app" => "crm-learning-follow"
      }, "container" => {
        "name" => "crm-learning-follow"
      }, "namespace" => "testing1"
    }
  • SPL statement: first convert the log to JSON form with the replace function, then expand it with the parse-json instruction.

    * | extend content_json = replace(content, '=>', ':')
    | parse-json content_json
  • Result

    content:{
      "pod" => {
        "name" => "crm-learning-follow-7bc48f8b6b-m6kgb"
      }, "node" => {
        "name" => "tw5"
      }, "labels" => {
        "pod-template-hash" => "7bc48f8b6b", "app" => "crm-learning-follow"
      }, "container" => {
        "name" => "crm-learning-follow"
      }, "namespace" => "testing1"
    }
    content_json:{
      "pod" : {
        "name" : "crm-learning-follow-7bc48f8b6b-m6kgb"
      }, "node" : {
        "name" : "tw5"
      }, "labels" : {
        "pod-template-hash" : "7bc48f8b6b", "app" : "crm-learning-follow"
      }, "container" : {
        "name" : "crm-learning-follow"
      }, "namespace" : "testing1"
    }
    container:{"name":"crm-learning-follow"}
    labels:{"pod-template-hash":"7bc48f8b6b","app":"crm-learning-follow"}
    namespace:testing1
    node:{"name":"tw5"}
    pod:{"name":"crm-learning-follow-7bc48f8b6b-m6kgb"}
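
The same conversion can be tried locally before the rule is deployed. This is a minimal Python sketch rather than SPL; the function name arrows_to_json is hypothetical, and the sample is the raw log above collapsed onto one line. It assumes that "=>" never appears inside a key or a value, because a text-level replacement would rewrite those occurrences too.

```python
import json

# The raw log from above, written as a single-line string.
raw = (
    '{"pod" => {"name" => "crm-learning-follow-7bc48f8b6b-m6kgb"}, '
    '"node" => {"name" => "tw5"}, '
    '"labels" => {"pod-template-hash" => "7bc48f8b6b", '
    '"app" => "crm-learning-follow"}, '
    '"container" => {"name" => "crm-learning-follow"}, '
    '"namespace" => "testing1"}'
)


def arrows_to_json(text):
    """Hypothetical helper: same idea as replace(content, '=>', ':')."""
    return text.replace("=>", ":")


content_json = arrows_to_json(raw)
parsed = json.loads(content_json)  # raises ValueError if the text is still not JSON

# First-level expansion: nested objects are kept as compact JSON text.
for key, value in parsed.items():
    shown = json.dumps(value, separators=(",", ":")) if isinstance(value, dict) else value
    print(f"{key}:{shown}")
```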

Converting specially encoded text

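In day-to-day work you may come across text that contains escaped hexadecimal bytes (for example \xe4), which must be decoded before it can be read. The statement below decodes them by applying ascii_unescape to the content field.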

  • Raw log

    content : "\xe4\xbd\xa0\xe5\xa5\xbd"
  • SPL statement

    * | extend decoded_content = ascii_unescape(content)
  • Result

    content : "\xe4\xbd\xa0\xe5\xa5\xbd"
    decoded_content : "你好"
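
To see why the escaped text above reads as "你好", the decoding can be reproduced in a few lines. This is a Python sketch of the general technique, not the SLS implementation; the function name decode_hex_escapes is hypothetical, and it assumes the escaped bytes form valid UTF-8.

```python
import re


def decode_hex_escapes(text):
    """Hypothetical helper: turn literal \\xNN sequences into the bytes they name."""
    # Work on bytes so that each \xNN escape can be replaced by one raw byte.
    raw = re.sub(
        rb"\\x([0-9a-fA-F]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        text.encode("utf-8"),
    )
    # The byte sequence is then interpreted as UTF-8 (assumption stated above).
    return raw.decode("utf-8")


# A raw string keeps the backslashes literal, as they appear in the log field.
content = r"\xe4\xbd\xa0\xe5\xa5\xbd"
decoded_content = decode_hex_escapes(content)
print(decoded_content)
```

Each Chinese character here occupies three UTF-8 bytes, so the six escapes decode to two characters.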