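Collect text logs with SPL

This topic describes how to use SPL to achieve the same results as the processing plugins used for log collection.

Background information

SPL compared with native plugins

Regex parsing comparison

Sample log: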

127.0.0.1 - - [07/Jul/2022:10:43:30 +0800] "POST /PutData?Category=YunOsAccountOpLog" 0.024 18204 200 37 "-" "aliyun-sdk-java"

Regex parsing plugin

The regular expression is: ([\d\.]+) \S+ \S+ \[(\S+) \S+\] \"(\w+) ([^\"]*)\" ([\d\.]+) (\d+) (\d+) (\d+|-) \"([^\"]*)\" \"([^\"]*)\", and the extracted fields are: ip, time, method, url, request_time, request_length, status, length, ref_url, browser. For details, see "Collect text logs in regex mode".

SPL

The SPL statement is: * | parse-regexp content, '([\d\.]+) \S+ \S+ \[(\S+) \S+\] \"(\w+) ([^\"]*)\" ([\d\.]+) (\d+) (\d+) (\d+|-) \"([^\"]*)\" \"([^\"]*)\"' as ip, time, method, url, request_time, request_length, status, length, ref_url, browser | project-away content. In this statement, parse-regexp extracts the fields and project-away discards the original content field.

Output preview

{
    "ip": "127.0.0.1",
    "time": "07/Jul/2022:10:43:30",
    "method": "POST",
    "url": "/PutData?Category=YunOsAccountOpLog",
    "request_time": "0.024",
    "request_length": "18204",
    "status": "200",
    "length": "37",
    "ref_url": "-",
    "browser": "aliyun-sdk-java",
    "__time__": "1713184059"
}
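To check the pattern locally before putting it into a collection configuration, the extraction above can be approximated in Python. This is only a sketch: parse_regexp and FIELDS are illustrative names, and Python's re module stands in for the SPL regex engine, which may differ in edge cases.

```python
import re

# Same pattern as in the SPL statement, written as a Python raw string.
PATTERN = re.compile(
    r'([\d\.]+) \S+ \S+ \[(\S+) \S+\] "(\w+) ([^"]*)" '
    r'([\d\.]+) (\d+) (\d+) (\d+|-) "([^"]*)" "([^"]*)"'
)

# Field names in the same order as the "as ..." list of parse-regexp.
FIELDS = ["ip", "time", "method", "url", "request_time",
          "request_length", "status", "length", "ref_url", "browser"]


def parse_regexp(content):
    """Return the capture groups as a dict, or None if the line does not match."""
    match = PATTERN.search(content)
    return dict(zip(FIELDS, match.groups())) if match else None


sample = ('127.0.0.1 - - [07/Jul/2022:10:43:30 +0800] '
          '"POST /PutData?Category=YunOsAccountOpLog" '
          '0.024 18204 200 37 "-" "aliyun-sdk-java"')
fields = parse_regexp(sample)
print(fields)
```

Each capture group maps to one field by position, which is why the number of groups has to equal the number of names after as.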

Delimiter parsing comparison

Sample log:

127.0.0.1,07/Jul/2022:10:43:30 +0800,POST,PutData?Category=YunOsAccountOpLog,0.024,18204,200,37,-,aliyun-sdk-java

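Delimiter parsing plugin

Select delimiter parsing as the processing plugin and use a comma (,) as the delimiter. For details, see "Collect host text logs".

SPL

The SPL statement is: * | parse-csv content as ip, time, method, url, request_time, request_length, status, length, ref_url, browser | project-away content. In this statement, parse-csv extracts the fields and project-away discards the original content field.

Output preview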

{
    "ip": "127.0.0.1",
    "time": "07/Jul/2022:10:43:30 +0800",
    "method": "POST",
    "url": "PutData?Category=YunOsAccountOpLog",
    "request_time": "0.024",
    "request_length": "18204",
    "status": "200",
    "length": "37",
    "ref_url": "-",
    "browser": "aliyun-sdk-java",
    "__time__": "1713231487"
}
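The comma splitting can be sketched in Python with the standard csv module. The helper name parse_csv is illustrative, and the url value in the sample line is written as it appears in the output preview. Quoting rules of the csv module are assumed to be close to, not identical with, those of parse-csv.

```python
import csv

# Field names in the same order as the "as ..." list of parse-csv.
FIELDS = ["ip", "time", "method", "url", "request_time",
          "request_length", "status", "length", "ref_url", "browser"]


def parse_csv(content):
    """Split one comma-separated line and pair each value with its field name."""
    row = next(csv.reader([content]))
    return dict(zip(FIELDS, row))


sample = ("127.0.0.1,07/Jul/2022:10:43:30 +0800,POST,"
          "PutData?Category=YunOsAccountOpLog,0.024,18204,200,37,-,aliyun-sdk-java")
fields = parse_csv(sample)
print(fields)
```

Unlike the regex example, the time field keeps its +0800 suffix here because the whole value between two commas is taken as is.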

JSON parsing comparison

Sample log:

{"url": "POST /PutData?Category=YunOsAccountOpLog HTTP/1.1","ip": "10.200.98.220", "user-agent": "aliyun-sdk-java","request": "{\"status\":\"200\",\"latency\":\"18204\"}","time": "07/Jul/2022:10:30:28"}

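JSON parsing plugin

For details, see "Collect JSON text logs".

SPL

The SPL statement is: * | parse-json content | project-away content. In this statement, parse-json extracts the fields and project-away discards the original content field.

Output preview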

{
    "url": "POST /PutData?Category=YunOsAccountOpLog HTTP/1.1",
    "ip": "10.200.98.220",
    "user-agent": "aliyun-sdk-java",
    "request": "{\"status\":\"200\",\"latency\":\"18204\"}",
    "time": "07/Jul/2022:10:30:28"
}
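The output preview shows that only the first level of the JSON object is expanded: request stays a string that itself contains JSON. A small Python sketch with the standard json module illustrates this and shows how a second pass would expand the nested value. The second pass is an illustration only and is not part of the SPL statement above.

```python
import json

# The sample log line; the raw string keeps the backslash escapes intact.
sample = (r'{"url": "POST /PutData?Category=YunOsAccountOpLog HTTP/1.1",'
          r'"ip": "10.200.98.220", "user-agent": "aliyun-sdk-java",'
          r'"request": "{\"status\":\"200\",\"latency\":\"18204\"}",'
          r'"time": "07/Jul/2022:10:30:28"}')

# First pass: top-level keys become fields.
fields = json.loads(sample)

# The nested object is still a plain string at this point.
print(type(fields["request"]).__name__, fields["request"])

# Second pass (illustrative): expand the nested JSON string.
request = json.loads(fields["request"])
print(request["status"], request["latency"])
```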

Regex parsing + time parsing comparison

Sample log:

127.0.0.1 - - [2024-11-05T15:47:05 +0800] "POST /PutData?Category=YunOsAccountOpLog" 0.024 18204 200 37 "-" "aliyun-sdk-java"

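Regex parsing + time parsing plugins

  • Regex parsing plugin: the regular expression is ([\d\.]+) \S+ \S+ \[(\S+) \S+\] \"(\w+) ([^\"]*)\" ([\d\.]+) (\d+) (\d+) (\d+|-) \"([^\"]*)\" \"([^\"]*)\", and the extracted fields are: ip, time, method, url, request_time, request_length, status, length, ref_url, browser. For details, see "Collect text logs in regex mode".

  • Time parsing plugin: the source field is time and the time format is %Y-%m-%dT%H:%M:%S.

SPL

The SPL statement is: * | parse-regexp content, '([\d\.]+) \S+ \S+ \[(\S+) \S+\] \"(\w+) ([^\"]*)\" ([\d\.]+) (\d+) (\d+) (\d+|-) \"([^\"]*)\" \"([^\"]*)\"' as ip, time, method, url, request_time, request_length, status, length, ref_url, browser | extend ts=date_parse(time, '%Y-%m-%dT%H:%i:%S') | extend __time__=cast(to_unixtime(ts) as INTEGER)-28800 | project-away ts | project-away content. In this statement, parse-regexp extracts the fields and date_parse parses the time in the log. Note that date_parse uses %i for minutes, whereas the plugin format uses %M. Subtracting 28800 seconds (8 hours) compensates for the +0800 offset of the log time. project-away discards the temporary ts field and the original content field.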
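The arithmetic behind the 28800-second correction can be reproduced in Python. The sketch assumes, as the SPL statement implies, that the parsed time is first interpreted as UTC and then shifted by the fixed +0800 offset. to_epoch_utc8 is an illustrative name.

```python
from datetime import datetime, timedelta, timezone

UTC8_OFFSET_SECONDS = 8 * 3600  # 28800, the constant used in the SPL statement


def to_epoch_utc8(time_field):
    """Parse a UTC+8 wall-clock string and return its Unix timestamp."""
    # Python's strptime writes minutes as %M; SPL date_parse writes them as %i.
    naive = datetime.strptime(time_field, "%Y-%m-%dT%H:%M:%S")
    # Interpret the value as UTC first (assumption), then remove the 8-hour offset.
    as_utc = int(naive.replace(tzinfo=timezone.utc).timestamp())
    return as_utc - UTC8_OFFSET_SECONDS


epoch = to_epoch_utc8("2024-11-05T15:47:05")

# Cross-check against a timezone-aware datetime built with an explicit +08:00 offset.
aware = datetime(2024, 11, 5, 15, 47, 5, tzinfo=timezone(timedelta(hours=8)))
print(epoch, int(aware.timestamp()))
```

A hard-coded offset only works when every log line is written in the same time zone. Logs from hosts in other zones would need their own offset.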

Regex parsing + filter processing comparison

Sample log:

127.0.0.1 - - [2024-11-05T15:47:05 +0800] "POST /PutData?Category=YunOsAccountOpLog" 0.024 18204 200 37 "-" "aliyun-sdk-java"

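Regex parsing + filter processing plugins

  • Regex parsing plugin: the regular expression is ([\d\.]+) \S+ \S+ \[(\S+) \S+\] \"(\w+) ([^\"]*)\" ([\d\.]+) (\d+) (\d+) (\d+|-) \"([^\"]*)\" \"([^\"]*)\", and the extracted fields are: ip, time, method, url, request_time, request_length, status, length, ref_url, browser. For details, see "Collect text logs in regex mode".

  • Filter processing plugin: add whitelist conditions on the status and method fields.

SPL

The SPL statement is: * | parse-regexp content, '([\d\.]+) \S+ \S+ \[(\S+) \S+\] \"(\w+) ([^\"]*)\" ([\d\.]+) (\d+) (\d+) (\d+|-) \"([^\"]*)\" \"([^\"]*)\"' as ip, time, method, url, request_time, request_length, status, length, ref_url, browser | project-away content | where regexp_like(method, '^(POST|PUT)$') and regexp_like(status, '^200$'). In this statement, parse-regexp extracts the fields, project-away discards the original content field, and the where clause uses regexp_like to keep only logs whose method is POST or PUT and whose status is 200.

Output preview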

{
    "ip": "127.0.0.1",
    "time": "2024-11-05T15:47:05",
    "method": "POST",
    "url": "/PutData?Category=YunOsAccountOpLog",
    "request_time": "0.024",
    "request_length": "18204",
    "status": "200",
    "length": "37",
    "ref_url": "-",
    "browser": "aliyun-sdk-java",
    "__time__": "1713238839"
}
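The whitelist condition can be sketched in Python as a predicate over already parsed fields. The GET and 404 entries below are hypothetical logs added only to show what gets dropped, and keep is an illustrative name.

```python
import re


def keep(log):
    """Mirror the where clause: method must be POST or PUT and status must be 200."""
    return (re.search(r'^(POST|PUT)$', log["method"]) is not None
            and re.search(r'^200$', log["status"]) is not None)


logs = [
    {"method": "POST", "status": "200"},  # from the sample log
    {"method": "GET", "status": "200"},   # hypothetical: wrong method
    {"method": "PUT", "status": "404"},   # hypothetical: wrong status
]
kept = [log for log in logs if keep(log)]
print(kept)
```

The ^ and $ anchors matter: without them a pattern such as 200 would also accept a value like 2000.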

Desensitization comparison

Sample log:

{"account":"1812213231432969","password":"04a23f38"}

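Desensitization plugin

Mask the value of the password field.

SPL

The SPL statement is: * | parse-regexp content, 'password":"(\S+)"' as password | extend content=replace(content, password, '******') | project-away password. In this statement, parse-regexp extracts the password value, replace substitutes ****** for that value in content, and project-away discards the temporary password field so that the plaintext value is not kept in the output.

Output preview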

{
    "content": "{\"account\":\"1812213231432969\",\"password\":\"******\"}"
}
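The same two-step idea, extract the secret and then replace it, looks like this in Python. mask_password is an illustrative name and the snippet is a sketch rather than the plugin's implementation.

```python
import re


def mask_password(content):
    """Find the password value and replace every occurrence of it with asterisks."""
    match = re.search(r'password":"(\S+)"', content)
    if match is None:
        return content  # nothing to mask
    return content.replace(match.group(1), "******")


sample = '{"account":"1812213231432969","password":"04a23f38"}'
masked = mask_password(sample)
print(masked)
```

Because the replacement works on the extracted value rather than on its position, any other occurrence of the same string in content is masked as well. That is usually desirable for secrets but worth knowing.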

SPL compared with extended plugins

Add fields comparison

Sample log:

this is a test log

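Add fields plugin

By default, the log is stored in the content field. The plugin adds the field service with the value A.

SPL

The SPL statement is: * | extend service='A'. In this statement, extend adds the field service with the value A.

Output preview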

{
    "content": "this is a test log",
    "service": "A"
}

JSON parsing + drop fields comparison

Sample log:

{"key1": 123456, "key2": "abcd"}

SPL

The SPL statement is: * | parse-json content | project-away content | project-away key1. In this statement, parse-json extracts the fields, and project-away discards the original content field and the key1 field.

Output preview

{
    "key2": "abcd"
}

JSON parsing + rename fields comparison

Sample log:

{"key1": 123456, "key2": "abcd"}

SPL

The SPL statement is: * | parse-json content | project-away content | project-rename new_key1=key1. In this statement, parse-json extracts the fields, project-away discards the original content field, and project-rename renames the field key1 to new_key1.

Output preview

{
    "new_key1": "123456",
    "key2": "abcd"
}

JSON parsing + filter fields comparison

Sample log:

{"ip": "10.**.**.**", "method": "POST", "browser": "aliyun-sdk-java"}
{"ip": "10.**.**.**", "method": "POST", "browser": "chrome"}
{"ip": "192.168.**.**", "method": "POST", "browser": "aliyun-sls-ilogtail"}

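SPL

The SPL statement is: * | parse-json content | project-away content | where regexp_like(ip, '10\..*') and regexp_like(method, 'POST') and not regexp_like(browser, 'aliyun.*'). In this statement, parse-json extracts the fields, project-away discards the original content field, and the where clause uses regexp_like to keep only logs whose ip matches 10\..*, whose method matches POST, and whose browser does not match aliyun.*.

Output preview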

{
    "ip": "10.**.**.**",
    "method": "POST",
    "browser": "chrome"
}

JSON parsing + field value mapping comparison

Sample log:

{"_ip_":"192.168.*.*","Index":"900000003"}
{"_ip_":"255.255.**.**","Index":"3"}

SPL

The SPL statement is: * | parse-json content | project-away content | extend _processed_ip_= CASE WHEN _ip_ = '127.0.*.*' THEN 'LocalHost-LocalHost' WHEN _ip_ = '192.168.*.*' THEN 'default login' ELSE 'Not Detected' END. In this statement, parse-json extracts the fields, project-away discards the original content field, and extend builds the new field _processed_ip_ from a CASE WHEN expression.

Output preview

{
    "_ip_": "192.168.*.*",
    "Index": "900000003",
    "_processed_ip_": "default login"
}
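The CASE WHEN expression compares _ip_ with literal strings, so the asterisks are masked characters in the sample data and not wildcards. A dictionary lookup with a default value is a close Python analogue. MAPPING and map_ip are illustrative names.

```python
# Literal-value mapping equivalent to the CASE WHEN branches.
MAPPING = {
    "127.0.*.*": "LocalHost-LocalHost",
    "192.168.*.*": "default login",
}


def map_ip(ip):
    """Return the mapped label, or 'Not Detected' when no branch matches (ELSE)."""
    return MAPPING.get(ip, "Not Detected")


logs = [
    {"_ip_": "192.168.*.*", "Index": "900000003"},
    {"_ip_": "255.255.**.**", "Index": "3"},
]
for log in logs:
    log["_processed_ip_"] = map_ip(log["_ip_"])
print(logs)
```

Under this reading, the second sample log falls through to the ELSE branch and gets the value Not Detected.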

String replacement comparison

Sample log:

hello,how old are you? nice to meet you

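String replacement plugin

Replace how old are you? with an empty string.

SPL

The SPL statement is: * | extend content=replace(content, 'how old are you?', ''). In this statement, extend overwrites content with the result of replace, which substitutes an empty string for how old are you?.

Output preview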

{
    "content": "hello, nice to meet you"
}

Data encoding comparison

Sample log:

this is a test log

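BASE64 encoding

BASE64 encoding plugin

For details, see "BASE64 encoding".

SPL

The SPL statement is: * | extend content1=to_base64(cast(content as varbinary)). In this statement, extend adds the field content1, and the to_base64 function encodes the data in BASE64.

Output preview

{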
    "content": "this is a test log",
    "content1": "dGhpcyBpcyBhIHRlc3QgbG9n"
}
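The encoded value in the preview can be reproduced with Python's standard base64 module. As in the SPL statement, the text is first converted to bytes (the cast to varbinary) and then encoded. UTF-8 is assumed as the byte encoding.

```python
import base64

content = "this is a test log"

# str -> bytes corresponds to cast(content as varbinary); UTF-8 is an assumption.
raw = content.encode("utf-8")
content1 = base64.b64encode(raw).decode("ascii")
print(content1)

# Decoding restores the original text.
print(base64.b64decode(content1).decode("utf-8"))
```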

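MD5 encoding

MD5 plugin

For details, see "MD5 encoding".

SPL

The SPL statement is: * | extend test=lower(to_hex(md5(cast(content as varbinary)))). In this statement, extend adds the field test, the md5 function computes the MD5 digest of the data, and to_hex and lower render the digest as a lowercase hexadecimal string.

Output preview

{
    "content": "this is a test log",
    "test": "4f3c93e010f366eca78e00dc1ed08984"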
}
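The chain cast, md5, to_hex, lower has a direct counterpart in Python's hashlib, whose hexdigest already returns lowercase hexadecimal. The sketch assumes UTF-8 as the byte encoding of content.

```python
import hashlib

content = "this is a test log"

# encode() plays the role of cast(content as varbinary); UTF-8 is an assumption.
digest = hashlib.md5(content.encode("utf-8")).hexdigest()

# An MD5 digest is 128 bits, so the hexadecimal form is always 32 characters.
print(len(digest), digest)
```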

New capabilities

Mathematical calculation

  • Sample log

    4
  • SPL statement

    The cast function converts the data type. For the power, round, and sqrt functions, see "Mathematical calculation functions".

    *
    | extend val = cast(content as double)
    | extend power_test = power(val, 2)
    | extend round_test = round(val)
    | extend sqrt_test = sqrt(val)
  • Output preview

    {
        "content": "4",
        "power_test": 16.0,
        "round_test": 4.0,
        "sqrt_test": 2.0,
        "val": 4.0
    }
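For comparison, the same three calculations in Python's math module give the values shown in the preview. This is a sketch only. Python's built-in round uses round-half-to-even for ties, which may differ from the SPL round function on values such as 2.5, although it makes no difference for the sample value 4.

```python
import math

content = "4"

# cast(content as double): the raw log text has to become a number first.
val = float(content)

result = {
    "power_test": math.pow(val, 2),
    "round_test": float(round(val)),
    "sqrt_test": math.sqrt(val),
    "val": val,
}
print(result)
```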

URL calculation

URL encoding and decoding

  • Sample log

    https://homenew.console.aliyun.com/home/dashboard/ProductAndService
  • SPL statement

    For the url_encode and url_decode functions, see "URL functions".

    *
    | extend encoded = url_encode(content)
    | extend decoded = url_decode(encoded)
  • Output preview

    {
        "content": "https://homenew.console.aliyun.com/home/dashboard/ProductAndService",
        "decoded": "https://homenew.console.aliyun.com/home/dashboard/ProductAndService",
        "encoded": "https%3A%2F%2Fhomenew.console.aliyun.com%2Fhome%2Fdashboard%2FProductAndService"
    }
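The round trip can be reproduced with urllib.parse from the Python standard library. Passing safe="" makes quote also escape the colon and the slashes, which matches the encoded value in the preview. Whether url_encode treats every other character the same way as quote is an assumption.

```python
from urllib.parse import quote, unquote

content = "https://homenew.console.aliyun.com/home/dashboard/ProductAndService"

# safe="" means no reserved character is left unescaped.
encoded = quote(content, safe="")
decoded = unquote(encoded)

print(encoded)
print(decoded)
```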

URL extraction

  • Sample log

    https://sls.console.aliyun.com:443/lognext/project/dashboard-all/logsearch/nginx-demo?accounttraceid=d6241a173f88471c91d3405cda010ff5ghdw
  • SPL statement

    For the functions used in the SPL statement, see "URL functions".

    *
    | extend host = url_extract_host(content)
    | extend query = url_extract_query(content)
    | extend path = url_extract_path(content) 
    | extend protocol = url_extract_protocol(content) 
    | extend port = url_extract_port(content) 
    | extend param = url_extract_parameter(content, 'accounttraceid')
  • Output preview

    {
        "content": "https://sls.console.aliyun.com:443/lognext/project/dashboard-all/logsearch/nginx-demo?accounttraceid=d6241a173f88471c91d3405cda010ff5ghdw",
        "host": "sls.console.aliyun.com",
        "param": "d6241a173f88471c91d3405cda010ff5ghdw",
        "path": "/lognext/project/dashboard-all/logsearch/nginx-demo",
        "port": "443",
        "protocol": "https",
        "query": "accounttraceid=d6241a173f88471c91d3405cda010ff5ghdw"
    }
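Each url_extract_* function corresponds to one component of the URL. As a rough Python equivalent, urlsplit breaks the URL into the same parts and parse_qs reads a single query parameter. The dictionary keys mirror the field names of the SPL statement.

```python
from urllib.parse import parse_qs, urlsplit

content = ("https://sls.console.aliyun.com:443/lognext/project/dashboard-all/"
           "logsearch/nginx-demo?accounttraceid=d6241a173f88471c91d3405cda010ff5ghdw")

parts = urlsplit(content)
extracted = {
    "host": parts.hostname,        # url_extract_host
    "query": parts.query,          # url_extract_query
    "path": parts.path,            # url_extract_path
    "protocol": parts.scheme,      # url_extract_protocol
    "port": str(parts.port),       # url_extract_port, shown as a string in the preview
    "param": parse_qs(parts.query)["accounttraceid"][0],  # url_extract_parameter
}
print(extracted)
```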

Comparison and logical operators

  • Sample log

    {"num1": 199, "num2": 10, "num3": 9}
  • SPL statement

    parse-json extracts the fields, and the cast function converts their data types.

    *
    | parse-json content
    | extend compare_result = cast(num1 as double) > cast(num2 as double) AND cast(num2 as double) > cast(num3 as double)
  • Output preview

    {
        "compare_result": "true",
        "content": "{\"num1\": 199, \"num2\": 10, \"num3\": 9}",
        "num1": "199",
        "num2": "10",
        "num3": "9"
    }
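The preview shows num1, num2 and num3 as strings after parsing, which is why the statement casts them before comparing. The Python sketch below contrasts the numeric comparison with a plain string comparison, which would give a different answer for these values. The string behaviour shown is Python's, used here only to motivate the cast.

```python
import json

fields = json.loads('{"num1": 199, "num2": 10, "num3": 9}')

# Numeric comparison, as in the SPL statement with cast(... as double).
num1, num2, num3 = (float(fields[key]) for key in ("num1", "num2", "num3"))
compare_result = num1 > num2 and num2 > num3

# Comparing the same values as text goes character by character: "10" < "9".
string_compare = "199" > "10" and "10" > "9"

print(compare_result, string_compare)
```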