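Nginx access logs record detailed information about each user request, and parsing them is valuable for business operations and maintenance. This topic describes how to parse Nginx access logs by using regular expressions.
Parse standard Nginx logs
Simple Log Service (SLS) supports parsing Nginx logs with regular expressions in SPL. The following example uses one log entry for a successful Nginx request to show how this works.
Raw log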
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.254.254
__tag__:__receive_time__: 1563443076
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
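Parsing requirements
Requirement 1: Extract the code, ip, datetime, protocol, request, sendbytes, referer, useragent, and verb fields from the Nginx log.
Requirement 2: Extract the uri_proto, uri_domain, and uri_param fields from the request field.
Requirement 3: Extract the uri_path and uri_query fields from the parsed uri_param field.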
SLS SPL statements
Complete statement
* | parse-regexp content, '(\d+\.\d+\.\d+\.\d+) - - \[([\s\S]+)\] \"([A-Z]+) ([\S]*) ([\S]+)["] (\d+) (\d+) ["]([\S]*)["] ["]([\S\s]+)["]' as ip, datetime, verb, request, protocol, code, sendbytes, referer, useragent
| parse-regexp request, '^(\w+):\/\/([^\/]+)(\/.*)$' as uri_proto, uri_domain, uri_param
| parse-regexp uri_param, '([^?]*)\?(.*)' as uri_path, uri_query
Step-by-step statements and results
The following SPL statement parses the Nginx log for requirement 1.
* | parse-regexp content, '(\d+\.\d+\.\d+\.\d+) - - \[([\s\S]+)\] \"([A-Z]+) ([\S]*) ([\S]+)["] (\d+) (\d+) ["]([\S]*)["] ["]([\S\s]+)["]' as ip, datetime, verb, request, protocol, code, sendbytes, referer, useragent
Result:
__source__: 192.168.0.1
__tag__:__receive_time__: 1563443076
code: 200
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
datetime: 04/Jan/2019:16:06:38 +0800
ip: 192.168.0.2
protocol: HTTP/1.1
referer: -
request: http://example.aliyundoc.com/_astats?application=&inf.name=eth0
sendbytes: 273932
useragent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)
verb: GET
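Before a pattern goes into an SPL statement, it can help to try it against the sample line locally. The following is a minimal Python sketch and not part of SLS: it applies the same expression with Python's re module, which is a different engine from the one SPL uses, so a local match is only a sanity check. The names SAMPLE, ACCESS_LOG_PATTERN, FIELDS, and parsed are illustrative.

```python
import re

# The content value from the raw log above.
SAMPLE = (
    '192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] '
    '"GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" '
    '200 273932 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"'
)

# The same expression as in the parse-regexp statement, split over two
# string literals only to keep the line short.
ACCESS_LOG_PATTERN = re.compile(
    r'(\d+\.\d+\.\d+\.\d+) - - \[([\s\S]+)\] \"([A-Z]+) ([\S]*) ([\S]+)["] '
    r'(\d+) (\d+) ["]([\S]*)["] ["]([\S\s]+)["]'
)

# Field names in capture-group order, mirroring the "as" clause.
FIELDS = (
    "ip", "datetime", "verb", "request", "protocol",
    "code", "sendbytes", "referer", "useragent",
)

match = ACCESS_LOG_PATTERN.search(SAMPLE)
parsed = dict(zip(FIELDS, match.groups())) if match else {}

for name, value in parsed.items():
    print(f"{name}: {value}")
```

If parsed comes back empty, the pattern and the sample disagree somewhere, and the expression is worth revisiting before it is used in a statement.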
The following SPL statement parses the request field for requirement 2.
* | parse-regexp request, '^(\w+):\/\/([^\/]+)(\/.*)$' as uri_proto, uri_domain, uri_param
Result:
uri_param: /_astats?application=&inf.name=eth0
uri_domain: example.aliyundoc.com
uri_proto: http
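The request pattern expects an absolute URL that starts with a scheme. As a hedged local check in Python (the re module is not the engine SPL uses, and REQUEST_PATTERN and split_request are hypothetical names), the sketch below applies the expression to the sample request and to a relative path, /index.html, which is an assumed input that does not appear in the sample log. The relative path does not match. How SPL reports a row that does not match is outside the scope of this sketch.

```python
import re

# The same expression as in the parse-regexp statement for the request field.
REQUEST_PATTERN = re.compile(r'^(\w+):\/\/([^\/]+)(\/.*)$')


def split_request(request):
    """Return uri_proto, uri_domain, and uri_param as a dict, or None if the pattern does not match."""
    match = REQUEST_PATTERN.search(request)
    if match is None:
        return None
    return dict(zip(("uri_proto", "uri_domain", "uri_param"), match.groups()))


# Absolute URL taken from the sample log.
absolute = split_request("http://example.aliyundoc.com/_astats?application=&inf.name=eth0")
# Relative path (assumed input): there is no scheme, so the pattern cannot match.
relative = split_request("/index.html")

print(absolute)
print(relative)
```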
The following SPL statement parses the uri_param field for requirement 3.
* | parse-regexp uri_param, '([^?]*)\?(.*)' as uri_path, uri_query
Result:
uri_path: /_astats
uri_query: application=&inf.name=eth0
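The uri_param pattern requires a literal question mark, so a path without a query string does not match it. The Python sketch below is a local illustration under that assumption (Python's re module is not the SPL engine; URI_PARAM_PATTERN and split_uri_param are hypothetical names). It also passes uri_query to urllib.parse.parse_qs from the Python standard library to show how the query string breaks into key-value pairs. That further split is illustrative only and is not something the SPL statement does.

```python
import re
from urllib.parse import parse_qs

# The same expression as in the parse-regexp statement for uri_param.
URI_PARAM_PATTERN = re.compile(r'([^?]*)\?(.*)')


def split_uri_param(uri_param):
    """Return uri_path and uri_query as a dict, or None if there is no '?' to split on."""
    match = URI_PARAM_PATTERN.search(uri_param)
    if match is None:
        return None
    uri_path, uri_query = match.groups()
    return {"uri_path": uri_path, "uri_query": uri_query}


# Value produced by the previous step for the sample log.
with_query = split_uri_param("/_astats?application=&inf.name=eth0")
# Same path with the query string removed (assumed input).
without_query = split_uri_param("/_astats")

# keep_blank_values=True keeps "application", whose value is empty.
query_pairs = parse_qs(with_query["uri_query"], keep_blank_values=True)

print(with_query)
print(without_query)
print(query_pairs)
```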
Final SPL result
__source__: 192.168.0.1
__tag__:__receive_time__: 1563443076
code: 200
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
datetime: 04/Jan/2019:16:06:38 +0800
ip: 192.168.0.2
protocol: HTTP/1.1
referer: -
request: http://example.aliyundoc.com/_astats?application=&inf.name=eth0
sendbytes: 273932
uri_domain: example.aliyundoc.com
uri_proto: http
uri_param: /_astats?application=&inf.name=eth0
uri_path: /_astats
uri_query: application=&inf.name=eth0
useragent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)
verb: GET
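Parse non-standard Nginx logs
Scenario 1: Extract keywords from the middle of a log
Use parse-regexp with a regular expression to extract the Time, Level, Server, and Info values from the message field.
Example
Raw log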
{"message": "[2024-10-11 10:30:34.917962]\t[info]\t[SingleWorldService]\t[ResourceManager:testOut for 2, srvClusterId=1009]\t[[] ...ewEntities/ResourceServiceComponent/ResourceManager.out:190]"}
SPL statement
*| parse-regexp message, '\[([^[\]]+)\]\s+\[([^[\]]+)\]\s+\[([^[\]]+)\]\s+\[([^[\]]+)\]' as Time,Level,Server,Info
Result
Time: 2024-10-11 10:30:34.917962
Level: info
Server: SingleWorldService
Info: ResourceManager:testOut for 2, srvClusterId=1009
message: [2024-10-11 10:30:34.917962] [info] [SingleWorldService] [ResourceManager:testOut for 2, srvClusterId=1009] [[] ...ewEntities/ResourceServiceComponent/ResourceManager.out:190]
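The bracket pattern can be checked locally in the same way. The following Python sketch is an illustration and not the SLS implementation: it decodes the raw JSON so that each \t becomes a real tab, then applies the expression with Python's re module, which is not the engine SPL uses. The brackets inside the character class are written as \[ and \], which is equivalent to the class used in the SPL statement. RAW, BRACKET_PATTERN, and fields are hypothetical names.

```python
import json
import re

# Raw log from the example. The r prefix keeps \t as a JSON escape,
# which json.loads then turns into a tab character.
RAW = (
    r'{"message": "[2024-10-11 10:30:34.917962]\t[info]\t[SingleWorldService]'
    r'\t[ResourceManager:testOut for 2, srvClusterId=1009]'
    r'\t[[] ...ewEntities/ResourceServiceComponent/ResourceManager.out:190]"}'
)
message = json.loads(RAW)["message"]

# Four bracketed segments separated by whitespace; \s+ also matches tabs.
BRACKET_PATTERN = re.compile(
    r'\[([^\[\]]+)\]\s+\[([^\[\]]+)\]\s+\[([^\[\]]+)\]\s+\[([^\[\]]+)\]'
)

match = BRACKET_PATTERN.search(message)
fields = dict(zip(("Time", "Level", "Server", "Info"), match.groups())) if match else {}

for name, value in fields.items():
    print(f"{name}: {value}")
```

The fifth bracketed segment of the message is not captured, because the pattern describes only the first four.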
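Scenario 2: Extract specific values from a log by using a regular expression
Use parse-regexp with a regular expression to extract the RequestTime, traceId, ThreadName, LogLevel, ClassName, LineNum, and LogInfo field values from the content field.
Example
Raw log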
{"content":"2023-11-11 14:47:17.844 [12] [backup-test-thread] INFO com.shidsds.dus.service.BackTestService 109 | 备份缓存 1021 秒前已刷新,本次跳过:backupCache:com.shidsds.dus.service.DuuewwService:lastRefreshTime"}
SPL statement
*| parse-regexp content, '([\d\-]{10}\s+[\d:\.]{12})\s+\[([^[\]]+)\]\s+\[([^[\]]+)\]\s+([\S]+)\s+([\S]+)\s+([\d]+)\s+\|\s+(.*)' as RequestTime,traceId,ThreadName,LogLevel,ClassName,LineNum,LogInfo
Result
ClassName: com.shidsds.dus.service.BackTestService
LineNum: 109
LogInfo: 备份缓存 1021 秒前已刷新,本次跳过:backupCache:com.shidsds.dus.service.DuuewwService:lastRefreshTime
LogLevel: INFO
RequestTime: 2023-11-11 14:47:17.844
ThreadName: backup-test-thread
content: 2023-11-11 14:47:17.844 [12] [backup-test-thread] INFO com.shidsds.dus.service.BackTestService 109 | 备份缓存 1021 秒前已刷新,本次跳过:backupCache:com.shidsds.dus.service.DuuewwService:lastRefreshTime
traceId: 12
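For a pattern with this many groups, named groups make a local check easier to read. The Python sketch below is illustrative only: the named groups exist just in this sketch (the SPL statement names its fields through the as clause), Python's re module is not the engine SPL uses, and the conversion of RequestTime and LineNum to typed values is an extra step that the SPL statement does not perform. CONTENT, APP_LOG_PATTERN, fields, request_time, and line_num are hypothetical names.

```python
import re
from datetime import datetime

# The content value from the raw log above.
CONTENT = (
    "2023-11-11 14:47:17.844 [12] [backup-test-thread] INFO "
    "com.shidsds.dus.service.BackTestService 109 | "
    "备份缓存 1021 秒前已刷新,本次跳过:backupCache:"
    "com.shidsds.dus.service.DuuewwService:lastRefreshTime"
)

# Same expression as in the SPL statement, with each group given a name.
APP_LOG_PATTERN = re.compile(
    r'(?P<RequestTime>[\d\-]{10}\s+[\d:\.]{12})\s+'
    r'\[(?P<traceId>[^\[\]]+)\]\s+\[(?P<ThreadName>[^\[\]]+)\]\s+'
    r'(?P<LogLevel>[\S]+)\s+(?P<ClassName>[\S]+)\s+(?P<LineNum>[\d]+)'
    r'\s+\|\s+(?P<LogInfo>.*)'
)

match = APP_LOG_PATTERN.search(CONTENT)
fields = match.groupdict() if match else {}

# Optional typed views of two fields (not produced by the SPL statement).
request_time = datetime.strptime(fields["RequestTime"], "%Y-%m-%d %H:%M:%S.%f")
line_num = int(fields["LineNum"])

for name, value in fields.items():
    print(f"{name}: {value}")
```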