本文为您提供日志数据分析的一些案例。
5分钟错误率超过40%时触发报警
统计每分钟的500错误率,当最近5分钟错误率超过40%时触发报警。
status:500 | select __topic__, max_by(error_count,window_time)/1.0/sum(error_count) as error_ratio, sum(error_count) as total_error from (
select __topic__, count(*) as error_count , __time__ - __time__ % 300 as window_time from log group by __topic__, window_time
)
group by __topic__ having max_by(error_count,window_time)/1.0/sum(error_count) > 0.4 and sum(error_count) > 500 order by total_error desc limit 100
统计流量并设置告警
统计每分钟的流量,当最近的流量出现暴跌时,触发报警。 由于在最近的一分钟内,统计的数据不是一个完整分钟的,所以需要除以greatest(max(__time__) - min(__time__),1)
进行归一化,统计每个分钟内的流量均值。
* | SELECT SUM(inflow) / greatest(max(__time__) - min(__time__),1) as inflow_per_minute, date_trunc('minute',__time__) as minute group by minute
计算不同数据区间的平均延时
按照数据区间分桶,在每个桶内计算平均延时。
* | select avg(latency) as latency , case when originSize < 5000 then 's1' when originSize < 20000 then 's2' when originSize < 500000 then 's3' when originSize < 100000000 then 's4' else 's5' end as os group by os
返回不同结果的百分比
返回不同部门的count结果,及其所占百分比。该query结合了子查询、窗口函数。其中sum(c) over()
表示计算所有行的和。
* | select department, c*1.0/ sum(c) over () from(select count(1) as c, department from log group by department)
统计满足条件的个数
在URL路径中,我们需要根据URL不同的特征来计数,这种情况可以使用CASE WHEN语法,但还有个更简单的语法是count_if。
* | select count_if(uri like '%login') as login_num, count_if(uri like '%register') as register_num, date_format(date_trunc('minute', __time__), '%m-%d %H:%i') as time group by time order by time limit 100
通用聚合分析
客户端PV全球分布
通过ip_to_country函数获取IP所在国家,然后根据ip_country聚合,计算每个国家的客户端数量, 您可以通过配置世界地图可视化展示数据。
* |
select
ip_to_country(client_ip) as ip_country,
count(*) as pv
group by
ip_country
order by
pv desc
limit
500
请求方法分类PV趋势
时间按照分钟对齐,然后与method一起分组聚合计算pv,按照时间进行排序,您可以使用线图,配置x轴为t,y轴为pv,聚合列为request_method,可视化展示数据。
* |
select
date_format(date_trunc('minute', __time__), '%m-%d %H:%i') as t,
request_method,
count(*) as pv
group by
t,
request_method
order by
t asc
limit
10000
user_agent请求PV分布
通过http_user_agent分组聚合,然后查询出各个代理的请求以及返回客户端流量的总和。由于单位是byte,运算转为MB并保留两位小数,再使用case when
为status分层,分为2xx、3xx、4xx、5xx以及各层所占的比例。您可以使用基础表格可视化展示数据。
* |
select
http_user_agent as "用户代理",
count(*) as pv,
round(sum(request_length) / 1024.0 / 1024, 2) as "请求报文流量(MB)",
round(sum(body_bytes_sent) / 1024.0 / 1024, 2) as "返回客户端流量(MB)",
round(
sum(
case
when status >= 200
and status < 300 then 1
else 0
end
) * 100.0 / count(1),
6
) as "2xx比例(%)",
round(
sum(
case
when status >= 300
and status < 400 then 1
else 0
end
) * 100.0 / count(1),
6
) as "3xx比例(%)",
round(
sum(
case
when status >= 400
and status < 500 then 1
else 0
end
) * 100.0 / count(1),
6
) as "4xx比例(%)",
round(
sum(
case
when status >= 500
and status < 600 then 1
else 0
end
) * 100.0 / count(1),
6
) as "5xx比例(%)"
group by
"用户代理"
order by
pv desc
limit
100
本月每日消费及趋势预测
首先根据账单号对数据做分组,获取到对应的时间、产品标识、账单类型及应付金额。接着按天做聚合计算每天的费用。然后按照时间排序找到产生费用的最新一天。最后将时间、费用、传递的字符串(可以作为之后的别名)、预测周期、预测点的个数作为sls_inner_ts_regression的参数,获取当前每天的消费情况及之后的预测结果。 您可以使用线图,x轴配置时间、y轴配置实际消费和预测消费两个字段,可视化展示本月的消费情况以及剩余时间的消费预测。
source :bill |
select
date_format(res.stamp, '%Y-%m-%d') as time,
res.real as "实际消费",case
when is_nan(res.real) then res.pred
else null
end as "预测消费",
res.instances
from(
select
sls_inner_ts_regression(
cast(day as bigint),
total,
array ['total'],
86400,
60
) as res
from
(
select
*
from
(
select
*,
max(day) over() as lastday
from
(
select
to_unixtime(date_trunc('day', __time__)) as day,
sum(PretaxAmount) as total
from
(
select
RecordID,
arbitrary(__time__) as __time__,
arbitrary(ProductCode) as ProductCode,
arbitrary(item) as item,
arbitrary(PretaxAmount) as PretaxAmount
from
log
group by
RecordID
)
group by
day
order by
day
)
)
where
day < lastday
)
)
limit
1000
本月消费情况各产品的占比
查询语句中首先根据产品名称聚合计算各个产品的总支出,使用row_number函数将产品按照支出金额倒序排列。找出支出最多的六个产品,其他的归为其他。您可以通过配置环形图可视化展示数据。
source :bill |
select
case
when rnk > 6
or pretaxamount <= 0 then '其他'
else ProductName
end as ProductName,
sum(PretaxAmount) as PretaxAmount
from(
select
*,
row_number() over(
order by
pretaxamount desc
) as rnk
from(
select
ProductName,
sum(PretaxAmount) as PretaxAmount
from
log
group by
ProductName
)
)
group by
ProductName
order by
PretaxAmount desc
limit
1000
昨天的消费及与上月的同比
先计算昨天的总费用,使用coalesce表达式当未产生费用时返回0,round函数取三位小数。然后通过compare函数计算同比。 您可以使用统计图中的趋势图,展示昨天的消费总额及与上个月的同比情况。
source :bill |
select
round(diff [1], 2),
round(diff [2], 2),
round(diff [3] * 100 -100, 2)
from(
select
compare("昨日消费", 604800) as diff
from(
select
round(coalesce(sum(PretaxAmount), 0), 3) as "昨日消费"
from
log
)
)
Tomcat Web服务分析
tomcat请求状态趋势分析
通过date_trunc函数对日志时间按照分钟对齐,使用date_format函数提取出小时、分钟,将提取后的时间与访问状态码status通过group by聚合,获取每分钟每个状态码的count值。您可以使用流图展示数据,配置x轴为time,y轴为count,聚合列为status。
* |
select
date_format(date_trunc('minute', __time__), '%H:%i') as time,
COUNT(1) as c,
status
GROUP by
time,
status
ORDER by
time
LIMIT
1000
tomcat访问的pv、uv时间分布
使用time_series函数根据日志时间做两分钟对齐,然后对时间聚合计算出访问数量,使用approx_distinct函数计算remote_addr去重之后的数量,然后根据时间排序。您可以通过添加多Y轴线图,配置x轴为time、y轴两条线分别展示uv和pv值,可视化展示数据。
* |
select
time_series(__time__, '2m', '%H:%i', '0') as time,
COUNT(1) as pv,
approx_distinct(remote_addr) as uv
GROUP by
time
ORDER by
time
LIMIT
1000
tomcat错误请求数量以及与上一小时错误请求比较
内层SQL获取状态大于等于400的请求数,中间层SQL使用compare函数获取3600秒前的数据,外层中c1是当前时间的错误数量,c2是3600秒前的错误数量,c3是c1/c2的比值,用于展示趋势,您可以配置环形图,设置c1为显示值,c3为对比值,可视化展示数据。
status >= 400 |
SELECT
diff [1] AS c1,
diff [2] AS c2,
round(diff [1] * 100.0 / diff [2] - 100.0, 2) AS c3
FROM
(
select
compare(c, 3600) AS diff
from
(
select
count(1) as c
from
log
)
)
tomcat中请求数前十的URL展示
通过request_uri分组聚合,计算每个request_uri对应的访问量,然后按照访问量倒序排列,取前十条数据,您可以配置基础计量图,设置x轴为page,y轴为pv,可视化展示数据。
* |
SELECT
request_uri as page,
COUNT(*) as pv
GROUP by
page
ORDER by
pv DESC
LIMIT
10
tomcat的客户端分类及数量分布
通过user_agent字段聚合分析,计算出每种客户端的数量。您可以配置环形图,设置分类为user_agent,数值列为c,可视化展示数据。
* |
SELECT
user_agent,
COUNT(*) AS c
GROUP BY
user_agent
ORDER BY
c DESC
tomcat流出流量统计
使用time_series函数将数据时间对齐,然后聚合时间计算body_bytes_sent的和。您可以配置线图x轴和y轴,设置x轴为time,y轴为body_sent,可视化展示数据。
* |
select
time_series(__time__, '10s', '%H:%i:%S', '0') as time,
sum(body_bytes_sent) as body_sent
GROUP by
time
ORDER by
time
LIMIT
1000
tomcat错误请求占比
在SQL内部获取请求status超过400的错误请求数量以及总的请求数量。在外部计算比值。您可以配置刻度盘,可视化展示数据。
* |
select
round((errorCount * 100.0 / totalCount), 2) as errorRatio
from
(
select
sum(
case
when status >= 400 then 1
else 0
end
) as errorCount,
count(1) as totalCount
from
log
)