常见分析案例

本文为您提供日志数据分析的一些案例。

5分钟错误率超过40%时触发报警

统计每分钟的500错误率,当最近5分钟错误率超过40%时触发报警。

status:500 | select __topic__, max_by(error_count,window_time)/1.0/sum(error_count) as error_ratio, sum(error_count) as total_error  from (
select __topic__, count(*) as error_count , __time__ - __time__ % 300  as window_time  from log group by __topic__, window_time
) 
group by __topic__  having  max_by(error_count,window_time)/1.0/sum(error_count)   > 0.4  and sum(error_count) > 500 order by total_error desc limit 100

统计流量并设置告警

统计每分钟的流量,当最近的流量出现暴跌时,触发报警。 由于在最近的一分钟内,统计的数据不是一个完整分钟的,所以需要除以greatest(max(__time__) - min(__time__),1)进行归一化,统计每个分钟内的流量均值。

* | SELECT SUM(inflow) / greatest(max(__time__) - min(__time__),1) as inflow_per_minute, date_trunc('minute',__time__)  as minute group by minute

计算不同数据区间的平均延时

按照数据区间分桶,在每个桶内计算平均延时。

* | select avg(latency) as latency , case when originSize < 5000 then 's1' when originSize < 20000 then 's2' when originSize < 500000 then 's3' when originSize < 100000000 then 's4' else 's5' end as os group by os

返回不同结果的百分比

返回不同部门的count结果,及其所占百分比。该query结合了子查询、窗口函数。其中sum(c) over()表示计算所有行的和。

* |  select   department, c*1.0/ sum(c) over ()  from(select  count(1) as c, department   from log group by department)

统计满足条件的个数

URL路径中,我们需要根据URL不同的特征来计数,这种情况可以使用CASE WHEN语法,但还有个更简单的语法是count_if。

* | select  count_if(uri like '%login') as login_num, count_if(uri like '%register') as register_num, date_format(date_trunc('minute', __time__), '%m-%d %H:%i') as time  group by time order by time limit 100

通用聚合分析

客户端PV全球分布

通过ip_to_country函数获取IP所在国家,然后根据ip_country聚合,计算每个国家的客户端数量, 您可以通过配置世界地图可视化展示数据。

* |
select
  ip_to_country(client_ip) as ip_country,
  count(*) as pv
group by
  ip_country
order by
  pv desc
limit
  500

请求方法分类PV趋势

时间按照分钟对齐,然后与method一起分组聚合计算pv,按照时间进行排序,您可以使用线图,配置x轴为t,y轴为pv,聚合列为request_method,可视化展示数据。

* |
select
  date_format(date_trunc('minute', __time__), '%m-%d %H:%i') as t,
  request_method,
  count(*) as pv
group by
  t,
  request_method
order by
  t asc
limit
  10000

user_agent请求PV分布

通过http_user_agent分组聚合,然后查询出各个代理的请求以及返回客户端流量的总和。由于单位是byte,运算转为MB并保留两位小数,再使用case whenstatus分层,分为2xx、3xx、4xx、5xx以及各层所占的比例。您可以使用基础表格可视化展示数据。

* |
select
  http_user_agent as "用户代理",
  count(*) as pv,
  round(sum(request_length) / 1024.0 / 1024, 2) as "请求报文流量(MB)",
  round(sum(body_bytes_sent) / 1024.0 / 1024, 2) as "返回客户端流量(MB)",
  round(
    sum(
      case
        when status >= 200
        and status < 300 then 1
        else 0
      end
    ) * 100.0 / count(1),
    6
  ) as "2xx比例(%)",
  round(
    sum(
      case
        when status >= 300
        and status < 400 then 1
        else 0
      end
    ) * 100.0 / count(1),
    6
  ) as "3xx比例(%)",
  round(
    sum(
      case
        when status >= 400
        and status < 500 then 1
        else 0
      end
    ) * 100.0 / count(1),
    6
  ) as "4xx比例(%)",
  round(
    sum(
      case
        when status >= 500
        and status < 600 then 1
        else 0
      end
    ) * 100.0 / count(1),
    6
  ) as "5xx比例(%)"
group by
  "用户代理"
order by
  pv desc
limit
  100

本月每日消费及趋势预测

首先根据账单号对数据做分组,获取到对应的时间、产品标识、账单类型及应付金额。接着按天做聚合计算每天的费用。然后按照时间排序找到产生费用的最新一天。最后将时间、费用、传递的字符串(可以作为之后的别名)、预测周期、预测点的个数作为sls_inner_ts_regression的参数,获取当前每天的消费情况及之后的预测结果。 您可以使用线图,x轴配置时间、y轴配置实际消费和预测消费两个字段,可视化展示本月的消费情况以及剩余时间的消费预测。

source :bill |
select
  date_format(res.stamp, '%Y-%m-%d') as time,
  res.real as "实际消费",case
    when is_nan(res.real) then res.pred
    else null
  end as "预测消费",
  res.instances
from(
    select
      sls_inner_ts_regression(
        cast(day as bigint),
        total,
        array ['total'],
        86400,
        60
      ) as res
    from
      (
        select
          *
        from
          (
            select
              *,
              max(day) over() as lastday
            from
              (
                select
                  to_unixtime(date_trunc('day', __time__)) as day,
                  sum(PretaxAmount) as total
                from
                  (
                    select
                      RecordID,
                      arbitrary(__time__) as __time__,
                      arbitrary(ProductCode) as ProductCode,
                      arbitrary(item) as item,
                      arbitrary(PretaxAmount) as PretaxAmount
                    from
                      log
                    group by
                      RecordID
                  )
                group by
                  day
                order by
                  day
              )
          )
        where
          day < lastday
      )
  )
limit
  1000

本月消费情况各产品的占比

查询语句中首先根据产品名称聚合计算各个产品的总支出,使用row_number函数将产品按照支出金额倒序排列。找出支出最多的六个产品,其他的归为其他。您可以通过配置环形图可视化展示数据。

source :bill |
select
  case
    when rnk > 6
    or pretaxamount <= 0 then '其他'
    else ProductName
  end as ProductName,
  sum(PretaxAmount) as PretaxAmount
from(
    select
      *,
      row_number() over(
        order by
          pretaxamount desc
      ) as rnk
    from(
        select
          ProductName,
          sum(PretaxAmount) as PretaxAmount
        from
          log
        group by
          ProductName
      )
  )
group by
  ProductName
order by
  PretaxAmount desc
limit
  1000

昨天的消费及与上月的同比

先计算昨天的总费用,使用coalesce表达式当未产生费用时返回0,round函数取三位小数。然后通过compare函数计算同比。 您可以使用统计图中的趋势图,展示昨天的消费总额及与上个月的同比情况。

source :bill |
select
  round(diff [1], 2),
  round(diff [2], 2),
  round(diff [3] * 100 -100, 2)
from(
    select
      compare("昨日消费", 604800) as diff
    from(
        select
          round(coalesce(sum(PretaxAmount), 0), 3) as "昨日消费"
        from
          log
      )
  )

Tomcat Web服务分析

tomcat请求状态趋势分析

通过date_trunc函数对日志时间按照分钟对齐,使用date_format函数提取出小时、分钟,将提取后的时间与访问状态码status通过group by聚合,获取每分钟每个状态码的count值。您可以使用流图展示数据,配置x轴为time,y轴为count,聚合列为status。

* |
select
  date_format(date_trunc('minute', __time__), '%H:%i') as time,
  COUNT(1) as c,
  status
GROUP by
  time,
  status
ORDER by
  time
LIMIT
  1000

tomcat访问的pv、uv时间分布

使用time_series函数根据日志时间做两分钟对齐,然后对时间聚合计算出访问数量,使用approx_distinct函数计算remote_addr去重之后的数量,然后根据时间排序。您可以通过添加多Y轴线图,配置x轴为time、y轴两条线分别展示uvpv值,可视化展示数据。

* |
select
  time_series(__time__, '2m', '%H:%i', '0') as time,
  COUNT(1) as pv,
  approx_distinct(remote_addr) as uv
GROUP by
  time
ORDER by
  time
LIMIT
  1000

tomcat错误请求数量以及与上一小时错误请求比较

内层SQL获取状态大于等于400的请求数,中间层SQL使用compare函数获取3600秒前的数据,外层中c1是当前时间的错误数量,c23600秒前的错误数量,c3c1/c2的比值,用于展示趋势,您可以配置环形图,设置c1为显示值,c3为对比值,可视化展示数据。

status >= 400 |
SELECT
  diff [1] AS c1,
  diff [2] AS c2,
  round(diff [1] * 100.0 / diff [2] - 100.0, 2) AS c3
FROM
  (
    select
      compare(c, 3600) AS diff
    from
      (
        select
          count(1) as c
        from
          log
      )
  )

tomcat中请求数前十的URL展示

通过request_uri分组聚合,计算每个request_uri对应的访问量,然后按照访问量倒序排列,取前十条数据,您可以配置基础计量图,设置x轴为page,y轴为pv,可视化展示数据。

* |
SELECT
  request_uri as page,
  COUNT(*) as pv
GROUP by
  page
ORDER by
  pv DESC
LIMIT
  10

tomcat的客户端分类及数量分布

通过user_agent字段聚合分析,计算出每种客户端的数量。您可以配置环形图,设置分类为user_agent,数值列为c,可视化展示数据。

* |
SELECT
  user_agent,
  COUNT(*) AS c
GROUP BY
  user_agent
ORDER BY
  c DESC

tomcat流出流量统计

使用time_series函数将数据时间对齐,然后聚合时间计算body_bytes_sent的和。您可以配置线图x轴和y,设置x轴为time,y轴为body_sent,可视化展示数据。

* |
select
  time_series(__time__, '10s', '%H:%i:%S', '0') as time,
  sum(body_bytes_sent) as body_sent
GROUP by
  time
ORDER by
  time
LIMIT
  1000

tomcat错误请求占比

SQL内部获取请求status超过400的错误请求数量以及总的请求数量。在外部计算比值。您可以配置刻度盘,可视化展示数据。

* |
select
  round((errorCount * 100.0 / totalCount), 2) as errorRatio
from
  (
    select
      sum(
        case
          when status >= 400 then 1
          else 0
        end
      ) as errorCount,
      count(1) as totalCount
    from
      log
  )