TSQL10分钟快速入门-阿里云帮助中心

我们现在以一个机房性能监控为应用场景，通过一个工具产生的样本数据，来展示如何使用TSQL来完成时序查询。

样本数据

这个样本数据使用了一个时序性能测试工具benchmark（https://github.com/influxdata/influxdb-comparisons）中的样本数据生成工具。在安装并编译这个工具后，你可以使用下面两步生成数据并加载到TSDB引擎中。

生成数据

cd influxdb-comparisons/cmd
bulk_data_gen/bulk_data_gen --seed=123 --use-case=devops --scale-var=10 --format=opentsdb --timestamp-start="2019-03-01T00:00:00Z" --timestamp-end="2019-03-01T00:10:00Z" > tsdb_devops_sf10_10m_seed123.json

下面显示了部分样本数据：

{"metric":"redis.evicted_keys","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":2951}
{"metric":"redis.keyspace_hits","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":2945}
{"metric":"redis.keyspace_misses","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":2944}
{"metric":"redis.instantaneous_ops_per_sec","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":65}
{"metric":"redis.instantaneous_input_kbps","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":58}

加载数据

cat tsdb_devops_sf10_10m_seed123.json | bulk_load_opentsdb/bulk_load_opentsdb --urls=http://your_tsdb_host:port_num -workers=5

查询

时间范围查询：查看一个metric在一个时间段内的所有的列，包括值，时间戳，以及对应的tag key的值。

select *
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'

时间范围查询：查询一个metric在一个时间段内的指定的列，包括值，时间戳，以及具体某几个tag key的值。

select `value`, `timestamp`, hostname,  datacenter
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'

时间范围+tagkey条件查询：查询一个metric在时间段内的值，时间戳，并且hostname满足IN-LIST的条件。

select `value`, `timestamp`, hostname,  datacenter
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10' and
      hostname in ('host_0', 'host_2', 'host_4')

查询结果按时间戳排序：查看一个metric在一个时间段内的值，时间戳，以及对应的tag key的值。

select *
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
order by `timestamp`

含数学计算表达式的值过滤条件查询：查看一个metric在一个时间段内的值，时间戳，以及对应的tag key的值，值满足其平方根>1.5。

select *
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10' and
    sqrt(`value`) > 1.5

分组聚合查询：按照hostname, datacenter来分组，计算每个分组最大值，最小值，平均值。

select 
    hostname, 
    datacenter, 
    max(`value`) as maxV, 
    min(`value`) as minV, 
    avg(`value`) as avgV
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
group by hostname, datacenter

分组分时间段聚合查询，按照hostname, datacenter来分组，并且进一步按照2分钟的间隔分组，计算最大值，最小值，平均值。

select 
   hostname, 
   datacenter, 
   tumble(`timestamp`, interval '2' minute) as ts, 
   max(`value`) as maxV, 
   min(`value`) as minV, 
   avg(`value`) as avgV
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:10:00'
group by hostname, datacenter, ts

分组分时间段聚合查询, 计算聚合后的表达式的值max(value) - min(value) + 0.5* avg(value)。

select 
    hostname, 
    datacenter, 
    tumble(`timestamp`, interval '2' minute) as ts, 
    max(`value`) - min(`value`) + 0.5* avg(`value`) as compV
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:10:00'
group by hostname, datacenter, ts

计算每台机器上，相邻两个时间戳上的记录值之间的差值。

下面的例子使用window function lag(), lag()函数使用一个基于hostname做分组，时间戳排序的窗口frame，返回在同一个窗口内当前记录的前一条记录的值，通过计算两者之差，获取每台主机上相邻时间戳的记录值上的差值。

select hostname, `timestamp`, `value`, 
       `value` - lag(`value`) over(partition by hostname order by `timestamp`) as diff
from tsdb.`cpu.usage_system` 
  where `timestamp` between '2019-03-01' and '2019-03-01 00:10:00'

计算每台机器上，相邻两个时间戳上的记录值之间的差值，如果差值超过异常值，则重设成0。

下面的例子，把上面的查询放在一个子查询中，在子查询之外用了一个case表达式，来表达如果查询中的差值超过50.0（认为是异常值），则重设成0.0。

select hostname, `timestamp`, `value`, 
        case when diff > 50.0 then 0.0
                else diff
        end
from (
    select hostname, `timestamp`, `value`, 
       `value` - lag(`value`) over(partition by hostname order by `timestamp`) as diff
    from tsdb.`cpu.usage_system` 
      where `timestamp` between '2019-03-01' and '2019-03-01 00:10:00'
);

计算每台机器上一分钟内的记录的最大值，以及相邻分钟之间最大值的差值。

下面的查询区别于之前差值计算，在于要获取每分钟的最大值的差值。我们可以在子查询中计算每分钟的最大值，在子查询外用窗口函数lag(）计算获得相邻分钟之间最大值的差值。

select hostname, ts, maxValue, 
             maxValue - lag(maxValue) over(partition by hostname order by ts) as diff
from (
  select hostname, 
              tumble(`timestamp`, interval '1' minute) ts, max(`value`) maxValue
  from tsdb.`cpu.usage_system` 
  where `timestamp` between '2019-03-01' and '2019-03-01 00:10:00'
  group by hostname, ts)