降采样查询
降采样查询是在时间维度的聚合查询,是时序数据应用场景常用的降低采样率的查询。本文介绍在时序引擎中降采样查询的语法。
前提条件
时序引擎版本需要3.4.15及以上,升级版本的方法请参见升级小版本。
语法
select_sample_by_statement ::= SELECT ( select_clause | '*' )
FROM table_name
WHERE where_clause
SAMPLE BY time_interval [ OFFSET offset_interval ] [ FILL fill_option ]
select_clause ::= selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector ::= tag_name | time | function_name '(' field_name [ ',' function_args ] ')'
where_clause ::= relation ( AND relation )* (OR relation)*
relation ::= ( field_name | tag_name ) operator term
operator ::= '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
time_interval ::= interval units | 0
offset_interval ::= interval units
其中time_interval
说明:
interval
:指数值,如5、60等整数,表示时间线聚合的时间窗口的数值。units
:指单位,s代表秒,m代表分,h代表小时,d代表天,n代表月,y代表年。
默认按照时间戳取模对齐,即“对齐时间戳 = 数据时间戳 - (数据时间戳 % interval)”。
支持基于日历时间间隔的降采样。要使用日历对齐,需要在时间单位units后添加一个
c
。例如,1dc
代表从当日零点到次日零点之间的24小时;1nc表示从当月零点到次月零点之间的日历月。
降采样支持的聚合函数列表
SAMPLE BY
是针对每条独立的时间线(关于时间线的概念,请参见数据模型)上的降采样操作。
SAMPLE BY
支持的函数列表:
函数 | 说明 |
SUM | 每个指定时间窗口内求和,具体请参见SUM函数。 |
AVG | 每个指定时间窗口内求均值,具体请参见AVG函数。 |
COUNT | 每个指定时间窗口内值个数,具体请参见COUNT函数。 |
MIN | 每个指定时间窗口内最小值,具体请参见MIN函数。 |
MAX | 每个指定时间窗口内最大值,具体请参见MAX函数。 |
FIRST | 每个指定时间窗口内第一个值,具体请参见FIRST函数。 |
LAST | 每个指定时间窗口内最后值,具体请参见LAST函数。 |
PERCENTILE | 每个指定时间窗口内求百分位,具体请参见PERCENTILE函数。 |
LATEST | 整个时间区间最新值,具体请参见LATEST最新值。 |
RATE | 与前一行对应值的变化率,具体请参见RATE函数。 |
DELTA | 与前一行对应值的差,具体请参见DELTA函数。 |
示例
SELECT查询指定的tag列无需指定降采样函数,其他field列必须指定降采样函数。
假定查询的sensor表数据如下:
SELECT * FROM sensor;
+-----------+----------+---------------------------+-------------+-----------+
| device_id | region | time | temperature | humidity |
+-----------+----------+---------------------------+-------------+-----------+
| F07A1260 | north-cn | 2021-01-01T09:00:00+08:00 | 0.000000 | 9.000000 |
| F07A1260 | north-cn | 2021-01-01T12:01:00+08:00 | 1.000000 | 45.000000 |
| F07A1260 | north-cn | 2021-01-01T14:03:00+08:00 | 2.000000 | 46.000000 |
| F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 10.000000 | 47.000000 |
| F07A1261 | north-cn | 2021-02-10T12:00:30+08:00 | 3.000000 | 40.000000 |
| F07A1261 | north-cn | 2021-03-01T12:01:00+08:00 | 4.000000 | 41.000000 |
| F07A1261 | north-cn | 2021-03-08T12:08:00+08:00 | 5.000000 | 42.000000 |
| F07A1261 | north-cn | 2021-05-01T13:00:00+08:00 | 6.000000 | 43.000000 |
+-----------+----------+---------------------------+-------------+-----------+
降采样与子查询示例
默认UTC对齐降采样:时间线按照8H时间区间聚合分别求count。
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h;
+-----------+----------+---------------------------+----------------+
| device_id | region | time | count_humidity |
+-----------+----------+---------------------------+----------------+
| F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 3 |
| F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1 |
+-----------+----------+---------------------------+----------------+
默认UTC对齐降采样,指定窗口offset偏移:时间线按照8H时间区间聚合,开始窗口偏移3h,求count。
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h offset 3h;
+-----------+----------+---------------------------+----------------+
| device_id | region | time | count_humidity |
+-----------+----------+---------------------------+----------------+
| F07A1260 | north-cn | 2021-01-01T03:00:00+08:00 | 1 |
| F07A1260 | north-cn | 2021-01-01T11:00:00+08:00 | 2 |
| F07A1260 | north-cn | 2021-01-01T19:00:00+08:00 | 1 |
+-----------+----------+---------------------------+----------------+
本地日历按月聚合降采样。
SELECT device_id, time, count(temperature) FROM sensor WHERE device_id='F07A1261' sample by 1nc;
+-----------+---------------------------+--------+
| device_id | time | EXPR$2 |
+-----------+---------------------------+--------+
| F07A1261 | 2021-02-01T00:00:00+08:00 | 1 |
| F07A1261 | 2021-03-01T00:00:00+08:00 | 2 |
| F07A1261 | 2021-05-01T00:00:00+08:00 | 1 |
+-----------+---------------------------+--------+
sample by目前不支持与group by、limit offset、order by配合,但可以配合子查询使用。
SELECT device_id, max(avg_humidity) AS max_humidity FROM (SELECT device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) group by device_id;
+-----------+--------------+
| device_id | max_humidity |
+-----------+--------------+
| F07A1261 | 43.000000 |
| F07A1260 | 47.000000 |
+-----------+--------------+
配合limit offset限制结果条数。
SELECT device_id,region, avg_humidity FROM (select device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) limit 1 offset 1;
+-----------+----------+--------------+
| device_id | region | avg_humidity |
+-----------+----------+--------------+
| F07A1261 | north-cn | 40.000000 |
+-----------+----------+--------------+
降采样窗口插值示例
固定值插值的用法,示例如下:
SELECT * from (select device_id,region,time, avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill 1) order by device_id;
+-----------+----------+---------------------------+-----------+
| device_id | region | time | humidity |
+-----------+----------+---------------------------+-----------+
| F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 |
| F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 1.000000 |
| F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 |
| F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 |
| F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1.000000 |
| F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 1.000000 |
| F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 |
+-----------+----------+---------------------------+-----------+
后值插值示例如下:
SELECT * from (select device_id,region,time,avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill after) order by device_id;
+-----------+----------+---------------------------+-----------+
| device_id | region | time | humidity |
+-----------+----------+---------------------------+-----------+
| F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 |
| F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 45.000000 |
| F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 |
| F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 |
| F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 47.000000 |
| F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 47.000000 |
| F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 |
+-----------+----------+---------------------------+-----------+
降采样后再转换示例
先按照2h窗口avg降采样,然后对降采样后的数据计算
rate
斜率,示例如下:
SELECT device_id,region,time,rate(avg(humidity)) AS rate_humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;
+-----------+----------+---------------------------+---------------+
| device_id | region | time | rate_humidity |
+-----------+----------+---------------------------+---------------+
| F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 0.002500 |
| F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 0.000139 |
| F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 0.000046 |
+-----------+----------+---------------------------+---------------+
先按照2h窗口降采样,然后对降采样后的数据计算差值delta,示例如下:
SELECT device_id,region,time,delta(avg(humidity)) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;
+-----------+----------+---------------------------+-----------+
| device_id | region | time | humidity |
+-----------+----------+---------------------------+-----------+
| F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 36.000000 |
| F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 1.000000 |
| F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 1.000000 |
+-----------+----------+---------------------------+-----------+
插值
降采样先把所有时间线按照指定时间窗口切分,并把每个降采样区间内的数据做一次运算,降采样后如果某个精度区间没有值,插值可以指定在这个时间点填充具体的值。比如某条时间线降采样后的时间戳为:t+0, t+20, t+30,此时如果不指定插值,只有3个值,如果指定了插值为1,此时间线会有4个值,其中t+10时刻的值为1。
插值函数表:
Fill Policy | 填充值 |
none | 默认行为,不填值。 |
zero | 固定填入0。 |
linear | 线性填充值。 |
previous | 之前的一个值。 |
near | 邻近的一个值。 |
after | 之后的一个值。 |
fixed | 用指定的一个固定填充值。 |