降采样查询是在时间维度的聚合查询,是时序数据应用场景常用的降低采样率的查询。
引擎与版本
降采样查询仅支持时序引擎,且引擎版本需为3.4.15及以上版本。
语法
select_sample_by_statement ::= SELECT ( select_clause | '*' )
FROM table_identifier
WHERE where_clause
SAMPLE BY time_interval [ OFFSET offset_interval ] [ FILL fill_option ]
select_clause ::= selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector ::= tag_identifier | time | function_identifier '(' field_identifier [ ',' function_args ] ')'
where_clause ::= relation ( AND relation )* (OR relation)*
relation ::= ( field_identifier | tag_identifier ) operator term
operator ::= '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
time_interval ::= interval units | 0
offset_interval ::= interval units
降采样支持的聚合函数列表
SAMPLE BY
是针对每条独立的时间线(关于时间线的概念,请参见数据模型)上的降采样操作。
SAMPLE BY
支持的函数列表:
函数 | 说明 |
SUM | 每个指定时间窗口内求和,具体请参见SUM函数。 |
AVG | 每个指定时间窗口内求均值,具体请参见AVG函数。 |
COUNT | 每个指定时间窗口内值个数,具体请参见COUNT函数。 |
MIN | 每个指定时间窗口内最小值,具体请参见MIN函数。 |
MAX | 每个指定时间窗口内最大值,具体请参见MAX函数。 |
FIRST | 每个指定时间窗口内第一个值,具体请参见FIRST函数。 |
LAST | 每个指定时间窗口内最后值,具体请参见LAST函数。 |
PERCENTILE | 每个指定时间窗口内求百分位,具体请参见PERCENTILE函数。 |
LATEST | 整个时间区间最新值,具体请参见LATEST函数。 |
RATE | 与前一行对应值的变化率,具体请参见RATE函数。 |
DELTA | 与前一行对应值的差,具体请参见DELTA函数。 |
示例
SELECT查询指定的tag列无需指定降采样函数,其他field列必须指定降采样函数。
假定查询的sensor表数据如下:
SELECT * FROM sensor;
返回结果如下:
+-----------+----------+---------------------------+-------------+-----------+
| device_id | region | time | temperature | humidity |
+-----------+----------+---------------------------+-------------+-----------+
| F07A1260 | north-cn | 2021-01-01T09:00:00+08:00 | 0.000000 | 9.000000 |
| F07A1260 | north-cn | 2021-01-01T12:01:00+08:00 | 1.000000 | 45.000000 |
| F07A1260 | north-cn | 2021-01-01T14:03:00+08:00 | 2.000000 | 46.000000 |
| F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 10.000000 | 47.000000 |
| F07A1261 | north-cn | 2021-02-10T12:00:30+08:00 | 3.000000 | 40.000000 |
| F07A1261 | north-cn | 2021-03-01T12:01:00+08:00 | 4.000000 | 41.000000 |
| F07A1261 | north-cn | 2021-03-08T12:08:00+08:00 | 5.000000 | 42.000000 |
| F07A1261 | north-cn | 2021-05-01T13:00:00+08:00 | 6.000000 | 43.000000 |
+-----------+----------+---------------------------+-------------+-----------+
降采样与子查询示例
降采样查询中不支持嵌套子查询,但可以作为子查询被其他查询嵌套。
示例1:默认UTC对齐降采样,时间线按照8h时间区间聚合分别求count。
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h;
返回结果如下:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 3 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1 | +-----------+----------+---------------------------+----------------+
示例2:默认UTC对齐降采样,指定窗口offset偏移,时间线按照8h时间区间聚合,开始窗口偏移3h,求count。
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h offset 3h;
返回结果如下:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T03:00:00+08:00 | 1 | | F07A1260 | north-cn | 2021-01-01T11:00:00+08:00 | 2 | | F07A1260 | north-cn | 2021-01-01T19:00:00+08:00 | 1 | +-----------+----------+---------------------------+----------------+
示例3: 默认UTC对齐降采样,对齐到当地时间0点(例如东8区时间),时间线按照24h时间区间聚合,开始窗口偏移16h,求count。
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 24h offset 16h
返回结果如下:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T00:00:00+08:00 | 4 | +-----------+----------+---------------------------+----------------+
示例4:sample by目前不支持与group by、limit offset、order by配合,但可以配合子查询使用。
SELECT device_id, max(avg_humidity) AS max_humidity FROM (SELECT device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) group by device_id;
返回结果如下:
+-----------+--------------+ | device_id | max_humidity | +-----------+--------------+ | F07A1261 | 43.000000 | | F07A1260 | 47.000000 | +-----------+--------------+
示例5:配合limit offset限制结果条数。
SELECT device_id,region, avg_humidity FROM (select device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) limit 1 offset 1;
返回结果如下:
+-----------+----------+--------------+ | device_id | region | avg_humidity | +-----------+----------+--------------+ | F07A1261 | north-cn | 40.000000 | +-----------+----------+--------------+
降采样窗口插值示例
示例1:固定值插值。
SELECT * from (select device_id,region,time, avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill 1) order by device_id;
返回结果如下:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 | | F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 | +-----------+----------+---------------------------+-----------+
示例2:后值插值。
SELECT * from (select device_id,region,time,avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill after) order by device_id;
返回结果如下:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 | | F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 47.000000 | | F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 47.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 | +-----------+----------+---------------------------+-----------+
降采样后再转换示例
示例1:先按照2h窗口avg降采样,然后对降采样后的数据计算
rate
斜率。SELECT device_id,region,time,rate(avg(humidity)) AS rate_humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;
返回结果如下:
+-----------+----------+---------------------------+---------------+ | device_id | region | time | rate_humidity | +-----------+----------+---------------------------+---------------+ | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 0.002500 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 0.000139 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 0.000046 | +-----------+----------+---------------------------+---------------+
示例2:先按照2h窗口降采样,然后对降采样后的数据计算差值delta。
SELECT device_id,region,time,delta(avg(humidity)) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;
返回结果如下:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 36.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 1.000000 | +-----------+----------+---------------------------+-----------+
插值
降采样先把所有时间线按照指定时间窗口切分,并把每个降采样区间内的数据做一次运算,降采样后如果某个精度区间没有值,插值可以指定在这个时间点填充具体的值。比如某条时间线降采样后的时间戳为:t+0, t+20, t+30,此时如果不指定插值,只有3个值,如果指定了插值为1,此时间线会有4个值,其中t+10时刻的值为1。
插值函数表:
Fill Policy | 填充值 |
none | 默认行为,不填值。 |
zero | 固定填入0。 |
linear | 线性填充值。 |
previous | 之前的一个值。 |
near | 邻近的一个值。 |
after | 之后的一个值。 |
fixed | 用指定的一个固定填充值。 |