降采样查询

降采样查询是在时间维度的聚合查询,是时序数据应用场景常用的降低采样率的查询。本文介绍在时序引擎中降采样查询的语法。

前提条件

时序引擎版本需要3.4.15及以上,升级版本的方法请参见升级小版本

语法

select_sample_by_statement ::=  SELECT ( select_clause | '*' )
                                FROM table_name
                                WHERE where_clause
                                SAMPLE BY time_interval [ OFFSET offset_interval ] [ FILL fill_option ]
select_clause              ::=  selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector                   ::=  tag_name | time | function_name '(' field_name [ ',' function_args ] ')'
where_clause               ::=  relation ( AND relation )* (OR relation)*
relation                   ::=  ( field_name | tag_name ) operator term
operator                   ::=  '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
time_interval              ::= interval units | 0
offset_interval            ::= interval units

其中time_interval说明:

  • interval:指数值,如5、60等整数,表示时间线聚合的时间窗口的数值。

  • units:指单位,s代表秒,m代表分,h代表小时,d代表天,n代表月,y代表年。

说明

  • 默认按照时间戳取模对齐,即“对齐时间戳 = 数据时间戳 - (数据时间戳 % interval)”。

  • 支持基于日历时间间隔的降采样。要使用日历对齐,需要在时间单位units后添加一个c。例如,1dc代表从当日零点到次日零点之间的24小时;1nc表示从当月零点到次月零点之间的日历月。

降采样支持的聚合函数列表

SAMPLE BY是针对每条独立的时间线(关于时间线的概念,请参见数据模型)上的降采样操作。

SAMPLE BY支持的函数列表:

函数

说明

SUM

每个指定时间窗口内求和,具体请参见SUM函数

AVG

每个指定时间窗口内求均值,具体请参见AVG函数

COUNT

每个指定时间窗口内值个数,具体请参见COUNT函数

MIN

每个指定时间窗口内最小值,具体请参见MIN函数

MAX

每个指定时间窗口内最大值,具体请参见MAX函数

FIRST

每个指定时间窗口内第一个值,具体请参见FIRST函数

LAST

每个指定时间窗口内最后值,具体请参见LAST函数

PERCENTILE

每个指定时间窗口内求百分位,具体请参见PERCENTILE函数

LATEST

整个时间区间最新值,具体请参见LATEST最新值

RATE

与前一行对应值的变化率,具体请参见RATE函数

DELTA

与前一行对应值的差,具体请参见DELTA函数

示例

SELECT查询指定的tag列无需指定降采样函数,其他field列必须指定降采样函数。

假定查询的sensor表数据如下:

SELECT * FROM sensor;

+-----------+----------+---------------------------+-------------+-----------+
| device_id |  region  |           time            | temperature | humidity  |
+-----------+----------+---------------------------+-------------+-----------+
| F07A1260  | north-cn | 2021-01-01T09:00:00+08:00 | 0.000000    | 9.000000  |
| F07A1260  | north-cn | 2021-01-01T12:01:00+08:00 | 1.000000    | 45.000000 |
| F07A1260  | north-cn | 2021-01-01T14:03:00+08:00 | 2.000000    | 46.000000 |
| F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 10.000000   | 47.000000 |
| F07A1261  | north-cn | 2021-02-10T12:00:30+08:00 | 3.000000    | 40.000000 |
| F07A1261  | north-cn | 2021-03-01T12:01:00+08:00 | 4.000000    | 41.000000 |
| F07A1261  | north-cn | 2021-03-08T12:08:00+08:00 | 5.000000    | 42.000000 |
| F07A1261  | north-cn | 2021-05-01T13:00:00+08:00 | 6.000000    | 43.000000 |
+-----------+----------+---------------------------+-------------+-----------+

降采样与子查询示例

  • 默认UTC对齐降采样:时间线按照8H时间区间聚合分别求count。

SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h;

+-----------+----------+---------------------------+----------------+
| device_id |  region  |           time            | count_humidity |
+-----------+----------+---------------------------+----------------+
| F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 3              |
| F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 1              |
+-----------+----------+---------------------------+----------------+

  • 默认UTC对齐降采样,指定窗口offset偏移:时间线按照8H时间区间聚合,开始窗口偏移3h,求count。

SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h offset 3h;

+-----------+----------+---------------------------+----------------+
| device_id |  region  |           time            | count_humidity |
+-----------+----------+---------------------------+----------------+
| F07A1260  | north-cn | 2021-01-01T03:00:00+08:00 | 1              |
| F07A1260  | north-cn | 2021-01-01T11:00:00+08:00 | 2              |
| F07A1260  | north-cn | 2021-01-01T19:00:00+08:00 | 1              |
+-----------+----------+---------------------------+----------------+

  • 本地日历按月聚合降采样。

SELECT device_id, time, count(temperature) FROM sensor WHERE device_id='F07A1261' sample by 1nc;

+-----------+---------------------------+--------+
| device_id |           time            | EXPR$2 |
+-----------+---------------------------+--------+
| F07A1261  | 2021-02-01T00:00:00+08:00 | 1      |
| F07A1261  | 2021-03-01T00:00:00+08:00 | 2      |
| F07A1261  | 2021-05-01T00:00:00+08:00 | 1      |
+-----------+---------------------------+--------+

  • sample by目前不支持与group by、limit offset、order by配合,但可以配合子查询使用。

SELECT device_id, max(avg_humidity) AS max_humidity  FROM (SELECT device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) group by device_id;

+-----------+--------------+
| device_id | max_humidity |
+-----------+--------------+
| F07A1261  | 43.000000    |
| F07A1260  | 47.000000    |
+-----------+--------------+

  • 配合limit offset限制结果条数。

SELECT device_id,region, avg_humidity FROM (select device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) limit 1 offset 1;

+-----------+----------+--------------+
| device_id |  region  | avg_humidity |
+-----------+----------+--------------+
| F07A1261  | north-cn | 40.000000    |
+-----------+----------+--------------+

降采样窗口插值示例

  • 固定值插值的用法,示例如下:

SELECT * from (select device_id,region,time, avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill 1) order by device_id;

+-----------+----------+---------------------------+-----------+
| device_id |  region  |           time            | humidity  |
+-----------+----------+---------------------------+-----------+
| F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000  |
| F07A1260  | north-cn | 2021-01-01T10:00:00+08:00 | 1.000000  |
| F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 |
| F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 |
| F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 1.000000  |
| F07A1260  | north-cn | 2021-01-01T18:00:00+08:00 | 1.000000  |
| F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 |
+-----------+----------+---------------------------+-----------+
  • 后值插值示例如下:

SELECT * from (select device_id,region,time,avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill after) order by device_id;

+-----------+----------+---------------------------+-----------+
| device_id |  region  |           time            | humidity  |
+-----------+----------+---------------------------+-----------+
| F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000  |
| F07A1260  | north-cn | 2021-01-01T10:00:00+08:00 | 45.000000 |
| F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 |
| F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 |
| F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 47.000000 |
| F07A1260  | north-cn | 2021-01-01T18:00:00+08:00 | 47.000000 |
| F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 |
+-----------+----------+---------------------------+-----------+

降采样后再转换示例

  • 先按照2h窗口avg降采样,然后对降采样后的数据计算rate斜率,示例如下:

SELECT device_id,region,time,rate(avg(humidity)) AS rate_humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;

+-----------+----------+---------------------------+---------------+
| device_id |  region  |           time            | rate_humidity |
+-----------+----------+---------------------------+---------------+
| F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 0.002500      |
| F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 0.000139      |
| F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 0.000046      |
+-----------+----------+---------------------------+---------------+

  • 先按照2h窗口降采样,然后对降采样后的数据计算差值delta,示例如下:

SELECT device_id,region,time,delta(avg(humidity)) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;

+-----------+----------+---------------------------+-----------+
| device_id |  region  |           time            | humidity  |
+-----------+----------+---------------------------+-----------+
| F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 36.000000 |
| F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 1.000000  |
| F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 1.000000  |
+-----------+----------+---------------------------+-----------+

插值

降采样先把所有时间线按照指定时间窗口切分,并把每个降采样区间内的数据做一次运算,降采样后如果某个精度区间没有值,插值可以指定在这个时间点填充具体的值。比如某条时间线降采样后的时间戳为:t+0, t+20, t+30,此时如果不指定插值,只有3个值,如果指定了插值为1,此时间线会有4个值,其中t+10时刻的值为1。

插值函数表:

Fill Policy

填充值

none

默认行为,不填值。

zero

固定填入0。

linear

线性填充值。

previous

之前的一个值。

near

邻近的一个值。

after

之后的一个值。

fixed

用指定的一个固定填充值。

阿里云首页 云原生多模数据库Lindorm 相关技术圈