降采样查询是在时间维度的聚合查询,是时序数据应用场景常用的降低采样率的查询。
引擎与版本
降采样查询仅支持时序引擎,且引擎版本需为3.4.15及以上版本。
语法
select_sample_by_statement ::= SELECT ( select_clause | '*' )
FROM table_identifier
WHERE where_clause
SAMPLE BY time_interval [ OFFSET offset_interval ] [ FILL fill_option ]
select_clause ::= selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector ::= tag_identifier | time | function_identifier '(' field_identifier [ ',' function_args ] ')'
where_clause ::= relation ( AND relation )* (OR relation)*
relation ::= ( field_identifier | tag_identifier ) operator term
operator ::= '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
time_interval ::= interval units | 0
offset_interval ::= interval units
降采样支持的聚合函数列表
SAMPLE BY
是针对每条独立的时间线(关于时间线的概念,请参见数据模型)上的降采样操作。
SAMPLE BY
支持的函数列表:
函数 | 说明 |
每个指定时间窗口内求和。 | |
每个指定时间窗口内求均值。 | |
每个指定时间窗口内值个数。 | |
每个指定时间窗口内最小值。 | |
每个指定时间窗口内最大值。 | |
每个指定时间窗口内第一个值。 | |
每个指定时间窗口内最后值。 | |
每个指定时间窗口内求百分位。 | |
整个时间区间最新值。 | |
与前一行对应值的变化率。 | |
与前一行对应值的差。 |
在使用涉及条数限制的降采样聚合函数(如MIN、FIRST、LATEST等)时,如果按照field列进行过滤,则过滤操作会作用于降采样后的结果集而非原始数据本身,这可能导致实际返回的数据条数与预期不一致。因此,建议您按照tag列进行过滤。
以LATEST函数latest(field, n)
为例,其中temperature
为field列:
SELECT device_id,region,time,latest(temperature, 2) AS temperature FROM sensor WHERE device_id in ('F07A1260', 'F07A1261') and temperature > 45.0 SAMPLE BY 0;
上述语句等价于:
SELECT device_id,region,time,temperature FROM (SELECT device_id,region,time,latest(temperature, 2) AS temperature FROM sensor WHERE device_id in ('F07A1260', 'F07A1261') SAMPLE BY 0) WHERE temperature > 45.0;
示例
SELECT查询指定的tag列无需指定降采样函数,其他field列必须指定降采样函数。
假设示例表sensor
的结构及表中数据如下:
-- 创建示例表sensor
CREATE TABLE sensor (
device_id VARCHAR NOT NULL,
region VARCHAR NOT NULL,
time TIMESTAMP NOT NULL,
temperature DOUBLE,
humidity BIGINT,
PRIMARY KEY(device_id, region, time)
);
-- 插入数据
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 09:00:00',0,9);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 12:01:00',1,45);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 14:03:00',2,46);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 20:00:00',10,47);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-02-10 12:00:30',3,40);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-03-01 12:01:00',4,41);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-03-08 12:08:00',5,42);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-05-01 13:00:00',6,43);
降采样与子查询示例
降采样查询中不支持嵌套子查询,但可以作为子查询被其他查询嵌套。
示例1:默认UTC对齐降采样,时间线按照8h时间区间聚合分别求count。
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h;
返回结果如下:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 3 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1 | +-----------+----------+---------------------------+----------------+
示例2:默认UTC对齐降采样,指定窗口offset偏移,时间线按照8h时间区间聚合,开始窗口偏移3h,求count。
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h offset 3h;
返回结果如下:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T03:00:00+08:00 | 1 | | F07A1260 | north-cn | 2021-01-01T11:00:00+08:00 | 2 | | F07A1260 | north-cn | 2021-01-01T19:00:00+08:00 | 1 | +-----------+----------+---------------------------+----------------+
示例3: 默认UTC对齐降采样,对齐到当地时间0点(例如东8区时间),时间线按照24h时间区间聚合,开始窗口偏移16h,求count。
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 24h offset 16h
返回结果如下:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T00:00:00+08:00 | 4 | +-----------+----------+---------------------------+----------------+
示例4:sample by目前不支持与group by、limit offset、order by配合,但可以配合子查询使用。
SELECT device_id, max(avg_humidity) AS max_humidity FROM (SELECT device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) group by device_id;
返回结果如下:
+-----------+--------------+ | device_id | max_humidity | +-----------+--------------+ | F07A1261 | 43.000000 | | F07A1260 | 47.000000 | +-----------+--------------+
示例5:配合limit offset限制结果条数。
SELECT device_id,region, avg_humidity FROM (select device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) limit 1 offset 1;
返回结果如下:
+-----------+----------+--------------+ | device_id | region | avg_humidity | +-----------+----------+--------------+ | F07A1261 | north-cn | 40.000000 | +-----------+----------+--------------+
降采样窗口插值示例
示例1:固定值插值。
SELECT * from (select device_id,region,time, avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill 1) order by device_id;
返回结果如下:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 | | F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 | +-----------+----------+---------------------------+-----------+
示例2:后值插值。
SELECT * from (select device_id,region,time,avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill after) order by device_id;
返回结果如下:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 | | F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 47.000000 | | F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 47.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 | +-----------+----------+---------------------------+-----------+
降采样后再转换示例
示例1:先按照2h窗口avg降采样,然后对降采样后的数据计算
rate
斜率。SELECT device_id,region,time,rate(avg(humidity)) AS rate_humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;
返回结果如下:
+-----------+----------+---------------------------+---------------+ | device_id | region | time | rate_humidity | +-----------+----------+---------------------------+---------------+ | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 0.002500 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 0.000139 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 0.000046 | +-----------+----------+---------------------------+---------------+
示例2:先按照2h窗口降采样,然后对降采样后的数据计算差值delta。
SELECT device_id,region,time,delta(avg(humidity)) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;
返回结果如下:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 36.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 1.000000 | +-----------+----------+---------------------------+-----------+
插值
降采样先把所有时间线按照指定时间窗口切分,并把每个降采样区间内的数据做一次运算,降采样后如果某个精度区间没有值,插值可以指定在这个时间点填充具体的值。比如某条时间线降采样后的时间戳为:t+0, t+20, t+30,此时如果不指定插值,只有3个值,如果指定了插值为1,此时间线会有4个值,其中t+10时刻的值为1。
插值函数表:
Fill Policy | 填充值 |
none | 默认行为,不填值。 |
zero | 固定填入0。 |
linear | 线性填充值。 |
previous | 之前的一个值。 |
near | 邻近的一个值。 |
after | 之后的一个值。 |
fixed | 用指定的一个固定填充值。 |