A downsampling query is an aggregate query over time. It is commonly used in time series applications to reduce the sample rate of data.
Engines and versions
Downsampling queries are supported only by LindormTSDB 3.4.15 and later versions.
For more information about how to view and upgrade the LindormTSDB version of the Lindorm instance, see Release notes of LindormTSDB and Minor version update.
Syntax
select_sample_by_statement ::= SELECT ( select_clause | '*' )
FROM table_identifier
WHERE where_clause
SAMPLE BY time_interval [ OFFSET offset_interval ] [ FILL fill_option ]
select_clause ::= selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector ::= tag_identifier | time | function_identifier '(' field_identifier [ ',' function_args ] ')'
where_clause ::= relation ( AND relation )*
relation ::= ( field_identifier | tag_identifier ) operator term
operator ::= '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
time_interval ::= interval units | 0
offset_interval ::= interval unitsSupported aggregate functions
The SAMPLE BY clause performs a downsampling operation on each individual time series. For more information about time series, see data model.
The SAMPLE BY clause supports the following aggregate functions:
Function | Description |
Calculates the sum of values within each specified time window. | |
Calculates the average of values within each specified time window. | |
Counts the number of values within each specified time window. | |
Finds the minimum value within each specified time window. | |
Finds the maximum value within each specified time window. | |
Returns the first value within each specified time window. | |
Returns the last value within each specified time window. | |
Calculates the percentile of values within each specified time window. | |
Returns the latest value within the entire time range. | |
Calculates the rate of change between the current value and the value in the previous row. | |
Calculates the delta between the current value and the value in the previous row. |
Downsampling queries support filtering only by tag columns, not by field columns. Filters on a field column are ignored and may cause unexpected results. To filter on a field column, use a subquery to apply the filter to the results of the downsampling query.
Take the latest(field, n) function as an example, where temperature is a field column:
SELECT device_id,region,time,temperature FROM (SELECT device_id,region,time,latest(temperature, 2) AS temperature FROM sensor WHERE device_id in ('F07A1260', 'F07A1261') SAMPLE BY 0) WHERE temperature > 45.0;Examples
In a SELECT query, you must specify an aggregate function for each field column, but not for tag columns.
The following examples use a sample table named sensor with the following schema and data:
-- Create the sample table sensor.
CREATE TABLE sensor (
device_id VARCHAR NOT NULL,
region VARCHAR NOT NULL,
time TIMESTAMP NOT NULL,
temperature DOUBLE,
humidity BIGINT,
PRIMARY KEY(device_id, region, time)
);
-- Insert data into the table.
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 09:00:00',0,9);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 12:01:00',1,45);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 14:03:00',2,46);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 20:00:00',10,47);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-02-10 12:00:30',3,40);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-03-01 12:01:00',4,41);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-03-08 12:08:00',5,42);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-05-01 13:00:00',6,43);Downsampling and subquery examples
You cannot nest a subquery within a downsampling query, but you can use a downsampling query as a subquery.
Example 1: Perform a default UTC-aligned downsampling. This query aggregates each time series into 8-hour time windows and calculates the count for each window.
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h;The following result is returned:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 3 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1 | +-----------+----------+---------------------------+----------------+Example 2: Perform a UTC-aligned downsampling with an offset. This query aggregates each time series into 8-hour time windows, offsets the start of each window by 3 hours, and calculates the count.
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h offset 3h;The following result is returned:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T03:00:00+08:00 | 1 | | F07A1260 | north-cn | 2021-01-01T11:00:00+08:00 | 2 | | F07A1260 | north-cn | 2021-01-01T19:00:00+08:00 | 1 | +-----------+----------+---------------------------+----------------+Example 3: Perform a UTC-aligned downsampling to align with the midnight of a local time zone. This query aggregates each time series into 24-hour time windows and offsets the start by 16 hours to align with 00:00 in the UTC+08:00 time zone. Then, the query calculates the count.
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 24h offset 16hThe following result is returned:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T00:00:00+08:00 | 4 | +-----------+----------+---------------------------+----------------+Example 4: The
SAMPLE BYclause cannot be used directly withGROUP BY,LIMIT OFFSET, orORDER BY. To use these clauses, wrap theSAMPLE BYquery in a subquery.SELECT device_id, max(avg_humidity) AS max_humidity FROM (SELECT device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) group by device_id;The following result is returned:
+-----------+--------------+ | device_id | max_humidity | +-----------+--------------+ | F07A1261 | 43.000000 | | F07A1260 | 47.000000 | +-----------+--------------+Example 5: Use
LIMIT OFFSETin an outer query to limit the number of results from a downsampling subquery.SELECT device_id,region, avg_humidity FROM (select device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) limit 1 offset 1;The following result is returned:
+-----------+----------+--------------+ | device_id | region | avg_humidity | +-----------+----------+--------------+ | F07A1261 | north-cn | 40.000000 | +-----------+----------+--------------+
Downsampling with interpolation examples
Example 1: Interpolate data with a fixed value.
SELECT * from (select device_id,region,time, avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill 1) order by device_id;The following result is returned:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 | | F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 | +-----------+----------+---------------------------+-----------+Example 2: Interpolate data with the value from the following time window.
SELECT * from (select device_id,region,time,avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill after) order by device_id;The following result is returned:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 | | F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 47.000000 | | F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 47.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 | +-----------+----------+---------------------------+-----------+
Post-downsampling transformation examples
Example 1: Downsample the data into 2-hour windows by using the
AVG()function. Then, calculate therateof change on the downsampled results.SELECT device_id,region,time,rate(avg(humidity)) AS rate_humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;The following result is returned:
+-----------+----------+---------------------------+---------------+ | device_id | region | time | rate_humidity | +-----------+----------+---------------------------+---------------+ | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 0.002500 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 0.000139 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 0.000046 | +-----------+----------+---------------------------+---------------+Example 2: Downsample the data into 2-hour windows. Then, calculate the
deltaof the downsampled results.SELECT device_id,region,time,delta(avg(humidity)) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;The following result is returned:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 36.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 1.000000 | +-----------+----------+---------------------------+-----------+
Interpolation
Downsampling first divides each time series into specified time windows and applies an aggregate function to the data points within each window. If a time window contains no data points, the query returns no result for that interval. Use interpolation to fill these gaps with a generated value. For example, a downsampling query generates results at timestamps t+0, t+20, and t+30. If the time window is 10 units, a gap exists at t+10. If you specify a fill policy for interpolation, the query can generate a value at t+10.
The following table describes the available fill policies for interpolation.
Fill policy | Interpolated value |
none | Does not fill empty time windows (default). |
zero | Fills empty time windows with 0. |
linear | Performs linear interpolation to fill empty time windows. |
previous | Fills empty time windows with the value from the preceding time window. |
near | Fills empty time windows with the value from the nearest time window. |
after | Fills empty time windows with the value from the following time window. |
fixed | Fills empty time windows with a specified fixed value. |