Downsampling queries

更新时间:
复制 MD 格式

A downsampling query is an aggregate query over time. It is commonly used in time series applications to reduce the sample rate of data.

Engines and versions

Downsampling queries are supported only by LindormTSDB 3.4.15 and later versions.

Note

For more information about how to view and upgrade the LindormTSDB version of the Lindorm instance, see Release notes of LindormTSDB and Minor version update.

Syntax

select_sample_by_statement ::=  SELECT ( select_clause | '*' )
                                FROM table_identifier
                                WHERE where_clause
                                SAMPLE BY time_interval [ OFFSET offset_interval ] [ FILL fill_option ]
select_clause              ::=  selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector                   ::=  tag_identifier | time | function_identifier '(' field_identifier [ ',' function_args ] ')'
where_clause               ::=  relation ( AND relation )* 
relation                   ::=  ( field_identifier | tag_identifier ) operator term
operator                   ::=  '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
time_interval              ::= interval units | 0
offset_interval            ::= interval units

Supported aggregate functions

The SAMPLE BY clause performs a downsampling operation on each individual time series. For more information about time series, see data model.

The SAMPLE BY clause supports the following aggregate functions:

Function

Description

SUM

Calculates the sum of values within each specified time window.

AVG

Calculates the average of values within each specified time window.

COUNT

Counts the number of values within each specified time window.

MIN

Finds the minimum value within each specified time window.

MAX

Finds the maximum value within each specified time window.

FIRST

Returns the first value within each specified time window.

LAST

Returns the last value within each specified time window.

PERCENTILE

Calculates the percentile of values within each specified time window.

LATEST

Returns the latest value within the entire time range.

RATE

Calculates the rate of change between the current value and the value in the previous row.

DELTA

Calculates the delta between the current value and the value in the previous row.

Downsampling queries support filtering only by tag columns, not by field columns. Filters on a field column are ignored and may cause unexpected results. To filter on a field column, use a subquery to apply the filter to the results of the downsampling query.

Take the latest(field, n)  function as an example, where temperature is a field column:

SELECT device_id,region,time,temperature FROM (SELECT device_id,region,time,latest(temperature, 2) AS temperature FROM sensor WHERE device_id in ('F07A1260', 'F07A1261') SAMPLE BY 0) WHERE temperature > 45.0;

Examples

In a SELECT query, you must specify an aggregate function for each field column, but not for tag columns.

The following examples use a sample table named sensor with the following schema and data:

-- Create the sample table sensor.
CREATE TABLE sensor (
    device_id VARCHAR NOT NULL,
    region VARCHAR NOT NULL,
    time TIMESTAMP NOT NULL,
    temperature DOUBLE,
    humidity BIGINT,
    PRIMARY KEY(device_id, region, time)
);

-- Insert data into the table.
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 09:00:00',0,9);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 12:01:00',1,45);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 14:03:00',2,46);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 20:00:00',10,47);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-02-10 12:00:30',3,40);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-03-01 12:01:00',4,41);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-03-08 12:08:00',5,42);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-05-01 13:00:00',6,43);

Downsampling and subquery examples

Note

You cannot nest a subquery within a downsampling query, but you can use a downsampling query as a subquery.

  • Example 1: Perform a default UTC-aligned downsampling. This query aggregates each time series into 8-hour time windows and calculates the count for each window.

    SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h;

    The following result is returned:

    +-----------+----------+---------------------------+----------------+
    | device_id |  region  |           time            | count_humidity |
    +-----------+----------+---------------------------+----------------+
    | F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 3              |
    | F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 1              |
    +-----------+----------+---------------------------+----------------+
  • Example 2: Perform a UTC-aligned downsampling with an offset. This query aggregates each time series into 8-hour time windows, offsets the start of each window by 3 hours, and calculates the count.

    SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h offset 3h;

    The following result is returned:

    +-----------+----------+---------------------------+----------------+
    | device_id |  region  |           time            | count_humidity |
    +-----------+----------+---------------------------+----------------+
    | F07A1260  | north-cn | 2021-01-01T03:00:00+08:00 | 1              |
    | F07A1260  | north-cn | 2021-01-01T11:00:00+08:00 | 2              |
    | F07A1260  | north-cn | 2021-01-01T19:00:00+08:00 | 1              |
    +-----------+----------+---------------------------+----------------+
  • Example 3: Perform a UTC-aligned downsampling to align with the midnight of a local time zone. This query aggregates each time series into 24-hour time windows and offsets the start by 16 hours to align with 00:00 in the UTC+08:00 time zone. Then, the query calculates the count.

    SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 24h offset 16h

    The following result is returned:

    +-----------+----------+---------------------------+----------------+
    | device_id | region   | time                      | count_humidity |
    +-----------+----------+---------------------------+----------------+
    | F07A1260  | north-cn | 2021-01-01T00:00:00+08:00 | 4              |
    +-----------+----------+---------------------------+----------------+
  • Example 4: The SAMPLE BY clause cannot be used directly with GROUP BY, LIMIT OFFSET, or ORDER BY. To use these clauses, wrap the SAMPLE BY query in a subquery.

    SELECT device_id, max(avg_humidity) AS max_humidity  FROM (SELECT device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) group by device_id;

    The following result is returned:

    +-----------+--------------+
    | device_id | max_humidity |
    +-----------+--------------+
    | F07A1261  | 43.000000    |
    | F07A1260  | 47.000000    |
    +-----------+--------------+
  • Example 5: Use LIMIT OFFSET in an outer query to limit the number of results from a downsampling subquery.

    SELECT device_id,region, avg_humidity FROM (select device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) limit 1 offset 1;

    The following result is returned:

    +-----------+----------+--------------+
    | device_id |  region  | avg_humidity |
    +-----------+----------+--------------+
    | F07A1261  | north-cn | 40.000000    |
    +-----------+----------+--------------+

Downsampling with interpolation examples

  • Example 1: Interpolate data with a fixed value.

    SELECT * from (select device_id,region,time, avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill 1) order by device_id;

    The following result is returned:

    +-----------+----------+---------------------------+-----------+
    | device_id |  region  |           time            | humidity  |
    +-----------+----------+---------------------------+-----------+
    | F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000  |
    | F07A1260  | north-cn | 2021-01-01T10:00:00+08:00 | 1.000000  |
    | F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 |
    | F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 |
    | F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 1.000000  |
    | F07A1260  | north-cn | 2021-01-01T18:00:00+08:00 | 1.000000  |
    | F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 |
    +-----------+----------+---------------------------+-----------+
  • Example 2: Interpolate data with the value from the following time window.

    SELECT * from (select device_id,region,time,avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill after) order by device_id;

    The following result is returned:

    +-----------+----------+---------------------------+-----------+
    | device_id |  region  |           time            | humidity  |
    +-----------+----------+---------------------------+-----------+
    | F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000  |
    | F07A1260  | north-cn | 2021-01-01T10:00:00+08:00 | 45.000000 |
    | F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 |
    | F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 |
    | F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 47.000000 |
    | F07A1260  | north-cn | 2021-01-01T18:00:00+08:00 | 47.000000 |
    | F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 |
    +-----------+----------+---------------------------+-----------+

Post-downsampling transformation examples

  • Example 1: Downsample the data into 2-hour windows by using the AVG() function. Then, calculate the rate of change on the downsampled results.

    SELECT device_id,region,time,rate(avg(humidity)) AS rate_humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;

    The following result is returned:

    +-----------+----------+---------------------------+---------------+
    | device_id |  region  |           time            | rate_humidity |
    +-----------+----------+---------------------------+---------------+
    | F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 0.002500      |
    | F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 0.000139      |
    | F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 0.000046      |
    +-----------+----------+---------------------------+---------------+
  • Example 2: Downsample the data into 2-hour windows. Then, calculate the delta of the downsampled results.

    SELECT device_id,region,time,delta(avg(humidity)) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;

    The following result is returned:

    +-----------+----------+---------------------------+-----------+
    | device_id |  region  |           time            | humidity  |
    +-----------+----------+---------------------------+-----------+
    | F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 36.000000 |
    | F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 1.000000  |
    | F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 1.000000  |
    +-----------+----------+---------------------------+-----------+

Interpolation

Downsampling first divides each time series into specified time windows and applies an aggregate function to the data points within each window. If a time window contains no data points, the query returns no result for that interval. Use interpolation to fill these gaps with a generated value. For example, a downsampling query generates results at timestamps t+0, t+20, and t+30. If the time window is 10 units, a gap exists at t+10. If you specify a fill policy for interpolation, the query can generate a value at t+10.

The following table describes the available fill policies for interpolation.

Fill policy

Interpolated value

none

Does not fill empty time windows (default).

zero

Fills empty time windows with 0.

linear

Performs linear interpolation to fill empty time windows.

previous

Fills empty time windows with the value from the preceding time window.

near

Fills empty time windows with the value from the nearest time window.

after

Fills empty time windows with the value from the following time window.

fixed

Fills empty time windows with a specified fixed value.