FirstRow Merge Engine

This topic describes how to use the FirstRow Merge Engine.

FirstRow Merge Engine description

  • Merge policy

    For each primary key, only the first record is retained. Subsequent records with the same primary key are discarded.

  • Configuration

    You can enable this engine by setting the table property 'table.merge-engine' = 'first_row' when you create a table.

  • Changelog attributes

    • Generates an insert-only changelog.

    • Allows downstream Flink jobs to treat the primary key table as an append-only log table.

  • Scenarios

    • Suitable for downstream operations that do not support retraction changelogs and expect append-only input, such as window aggregation and interval join.

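    • In stream processing, you can use this engine to remove duplicates from logs. Because duplicates are discarded when records are written, downstream jobs do not need a separate deduplication step, which reduces processing complexity and improves efficiency.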

  • Limits

    • Does not support UPDATE or DELETE operations.

    • Does not support partial updates.

    • UPDATE_BEFORE and DELETE events in the changelog are automatically ignored.

Example

-- Create table T, set the primary key to k, and enable the FirstRow Merge Engine.
CREATE TABLE T (
    k  INT,
    v1 DOUBLE,
    v2 STRING,
    PRIMARY KEY (k) NOT ENFORCED
) WITH (
    'table.merge-engine' = 'first_row'
);

-- Insert two records with the same primary key.
INSERT INTO T VALUES (1, 2.0, 't1');
INSERT INTO T VALUES (1, 3.0, 't2');

-- Query the record where the primary key is 1. Only the first record is returned.
SELECT * FROM T WHERE k = 1;

-- Output
-- +---+-----+------+
-- | k | v1  | v2   |
-- +---+-----+------+
-- | 1 | 2.0 | t1   |
-- +---+-----+------+
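
The log deduplication scenario described earlier could be sketched as follows. This is a minimal illustration, not a reference implementation: the tables dedup_events and raw_events and all of their columns are hypothetical names, and raw_events is assumed to be an existing source table that can deliver the same event_id more than once.

-- Hypothetical target table: one row per event_id, keeping the first record that arrives.
CREATE TABLE dedup_events (
    event_id   STRING,
    user_id    STRING,
    event_time TIMESTAMP(3),
    PRIMARY KEY (event_id) NOT ENFORCED
) WITH (
    'table.merge-engine' = 'first_row'
);

-- raw_events is an assumed source table that may contain duplicate event_id values.
-- Later records with an event_id that already exists in dedup_events are discarded.
INSERT INTO dedup_events
SELECT event_id, user_id, event_time
FROM raw_events;

Under these assumptions, a downstream Flink job that reads dedup_events sees each event_id only once and receives only insert records, so it can treat the table as an append-only log.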