Incremental refresh works best with large source tables, high freshness requirements, and a low rate of data changes per interval. This topic covers table creation, performance tuning, and operational best practices for incremental Dynamic Tables.
Use cases for incremental refresh
A Dynamic Table supports two refresh modes:
-
Full refresh: Recalculates all data each time the table is refreshed.
-
Incremental refresh: Calculates only new or changed data and merges it with the existing full results.
Use incremental refresh when your target SQL query meets the following criteria. These values are general guidelines — always test with your actual workload.
-
The source table has tens of millions of rows or more.
-
Minute-level data freshness is required.
-
Data changes per refresh interval are low, typically 0.1% to 5%.
This refers to the overall data change rate across the entire computation pipeline. A small change in a source table, such as an update to a dimension table, can trigger large-scale recalculation of intermediate or final results.
Table creation and initial refresh
The first refresh of an incremental Dynamic Table processes all historical data and persists intermediate results, consuming significantly more memory than a standard SQL query. For large tables, follow these steps:
-
Disable automatic refresh initially. Run the initial full refresh manually, then add new data and run a second manual refresh to test the incremental process. Enable automatic refresh after verification.
-
Use a serverless resource for the initial refresh. This prevents the refresh from affecting other tasks on your local resource. After the refresh succeeds, check the refresh history for actual resource consumption. If your local resource is sufficient, switch to it for subsequent automatic refreshes.
The following example demonstrates these steps.
-- 1. Define the source table
CREATE TABLE orders (
order_id BIGINT NOT NULL,
user_id BIGINT,
amount FLOAT,
PRIMARY KEY (order_id)
);
-- 2. Generate initial data
INSERT INTO orders
SELECT i, i % 100000, RANDOM() * 1000
FROM GENERATE_SERIES(1, 1000000) AS i;
-- 3. Create an incremental Dynamic Table (automatic refresh is disabled initially)
CREATE DYNAMIC TABLE order_agg
WITH (
freshness = '1 minutes',
auto_refresh_mode = 'incremental',
auto_refresh_enable = 'false', -- Enable automatic refresh after successful manual verification
computing_resource = 'serverless' -- Prioritize using a serverless resource
) AS
SELECT
user_id,
COUNT(1) AS order_count,
SUM(amount) AS amount_sum
FROM orders
GROUP BY user_id;
-- 4. Manually run the first refresh and verify the result
REFRESH TABLE order_agg WITH (refresh_mode = 'incremental');
SELECT * FROM order_agg WHERE user_id = 1;
-- 5. Insert incremental data
INSERT INTO orders
SELECT i, i % 100000, RANDOM() * 1000
FROM GENERATE_SERIES(1000001, 1010001) AS i;
-- 6. Manually run an incremental refresh and verify the result
REFRESH TABLE order_agg WITH (refresh_mode = 'incremental');
SELECT * FROM order_agg WHERE user_id = 1;
-- 7. After verification, enable automatic refresh
ALTER TABLE order_agg SET (auto_refresh_enable = 'true');
GUCs for Dynamic Table refresh
GUC (Grand Unified Configuration) parameters affect incremental Dynamic Table refreshes the same way they affect standard OLAP statements. However, you must apply them using one of the following methods:
-- Configure or reset a GUC parameter for automatic refreshes
ALTER TABLE order_agg SET (refresh_guc_hg_experimental_xxx = xxx);
ALTER TABLE order_agg RESET (refresh_guc_hg_experimental_xxx);
-- Set a GUC parameter for a manual refresh
REFRESH TABLE order_agg WITH (refresh_mode = 'incremental', refresh_guc_hg_experimental_xxx = xxx);
Tuning incremental refresh
Append-only optimization
If a base table has only append writes (no UPDATE or DELETE operations), set the appendonly property to optimize incremental refresh performance. The improvement is especially significant for queries with MIN/MAX aggregations.
After this property is set, the table no longer supports update or delete operations. Set this property before you create the Dynamic Table.
-- Enable Append-Only optimization (must be executed before creating the Dynamic Table)
CALL set_table_property('orders', 'mutate_type', 'appendonly');
-- To restore the default behavior (first delete all associated incremental Dynamic Tables)
CALL set_table_property('orders', 'mutate_type', 'none');
Optimize Group By/Join key order
Incremental Dynamic Tables persist intermediate results for aggregations and joins. During each refresh, the system scans these results using Group By/Join keys to filter files, reducing the amount of intermediate data read.
When you have multiple Group By/Join keys, reorder them in the SQL statement to improve filtering efficiency. Place these fields first:
-
Sequential fields, such as timestamps or auto-incrementing IDs.
-
Fields with high cardinality.
Example: calculate per-minute sales by product category and sales platform for a single day.
-- The minute is a highly sequential time field, so place it first.
-- The product category has higher cardinality than the sales platform, so place it second.
-- The sales platform has only a few values, so place it last.
CREATE TABLE order_details (
order_id BIGINT,
platform BIGINT,
product_id BIGINT,
category_id BIGINT,
amount FLOAT,
create_at TIMESTAMP
);
CREATE DYNAMIC TABLE order_detail_summary
WITH (
freshness = '1 minutes',
auto_refresh_mode = 'incremental',
auto_refresh_enable = 'false',
computing_resource = 'serverless'
) AS
SELECT
to_char(create_at, 'HH24:MI') AS minute,
category_id,
platform,
SUM(amount)
FROM order_details
GROUP BY minute, category_id, platform;
Dimension table joins
Use dimension table joins instead of multi-table joins in these scenarios:
-
One table is updated infrequently and not written to in real time.
-
One table is updated frequently, but the join fields do not change.
Example: join an orders table with a customers table using a dimension table join when customers changes infrequently:
CREATE TABLE orders (
order_id BIGINT NOT NULL,
user_id BIGINT,
amount FLOAT,
PRIMARY KEY (order_id)
);
INSERT INTO orders
SELECT i, i % 100000, RANDOM() * 1000
FROM GENERATE_SERIES(1, 1000000) AS i;
CREATE TABLE customers (
user_id BIGINT NOT NULL,
user_name TEXT,
PRIMARY KEY (user_id)
);
INSERT INTO customers
SELECT i, 'user_' || i
FROM GENERATE_SERIES(1, 100000) AS i;
CREATE DYNAMIC TABLE order_customer
WITH (
freshness = '1 minutes',
auto_refresh_mode = 'incremental',
auto_refresh_enable = 'false',
computing_resource = 'serverless'
) AS
SELECT
o.order_id,
o.user_id,
c.user_name,
o.amount
FROM orders o
-- FOR SYSTEM_TIME AS OF PROCTIME() identifies this as a dimension table join
LEFT JOIN customers FOR SYSTEM_TIME AS OF PROCTIME() c
ON o.user_id = c.user_id;
In a dimension table join, changes to the dimension table affect only subsequent data — historical results are not retroactively updated. Dimension table join semantics are documented in CREATE DYNAMIC TABLE. To correct historical joined data, run a REFRESH OVERWRITE:
REFRESH OVERWRITE TABLE order_customer WITH (refresh_mode = 'incremental');
In practice, combine dimension table joins with periodic REFRESH OVERWRITE operations to tolerate small amounts of missing dimension data. REFRESH OVERWRITE is resource-intensive — do not run it frequently. If dimension table data timeliness is critical, use a multi-table join instead.
Tune dual-stream joins (multi-table joins)
Avoid small tables in dual-stream joins
Avoid using small tables with frequently changing data in dual-stream joins. A single row in a small table can match many rows in a large table, so changing a few dozen rows can force massive updates to intermediate and final results, negating the benefits of incremental refresh.
If this happens, use a dimension table join when its semantics are acceptable — for example, when the updated fields in the small table are not used in the join. Alternatively, join the small table at query time.
Example: a three-table join of orders, lineitems, and categories.
CREATE TABLE orders (
order_id BIGINT NOT NULL,
user_id BIGINT,
PRIMARY KEY (order_id)
);
INSERT INTO orders SELECT i, i%100000 FROM GENERATE_SERIES(1, 1000000) AS i;
CREATE TABLE lineitems (
lineitem_id BIGINT NOT NULL,
order_id BIGINT,
product_id BIGINT,
category_id BIGINT,
PRIMARY KEY (lineitem_id)
);
INSERT INTO lineitems SELECT i, i%1000000, i%100000, i%1000 FROM GENERATE_SERIES(1, 10000000) AS i;
CREATE TABLE categories (
category_id BIGINT NOT NULL,
category_name TEXT,
PRIMARY KEY (category_id)
);
INSERT INTO categories SELECT i, 'category_' || i FROM GENERATE_SERIES(1, 1000) AS i;
CREATE DYNAMIC TABLE order_detail
WITH (
freshness = '1 minutes',
auto_refresh_mode = 'incremental',
auto_refresh_enable = 'false',
computing_resource = 'serverless'
) AS
SELECT
o.order_id,
o.user_id,
l.lineitem_id,
l.product_id,
l.category_id,
c.category_name
FROM orders o
LEFT JOIN lineitems l ON o.order_id = l.order_id
LEFT JOIN categories c ON l.category_id = c.category_id;
The categories table has only 1,000 rows, but if 500 rows change, 5 million related rows in the result table also change — negating the advantage of incremental refresh.
Because the join only uses category_name, which is rarely updated, you can use a dimension table join for categories to prevent this issue.
If the small table data changes frequently and must be reflected in query results, join the small table at query time through a view:
CREATE DYNAMIC TABLE order_detail_dt
WITH (
freshness = '1 minutes',
auto_refresh_mode = 'incremental',
auto_refresh_enable = 'false',
computing_resource = 'serverless'
) AS
SELECT
o.order_id,
o.user_id,
l.lineitem_id,
l.product_id,
l.category_id
FROM orders o
LEFT JOIN lineitems l ON o.order_id = l.order_id;
CREATE VIEW order_detail AS
SELECT
dt.*,
c.category_name
FROM order_detail_dt dt
LEFT JOIN categories c ON dt.category_id = c.category_id;
Tune the execution plan
Join execution order significantly impacts performance, and the auto-generated plan is not always optimal. Optimize the order by adjusting your SQL or using a Plan Hint:
-
Multi-table joins: Join small tables first to reduce intermediate state size.
-
Large-small table joins: Make the small table the Build Side and the large table the Probe Side to avoid OOM errors during the initial refresh.
Example: Optimize a three-table join between line items, products, and categories.
Step 1: View and analyze the execution plan
-- 1. Prepare the data
-- lineitems table (large)
CREATE TABLE lineitems (
lineitem_id BIGINT NOT NULL,
order_id BIGINT,
product_id BIGINT,
PRIMARY KEY (lineitem_id)
);
-- products table (medium)
CREATE TABLE products (
product_id BIGINT NOT NULL,
product_name TEXT,
category_id BIGINT,
PRIMARY KEY (product_id)
);
-- categories table (small)
CREATE TABLE categories (
category_id BIGINT NOT NULL,
category_name TEXT,
PRIMARY KEY (category_id)
);
-- 2. Create the Dynamic Table
CREATE DYNAMIC TABLE lineitem_detail
WITH (
freshness = '1 minutes',
auto_refresh_mode = 'incremental',
auto_refresh_enable = 'false',
computing_resource = 'serverless'
) AS
SELECT
l.order_id,
l.lineitem_id,
p.product_id,
p.product_name,
c.category_id,
c.category_name
FROM lineitems l
LEFT JOIN products p ON l.product_id = p.product_id
LEFT JOIN categories c ON p.category_id = c.category_id;
-- 3. View the execution plan
EXPLAIN REFRESH TABLE lineitem_detail WITH (REFRESH_MODE = 'incremental');
The execution plan output is as follows:
QUERY PLAN
-> Gather (id=100005, fragmentId=8, dagId=8)
-> Hybrid DML (id=13, fragmentId=7, dagId=7)
-> Redistribution (id=100004, fragmentId=7, dagId=7)
Hash Key: lineitems.lineitem_id, products.product_id, categories.category_id, (row_key_pb_encoding(lineitems.product_id, products.category_id))
-> Project (id=11, fragmentId=5, dagId=5)
-> Incremental Hash Join Node (id=10, fragmentId=5, dagId=5)
Hash Cond: (products.category_id = categories.category_id)
-> Redistribution (id=100003, fragmentId=5, dagId=5)
Hash Key: products.category_id
-> Incremental Hash Join Node (id=6, fragmentId=3, dagId=3)
Hash Cond: (lineitems.product_id = products.product_id)
-> Redistribution (id=100002, fragmentId=3, dagId=3)
Hash Key: lineitems.product_id
-> Local Gather (id=2, fragmentId=1, dagId=1)
-> NiagaraScanNode on public.lineitems (id=1, fragmentId=2, dagId=1)
-> Local Gather (id=5, fragmentId=3, dagId=3)
-> NiagaraScanNode on public.products (id=4, fragmentId=4, dagId=3)
-> Local Gather (id=9, fragmentId=5, dagId=5)
-> NiagaraScanNode on public.categories (id=8, fragmentId=6, dagId=5)
RefreshMode: INCREMENTAL
The execution plan shows the following join order:
-
lineitems(Probe Side) is joined withproducts(Build Side) to compute the intermediate resultlineitem_product. -
lineitem_product(Probe Side) is joined withcategories(Build Side) to compute the final result.
Step 2: Delay the join with the large table
The intermediate result lineitem_product is large, creating high state storage overhead. Instead, join products and categories first to produce a smaller intermediate result and reduce overall cost.
DROP TABLE lineitem_detail;
CREATE DYNAMIC TABLE lineitem_detail_opt
WITH (
freshness = '1 minutes',
auto_refresh_mode = 'incremental',
auto_refresh_enable = 'false',
computing_resource = 'serverless'
) AS
WITH pc AS (
SELECT
p.product_id,
p.product_name,
c.category_id,
c.category_name
FROM products p
LEFT JOIN categories c ON p.category_id = c.category_id
)
SELECT
l.order_id,
l.lineitem_id,
pc.product_id,
pc.product_name,
pc.category_id,
pc.category_name
FROM lineitems l
LEFT JOIN pc ON l.product_id = pc.product_id;
Verify the new execution plan confirms that products and categories are joined first:
EXPLAIN REFRESH TABLE lineitem_detail_opt WITH (REFRESH_MODE = 'incremental');
The execution plan output is as follows:
QUERY PLAN
-> Gather (id=100006, fragmentId=9, dagId=9)
-> Hybrid DML (id=16, fragmentId=8, dagId=8)
-> Redistribution (id=100005, fragmentId=8, dagId=8)
Hash Key: lineitems.lineitem_id, products.product_id, categories.category_id, (row_key_pb_encoding(lineitems.product_id, products.category_id))
-> Project (id=14, fragmentId=7, dagId=7)
-> Project (id=13, fragmentId=7, dagId=7)
-> Incremental Hash Join Node (id=12, fragmentId=7, dagId=7)
Hash Cond: (products.product_id = lineitems.product_id)
-> Redistribution (id=100003, fragmentId=7, dagId=7)
Hash Key: products.product_id
-> Project (id=7, fragmentId=3, dagId=3)
-> Incremental Hash Join Node (id=6, fragmentId=3, dagId=3)
Hash Cond: (products.category_id = categories.category_id)
-> Redistribution (id=100002, fragmentId=3, dagId=3)
Hash Key: products.category_id
-> Local Gather (id=2, fragmentId=1, dagId=1)
-> NiagaraScanNode on public.products (id=1, fragmentId=2, dagId=1)
-> Local Gather (id=5, fragmentId=3, dagId=3)
-> NiagaraScanNode on public.categories (id=4, fragmentId=4, dagId=3)
-> Redistribution (id=100004, fragmentId=7, dagId=7)
Hash Key: lineitems.product_id
-> Local Gather (id=9, fragmentId=5, dagId=5)
-> NiagaraScanNode on public.lineitems (id=8, fragmentId=6, dagId=5)
RefreshMode: INCREMENTAL
Step 3: Use a Plan Hint to specify the Build and Probe sides
In the optimized plan, lineitems is still the Build Side. Using a large table as the Build Side can cause OOM errors. Use a Plan Hint to swap the sides:
DROP TABLE lineitem_detail_opt;
CREATE DYNAMIC TABLE lineitem_detail
WITH (
freshness = '1 minutes',
auto_refresh_mode = 'incremental',
auto_refresh_enable = 'false',
computing_resource = 'serverless'
) AS
WITH pc AS (
SELECT
p.product_id,
p.product_name,
c.category_id,
c.category_name
FROM products p
LEFT JOIN categories c ON p.category_id = c.category_id
)
SELECT /*+HINT Leading(l pc) */
l.order_id,
l.lineitem_id,
pc.product_id,
pc.product_name,
pc.category_id,
pc.category_name
FROM lineitems l
LEFT JOIN pc ON l.product_id = pc.product_id;
View the final execution plan:
EXPLAIN REFRESH TABLE lineitem_detail WITH (REFRESH_MODE = 'incremental');
The execution plan output is as follows:
QUERY PLAN
Gather (cost=0.00..15.01 rows=4 width=56)
-> Sink Table (cost=0.00..15.01 rows=4 width=56)
-> Redistribution (cost=0.00..15.01 rows=4 width=56)
Hash Key: lineitems.lineitem_id, products.product_id, categories.category_id, (pg_catalog.row_key_pb_encoding(lineitems.product_id, products.category_id))
-> Project (cost=0.00..15.01 rows=4 width=56)
-> Hash Left Join (cost=0.00..15.01 rows=4 width=64)
Hash Cond: (lineitems.product_id = products.product_id)
-> Redistribution (cost=0.00..5.00 rows=1 width=24)
Hash Key: lineitems.product_id
-> Local Gather (cost=0.00..5.00 rows=1 width=24)
-> Seq Scan on lineitems (cost=0.00..5.00 rows=1 width=24)
-> Hash (cost=10.00..10.00 rows=2 width=40)
-> Redistribution (cost=0.00..10.00 rows=2 width=40)
Hash Key: products.product_id
-> Hash Left Join (cost=0.00..10.00 rows=2 width=40)
Hash Cond: (products.category_id = categories.category_id)
-> Redistribution (cost=0.00..5.00 rows=1 width=24)
Hash Key: products.category_id
-> Local Gather (cost=0.00..5.00 rows=1 width=24)
-> Seq Scan on products (cost=0.00..5.00 rows=1 width=24)
-> Hash (cost=5.00..5.00 rows=1 width=16)
-> Local Gather (cost=0.00..5.00 rows=1 width=16)
-> Seq Scan on categories (cost=0.00..5.00 rows=1 width=16)
Optimizer: HQO version 4.1.0
The final plan is ideal: products and categories are joined first to minimize intermediate state, then joined with lineitems as the Probe Side to prevent high memory usage.
Test environment: 32 Compute Units (CUs). Data specifications:
-
lineitems: 100,000,000 rows. -
products: 500,000 rows. -
categories: 10,000 rows.
The test results are as follows:
|
Initial refresh (full) |
Incremental refresh (1% increment) |
State table size |
|||||
|
E2E time |
Memory |
CPU Time |
E2E time |
Memory |
CPU Time |
||
|
Before optimization |
52.8 s |
27 GB |
361 s |
1254 ms |
412 MB |
8404 ms |
5247 MB |
|
After optimization |
49.9 s |
22 GB |
255 s |
911 ms |
389 MB |
6886 ms |
2003 MB |
Modify a Dynamic Table definition online
To modify a Dynamic Table definition without disrupting queries, use ALTER DYNAMIC TABLE table_name AS SQL for minor changes. This overwrites the table during the next refresh with the new definition. This method has restrictions — refer to the ALTER DYNAMIC TABLE documentation for details.
If ALTER DYNAMIC TABLE conditions are not met and no other Dynamic Tables depend on the table, use a view-based rebuild. Example for the orders_agg Dynamic Table:
-- 1. Create a Dynamic Table with the new definition. The name must be different from the old one.
CREATE DYNAMIC TABLE orders_agg_dt
WITH (
...
) AS ...;
-- 2. Wait for the first refresh to complete and confirm that the new Dynamic Table contains data.
SELECT * FROM orders_agg_dt LIMIT 1;
-- 3. In a transaction, create a view to atomically replace the original Dynamic Table.
BEGIN;
DROP TABLE orders_agg;
CREATE VIEW orders_agg AS SELECT * FROM orders_agg_dt;
END;