Lake-mode columnar tables

更新时间:
复制 MD 格式

Lake-mode columnar tables are Lindorm's analytical storage format built on the open Apache Iceberg standard. Data is persisted as columnar files on the instance's underlying file system, and OLAP resource groups and ETL resource groups within the same instance share a single copy of this lake-mode data, enabling cooperative multi-engine reads and writes without data copying.

Lake-mode columnar tables are wire-compatible with the open Iceberg protocol and interoperate with Apache Spark, Apache Flink, and similar engines — a natural fit for a multi-engine, lake-warehouse unified architecture. Compared with warehouse-mode columnar tables, lake-mode columnar tables prioritize data sharing and ecosystem compatibility, while warehouse-mode columnar tables offer a more complete SQL surface and higher ad-hoc query performance.

After connecting to an OLAP resource group, switch to the lake-mode columnar data source:

SET CATALOG lindorm_columnar;

Lake-mode columnar table

When to use

  • Append-only detail analytics — raw logs, behavioral events, instrumentation traces — that never need to be updated.

  • Multi-engine offline data warehouse fact tables — Flink/Spark batch-loads the data, the OLAP resource group runs the analytical queries.

  • Lake-warehouse unified scenarios that need to interoperate with external compute engines such as Apache Spark and Apache Flink.

Data model

Lake-mode columnar tables are built on Apache Iceberg. Data versions are managed through Iceberg's snapshot mechanism, so reads and writes never block each other. Data is organized by partition, and queries skip irrelevant partitions through partition pruning. Multiple engines within the same instance — OLAP resource groups, ETL resource groups, Flink, and Spark — read and write the same underlying data directly, with no migration or synchronization required.

How to use

-- Example: user access log table (partitioned by year and month)
-- Partition columns year and month are placed at the end of the column list
CREATE TABLE access_log (
    user_id     BIGINT               COMMENT "User ID",
    path        VARCHAR(512)         COMMENT "Request path",
    status_code INT                  COMMENT "HTTP status code",
    latency_ms  INT                  COMMENT "Response latency in milliseconds",
    region      VARCHAR(32)          COMMENT "Region",
    year        STRING      NOT NULL COMMENT "Year",
    month       STRING      NOT NULL COMMENT "Month"
)
ENGINE = iceberg
PARTITION BY (year, month);

-- Insert a single row
INSERT INTO access_log VALUES (1001, '/api/v1/query', 200, 32, 'east', '2024', '03');

-- Insert query results into the table (cross-catalog supported)
INSERT INTO access_log (user_id, path, status_code, latency_ms, region, year, month)
SELECT user_id, path, status_code, latency_ms, region, year, month
FROM default_catalog.ods.raw_access_log
WHERE year = '2024' AND month = '03';

-- Query (with partition pruning)
SELECT * FROM access_log WHERE year = '2024' AND month = '03' LIMIT 100;

-- Aggregation
SELECT month, COUNT(*) AS pv, COUNT(DISTINCT user_id) AS uv
FROM access_log
WHERE year = '2024'
GROUP BY month
ORDER BY month;

-- Cross-catalog join (lake-mode columnar joined with warehouse-mode columnar)
SELECT a.user_id, a.path, u.user_level
FROM lindorm_columnar.db01.access_log a
JOIN default_catalog.dw.user_profile u ON a.user_id = u.user_id
WHERE a.year = '2024' AND a.month = '03';

-- BUCKET exact query (primary key tables only): routes the query to specified buckets, skipping file scans on other buckets
-- bucket_id starts from 0 and must not exceed the total number of buckets specified at table creation; multiple buckets can be specified (comma-separated)
SELECT * FROM lindorm_columnar.db01.orders BUCKET(0);
SELECT * FROM lindorm_columnar.db01.orders BUCKET(1, 2, 3);

Caveats

  • Partition columns must be at the end of the column list. Place partition columns after all non-partition columns, otherwise CREATE TABLE fails.

  • No automatic deduplication on write. Inserting the same content multiple times produces multiple independent rows.

  • CREATE TABLE AS SELECT (CTAS) is not supported. Create the table first, then load it with INSERT INTO ... SELECT.

  • TIMESTAMP is not supported. Use DATETIME or DATE for time fields.

  • Cross-engine writes are visible in real time. Data written by an external engine becomes visible to the OLAP resource group immediately, with no extra refresh step.

Data distribution

Lake-mode columnar tables control physical data distribution through the PARTITION BY clause. A well-designed partition layout enables effective partition pruning at query time, significantly reducing the amount of data scanned.

Identity partitioning

Identity partitioning partitions data by the raw value of a column. It works best on low-cardinality discrete columns such as region, year, month, or status. When the WHERE clause includes equality predicates on the partition column, the engine automatically skips unrelated partitions, significantly reducing I/O.

-- Partition by year and month
PARTITION BY (year, month)

-- Partition by region
PARTITION BY (region)
Note

Avoid choosing partition columns with too many distinct values — excessive small partitions inflate metadata pressure and slow query planning.

Data types

Type

Description

BOOLEAN

Boolean value.

INT

32-bit integer.

BIGINT

64-bit integer.

FLOAT

Single-precision floating point.

DOUBLE

Double-precision floating point.

DECIMAL(p, s)

Exact decimal — recommended for financial and transactional data.

DATE

Date in YYYY-MM-DD format.

DATETIME

Date and time.

STRING / VARCHAR(N)

Variable-length string.

BINARY

Binary data.

ARRAY<T>

Array of elements of the same type.

MAP<K, V>

Key-value map.

Compute features

SQL analytical operators

Lake-mode columnar tables support standard SQL analytical syntax, covering the full range of compute operations expected in an OLAP workload.

Projection and filter

SELECT projects the columns you choose. Column names can be aliased with AS and used in arithmetic expressions or function calls. The WHERE clause supports compound predicates:

Predicate type

Example

Equality filter

status_code = 200

Range filter

latency_ms BETWEEN 0 AND 1000, view_date >= '2024-01-01'

Set membership

region IN ('east', 'north'), status_code NOT IN (404, 500)

Pattern match

path LIKE '/api/%'

NULL test

user_id IS NULL, path IS NOT NULL

Regex match

path REGEXP '^/api/v[0-9]+'

Predicates are pushed down to the storage layer whenever possible to cut the volume of data read.

Joins

The following join types are supported:

Join type

Description

INNER JOIN

Returns matching rows from both tables.

LEFT OUTER JOIN

Keeps every row of the left table; right-side columns are NULL when no match exists.

RIGHT OUTER JOIN

Keeps every row of the right table; left-side columns are NULL when no match exists.

FULL OUTER JOIN

Keeps every row of both tables.

CROSS JOIN

Cartesian product.

LEFT SEMI JOIN

Returns rows of the left table that have a match in the right table; right-side columns are not returned.

LEFT ANTI JOIN

Returns rows of the left table that have no match in the right table.

Cross-catalog joins are supported, letting you correlate lake-mode columnar data with warehouse-mode columnar or wide-table engine data:

-- Lake-mode columnar joined with warehouse-mode columnar
SELECT a.user_id, a.path, u.user_level
FROM lindorm_columnar.db01.access_log a
INNER JOIN default_catalog.dw.user_profile u ON a.user_id = u.user_id
WHERE a.year = '2024' AND a.month = '03'
  AND u.user_level IN ('gold', 'platinum');

Set operations

Operator

Description

UNION ALL

Combines results from multiple queries, keeping duplicates — the cheaper option.

UNION

Combines results from multiple queries and removes duplicates.

INTERSECT

Returns rows present in both result sets.

EXCEPT / MINUS

Returns rows present in the first result set but not the second.

Aggregation (GROUP BY)

Standard aggregate functions are supported (COUNT, SUM, AVG, MIN, MAX, COUNT DISTINCT), along with HAVING for second-pass filtering on aggregated results.

The following extended grouping syntaxes compute multiple dimensional rollups in a single scan:

-- GROUPING SETS: explicit list of dimension combinations
SELECT region, month, COUNT(*) AS pv
FROM access_log
GROUP BY GROUPING SETS ((region, month), (region), ());

-- ROLLUP: hierarchical rollup from finest to grand total
SELECT region, month, COUNT(*) AS pv
FROM access_log
GROUP BY ROLLUP (region, month);

-- CUBE: all combinations across the listed dimensions
SELECT region, month, COUNT(*) AS pv
FROM access_log
GROUP BY CUBE (region, month);

Sorting and pagination

ORDER BY sorts on multiple columns (ASC / DESC), and pairs with LIMIT and OFFSET for pagination.

SELECT user_id, path, latency_ms
FROM access_log
WHERE year = '2024' AND month = '03'
ORDER BY latency_ms DESC, user_id ASC
LIMIT 20 OFFSET 40;

Subqueries

The following subquery forms are supported. The optimizer rewrites them as equivalent joins where possible:

Subquery form

Example

Scalar subquery

WHERE latency_ms > (SELECT AVG(latency_ms) FROM access_log)

IN / NOT IN subquery

WHERE user_id IN (SELECT user_id FROM vip_users)

EXISTS / NOT EXISTS subquery

WHERE EXISTS (SELECT 1 FROM blacklist WHERE ...)

Correlated subquery

Subquery references columns from the outer query and is evaluated per outer row.

Window functions

Window functions compute aggregations over a row group while preserving the original rows — a core OLAP capability. Use the OVER (PARTITION BY ... ORDER BY ... ROWS/RANGE ...) clause to define the window.

Category

Common functions

Ranking

ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(n)

Aggregation

SUM(), AVG(), MIN(), MAX() (sliding windows supported)

Offset access

LAG(col, n, default), LEAD(col, n, default)

First / last value

FIRST_VALUE(col), LAST_VALUE(col)

-- Cumulative page views per user, plus monthly access ranking
SELECT
    user_id,
    path,
    year,
    month,
    COUNT(*) OVER (
        PARTITION BY user_id
        ORDER BY month
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_pv,
    RANK() OVER (
        PARTITION BY month
        ORDER BY COUNT(*) DESC
    ) AS monthly_rank
FROM access_log;

-- Latency delta between consecutive requests of the same user
SELECT
    user_id,
    path,
    latency_ms,
    latency_ms - LAG(latency_ms, 1) OVER (PARTITION BY user_id ORDER BY year, month) AS latency_diff
FROM access_log;

CASE WHEN expressions

Both searched and simple CASE WHEN forms are supported, and can be used in SELECT, WHERE, ORDER BY, aggregate functions, and elsewhere.

SELECT
    user_id,
    path,
    latency_ms,
    CASE
        WHEN latency_ms < 100  THEN 'fast'
        WHEN latency_ms < 500  THEN 'normal'
        WHEN latency_ms < 2000 THEN 'slow'
        ELSE 'timeout'
    END AS latency_tier,
    SUM(CASE WHEN status_code = 200 THEN 1 ELSE 0 END) AS success_cnt
FROM access_log
GROUP BY user_id, path, latency_ms;

Metadata access

Because lake-mode columnar tables are built on Iceberg, every table maintains complete metadata — snapshot history, partition state, data file inventory, and more. Append $<metadata-table-name> to a table name to query this metadata directly, with no extra configuration.

Metadata table

Description

table$partitions

Per-partition statistics for the current state — row count, file count, data size, etc.

table$snapshots

All snapshots of the table, including snapshot ID, commit time, and operation type.

table$history

History of metadata changes on the table.

table$manifests

Overview of manifest files associated with each snapshot.

table$files

Path, format, size, and other details for every data file in the current snapshot.

table$refs

References on the table — branches and tags.

-- Per-partition statistics
SELECT * FROM lindorm_columnar.db01.access_log$partitions;

-- Snapshot list (write history)
SELECT * FROM lindorm_columnar.db01.access_log$snapshots
ORDER BY committed_at DESC LIMIT 10;

-- Metadata change history
SELECT * FROM lindorm_columnar.db01.access_log$history;

-- Data file details for the current snapshot
SELECT * FROM lindorm_columnar.db01.access_log$files;