Log tables


This topic describes log tables in Fluss.

Terms

A log table is a type of table in Fluss that stores records in the order in which they are written. Log tables are append-only: they do not support update or delete operations. They are typically used to store high-throughput log data, much like a typical Apache Kafka workload.

You can create a log table by omitting the PRIMARY KEY clause in a CREATE TABLE statement. For example, the following Flink SQL statement creates a log table with 3 buckets.

CREATE TABLE log_table (
  order_id BIGINT,
  item_id BIGINT,
  amount INT,
  address STRING
) WITH ('bucket.num' = '3');

The value of 'bucket.num' must be an integer greater than 0. If you omit this parameter, the table uses the cluster default number of buckets, which is 1.
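Because a log table accepts only appends, writing to it is a plain INSERT. The following is a minimal sketch against the log_table defined above; the values are made up for illustration.

```sql
-- Append two records to the log table. The literal values are illustrative only.
INSERT INTO log_table VALUES
  (1001, 501, 2, 'Hangzhou'),
  (1002, 502, 1, 'Beijing');
```

Each statement only adds new records. Rows that are already in the table are never modified.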

Bucket allocation policies

A bucket is the fundamental unit of parallelism and scalability in Fluss. Each table is divided into one or more buckets, and a bucket is the smallest unit for read and write operations. When you write a record to a log table, Fluss uses a bucket allocation policy to assign the record to a specific bucket. Fluss provides the following three bucket allocation policies:

  • Sticky policy (default)

    Writes records continuously to a randomly selected bucket until that bucket reaches its configured capacity limit, such as a limit on size or record count.

  • Round-robin policy

    Selects a target bucket in a round-robin fashion before writing each record.

  • Hash-based policy

    Assigns each record to a bucket based on the hash value of the specified bucket key.

For more information, see Data bucketing.
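As an illustration of the hash-based policy, the following sketch creates a log table whose records are distributed by order_id. The table name log_table_hashed is hypothetical, and the example assumes that the 'bucket.key' table option is what selects the bucket key. Check the Data bucketing topic for the options that your version supports.

```sql
-- Hypothetical table name. Assumes 'bucket.key' names the column(s) to hash.
CREATE TABLE log_table_hashed (
  order_id BIGINT,
  item_id BIGINT,
  amount INT,
  address STRING
) WITH (
  'bucket.num' = '3',
  'bucket.key' = 'order_id'
);
```

Under this assumption, all records that share an order_id are written to the same bucket, so their relative write order is preserved when they are read.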

Local storage policy

  • Parameter settings: Use the table.log.ttl parameter to define how long log data is retained on the local disk.

  • Tiering mechanism: The system continuously monitors the data lifecycle. Cold data that exceeds the time to live (TTL) threshold is automatically offloaded from the local disk and archived to remote storage. As a result, only active, hot data stays on the local disk, which keeps local storage overhead under control.
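The retention setting described above might be applied at table creation as sketched below. The table name log_table_short_ttl and the duration literal '3 d' are assumptions for illustration; confirm the accepted duration format for table.log.ttl in your environment.

```sql
-- Hypothetical table name. The '3 d' value format is an assumption.
CREATE TABLE log_table_short_ttl (
  order_id BIGINT,
  item_id BIGINT,
  amount INT,
  address STRING
) WITH (
  'bucket.num' = '3',
  'table.log.ttl' = '3 d'
);
```

A shorter TTL keeps less data on the local disk and moves data to remote storage sooner.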

Data consumption

Log tables in Fluss support real-time data consumption and strictly maintain the write order of data within each bucket. The rules are as follows:

  • For two records in the same bucket of the same table, the record written first is consumed first.

  • For two records in different buckets of the same table, the consumption order is not guaranteed, because different buckets can be processed in parallel by different consumption tasks.
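A minimal sketch of real-time consumption in the Flink SQL client, reading the log_table defined earlier, could look like this:

```sql
-- Run the query as an unbounded streaming job.
SET 'execution.runtime-mode' = 'streaming';

-- Records from the same bucket arrive in write order.
-- Records from different buckets may interleave in any order.
SELECT order_id, item_id, amount, address
FROM log_table;
```

If a downstream job depends on the relative order of related records, those records need to land in the same bucket, for example by using a hash-based bucket key.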