This topic describes three strategies for managing partitioned tables.
Partitioned tables
In Fluss, a partitioned table organizes data based on one or more partition keys. This effectively improves query performance and manageability for large datasets. Partitions divide data into segments, each corresponding to a specific partition key value.
Fluss supports the following three management strategies for partitioned tables:
-
Manual partition management: You can manually create new partitions or delete existing ones. For more information, see Manage Partitions.
-
Automatic partition management: The system automatically creates partitions based on the automatic partitioning rules configured when you create the table. It also automatically cleans up expired partitions to prevent unlimited data growth. For more information, see Automatic Partitioning.
-
Dynamic partition creation: Fluss dynamically creates partitions based on the data written to the table. For more information, see Dynamic Partitioning.
These three strategies are orthogonal and can coexist on the same table.
Multi-field partitioned tables
Partitioned tables, including both primary key tables and log tables, support configuring partition keys based on multiple fields. This approach allows you to partition data by combinations of field values, enabling more granular data organization, management, and query optimization.
For example, in an Order primary key table, you can define the partition key as (date, region). Data is then stored in partitions corresponding to specific combinations, such as date=2025-04-05, region='US'. In streaming queries, you can filter data, for example, by region='US', to leverage partition pruning and improve read performance through partition pushdown.
Key advantages of partitioned tables
-
Improved query performance: Querying specific partitions reduces the amount of data the system reads, which shortens query execution time.
-
Clearer data organization: Partitions help organize data logically, making it easier to manage and query.
-
Better scalability: Dividing large datasets into smaller, more manageable chunks improves system scalability.
Limitations
-
The partition key type must be STRING.
-
Tables with a single partition key support both automatic partition creation and cleanup. Tables with multiple partition keys only support automatic cleanup of expired partitions.
-
If the table is a primary key table, the partition key must be a subset of the primary key.
-
You can configure automatic partitioning rules only at table creation; they cannot be modified afterward.
Automatic partitioning
Example
You configure automatic partitioning rules using table options. The following example shows how to use Flink SQL to create a table named site_access that supports automatic partitioning.
CREATE TABLE site_access(
event_day STRING, -- Partition key
site_id INT, -- Site ID
city_code STRING, -- City code
user_name STRING, -- User name
pv BIGINT, -- Page views
PRIMARY KEY(event_day, site_id) NOT ENFORCED
) PARTITIONED BY (event_day) WITH (
'table.auto-partition.enabled' = 'true', -- Enable automatic partitioning
'table.auto-partition.time-unit' = 'YEAR', -- Set the time granularity to YEAR
'table.auto-partition.num-precreate' = '5', -- Pre-create 5 partitions
'table.auto-partition.num-retention' = '2', -- Retain 2 historical partitions
'table.auto-partition.time-zone' = 'Asia/Shanghai' -- Set the time zone to Asia/Shanghai
);
In this example, when an automatic partitioning task runs (Fluss performs this periodically in the background on all tables), the system pre-creates five partitions and retains two historical partitions using YEAR as the time granularity. The time zone is set to Asia/Shanghai.
Table parameters
|
Parameter |
Type |
Required |
Default |
Description |
|
table.auto-partition.enabled |
Boolean |
No |
false |
Specifies whether to enable automatic partitioning for the table. The default value is |
|
table.auto-partition.key |
String |
No |
None |
For tables with multiple partition keys, this parameter specifies which key is used for time-based automatic partitioning. The system compares this key's value to the current time to create and remove partitions. Required for multi-key tables; not used for single-key tables. |
|
table.auto-partition.time-unit |
ENUM |
No |
DAY |
The time granularity for automatically creating partitions. The default value is 'DAY'. Valid values are 'HOUR', 'DAY', 'MONTH', 'QUARTER', and 'YEAR'. If the value is 'HOUR', partitions are created in the yyyyMMddHH format. If 'DAY', the format is yyyyMMdd. If 'MONTH', the format is yyyyMM. If 'QUARTER', the format is yyyyQ. If 'YEAR', the format is yyyy. |
|
table.auto-partition.num-precreate |
Integer |
No |
2 |
The number of partitions to pre-create during each automatic partitioning check. For example, if the check time is 2024-11-11 and this value is 3, the partitions 20241111, 20241112, and 20241113 are pre-created. If a partition already exists, its creation is skipped. The default value is 2. For example, if 'table.auto-partition.time-unit' is 'DAY' (the default), one pre-created partition is for today and the other is for tomorrow. Pre-creation is not supported for tables with multiple partition keys; this parameter is ignored for them. |
|
table.auto-partition.num-retention |
Integer |
No |
7 |
The number of historical partitions to retain during each automatic partitioning check. For example, if the check time is 2024-11-11, the time unit is DAY, and this value is 3, the historical partitions 20241108, 20241109, and 20241110 are retained. The system deletes partitions older than 20241108. The default value is 7. |
|
table.auto-partition.time-zone |
String |
No |
System time zone |
The time zone for automatic partitioning. Defaults to the system time zone. |
Partition generation rules
The table.auto-partition.time-unit for an automatically partitioned table can be set to HOUR, DAY, MONTH, QUARTER, or YEAR. Automatic partitioning creates partitions in the following formats.
|
Time unit |
Format |
Example |
|
HOUR |
yyyyMMddHH |
2024091922 |
|
DAY |
yyyyMMdd |
20240919 |
|
MONTH |
yyyyMM |
202409 |
|
QUARTER |
yyyyQ |
20241 |
|
YEAR |
yyyy |
2024 |
Fluss cluster configuration
The following table describes the configuration options related to automatic partitioning for a Fluss cluster.
|
Parameter |
Type |
Default |
Description |
|
auto-partition.check.interval |
Duration |
10 minutes |
The interval at which Fluss checks tables with automatic partitioning enabled. During each check, Fluss creates or deletes partitions as needed based on the table's rules. |
Dynamic partitioning
Dynamic partitioning is a client-side feature, enabled by default, that automatically creates partitions as data is written to a table. This is valuable when the full set of partitions is unknown in advance, eliminating the need for manual creation. It is also particularly useful when you work with multi-field partitions because automatic partitioning currently supports partition creation for single-field keys only.
The number of dynamically created partitions is also limited by the max.partition.num and max.bucket.num parameters configured on the Fluss cluster.
-
The default value of
max.partition.numis 1000. This represents the maximum number of partitions that can be created for each partitioned table. -
The default value of
max.bucket.numis 128000. This represents the maximum total number of buckets supported per table, which is calculated as (number of partitions) × (number of buckets per partition).
Client options
|
Parameter |
Type |
Required |
Default |
Description |
|
client.writer.dynamic-create-partition.enabled |
Boolean |
No |
true |
Specifies whether to enable dynamic partition creation for the client writer. If enabled, the client automatically creates a new partition if it does not exist at write time. |
Dynamic partitioning example
Step 1: Create a sample table
In the left navigation bar, choose , create a new, blank data query script, and enter the following SQL to create a partitioned table with the dt column as the partition key.
CREATE TABLE IF NOT EXISTS `fluss-demo`.fluss.table_enable_dynamic_partition
(
id BIGINT
,name STRING
,dt STRING
)
PARTITIONED BY (dt)
WITH (
'client.writer.dynamic-create-partition.enabled' = 'true'
)
;
Verify the current partitions of the table:
SHOW PARTITIONS `fluss-demo`.fluss.table_enable_dynamic_partition;
The query result indicates that the table's partition list is empty.
Step 2: Dynamically insert data
Run the following SQL statement to insert data into a non-existent partition:
INSERT INTO `fluss-demo`.fluss.table_enable_dynamic_partition values(1, 'hello', '20250701');
Step 3: Verify partition creation
Run the SHOW PARTITIONS command again to verify that the partition has been dynamically created.
The query result shows dt=20250701 in the partition name column, confirming that the partition was dynamically created.