Data partitioning-Realtime Compute for Apache Flink(Flink)-阿里云帮助中心

Partitioned tables

In Fluss, a partitioned table organizes data based on one or more partition keys. This effectively improves query performance and manageability for large datasets. Partitions divide data into segments, each corresponding to a specific partition key value.

Fluss supports the following three management strategies for partitioned tables:

Manual partition management: You can manually create new partitions or delete existing ones. For more information, see Manage Partitions.
Automatic partition management: The system automatically creates partitions based on the automatic partitioning rules configured when you create the table. It also automatically cleans up expired partitions to prevent unlimited data growth. For more information, see Automatic Partitioning.
Dynamic partition creation: Fluss dynamically creates partitions based on the data written to the table. For more information, see Dynamic Partitioning.

These three strategies are orthogonal and can coexist on the same table.

Multi-field partitioned tables

Partitioned tables, including both primary key tables and log tables, support configuring partition keys based on multiple fields. This approach allows you to partition data by combinations of field values, enabling more granular data organization, management, and query optimization.

For example, in an Order primary key table, you can define the partition key as (date, region). Data is then stored in partitions corresponding to specific combinations, such as date=2025-04-05, region='US'. In streaming queries, you can filter data, for example, by region='US', to leverage partition pruning and improve read performance through partition pushdown.

Key advantages of partitioned tables

Improved query performance: Querying specific partitions reduces the amount of data the system reads, which shortens query execution time.
Clearer data organization: Partitions help organize data logically, making it easier to manage and query.
Better scalability: Dividing large datasets into smaller, more manageable chunks improves system scalability.

Limitations

The partition key type must be STRING.
Tables with a single partition key support both automatic partition creation and cleanup. Tables with multiple partition keys only support automatic cleanup of expired partitions.
If the table is a primary key table, the partition key must be a subset of the primary key.
You can configure automatic partitioning rules only at table creation; they cannot be modified afterward.

Automatic partitioning

Example

You configure automatic partitioning rules using table options. The following example shows how to use Flink SQL to create a table named site_access that supports automatic partitioning.

CREATE TABLE site_access(
  event_day STRING,  -- Partition key
  site_id INT,       -- Site ID
  city_code STRING,  -- City code
  user_name STRING,  -- User name
  pv BIGINT,         -- Page views
  PRIMARY KEY(event_day, site_id) NOT ENFORCED 
) PARTITIONED BY (event_day) WITH (
  'table.auto-partition.enabled' = 'true',            -- Enable automatic partitioning
  'table.auto-partition.time-unit' = 'YEAR',          -- Set the time granularity to YEAR
  'table.auto-partition.num-precreate' = '5',         -- Pre-create 5 partitions
  'table.auto-partition.num-retention' = '2',         -- Retain 2 historical partitions
  'table.auto-partition.time-zone' = 'Asia/Shanghai'  -- Set the time zone to Asia/Shanghai
);

In this example, when an automatic partitioning task runs (Fluss performs this periodically in the background on all tables), the system pre-creates five partitions and retains two historical partitions using YEAR as the time granularity. The time zone is set to Asia/Shanghai.

Table parameters

Parameter	Type	Required	Default	Description
table.auto-partition.enabled	Boolean	No	false	Specifies whether to enable automatic partitioning for the table. The default value is `false`. When enabled, partitions are created automatically.
table.auto-partition.key	String	No	None	For tables with multiple partition keys, this parameter specifies which key is used for time-based automatic partitioning. The system compares this key's value to the current time to create and remove partitions. Required for multi-key tables; not used for single-key tables.
table.auto-partition.time-unit	ENUM	No	DAY	The time granularity for automatically creating partitions. The default value is 'DAY'. Valid values are 'HOUR', 'DAY', 'MONTH', 'QUARTER', and 'YEAR'. If the value is 'HOUR', partitions are created in the yyyyMMddHH format. If 'DAY', the format is yyyyMMdd. If 'MONTH', the format is yyyyMM. If 'QUARTER', the format is yyyyQ. If 'YEAR', the format is yyyy.
table.auto-partition.num-precreate	Integer	No	2	The number of partitions to pre-create during each automatic partitioning check. For example, if the check time is 2024-11-11 and this value is 3, the partitions 20241111, 20241112, and 20241113 are pre-created. If a partition already exists, its creation is skipped. The default value is 2. For example, if 'table.auto-partition.time-unit' is 'DAY' (the default), one pre-created partition is for today and the other is for tomorrow. Pre-creation is not supported for tables with multiple partition keys; this parameter is ignored for them.
table.auto-partition.num-retention	Integer	No	7	The number of historical partitions to retain during each automatic partitioning check. For example, if the check time is 2024-11-11, the time unit is DAY, and this value is 3, the historical partitions 20241108, 20241109, and 20241110 are retained. The system deletes partitions older than 20241108. The default value is 7.
table.auto-partition.time-zone	String	No	System time zone	The time zone for automatic partitioning. Defaults to the system time zone.

Partition generation rules

The table.auto-partition.time-unit for an automatically partitioned table can be set to HOUR, DAY, MONTH, QUARTER, or YEAR. Automatic partitioning creates partitions in the following formats.

Time unit	Format	Example
HOUR	yyyyMMddHH	2024091922
DAY	yyyyMMdd	20240919
MONTH	yyyyMM	202409
QUARTER	yyyyQ	20241
YEAR	yyyy	2024

Fluss cluster configuration

The following table describes the configuration options related to automatic partitioning for a Fluss cluster.

Parameter	Type	Default	Description
auto-partition.check.interval	Duration	10 minutes	The interval at which Fluss checks tables with automatic partitioning enabled. During each check, Fluss creates or deletes partitions as needed based on the table's rules.

Dynamic partitioning

Dynamic partitioning is a client-side feature, enabled by default, that automatically creates partitions as data is written to a table. This is valuable when the full set of partitions is unknown in advance, eliminating the need for manual creation. It is also particularly useful when you work with multi-field partitions because automatic partitioning currently supports partition creation for single-field keys only.

Note

The number of dynamically created partitions is also limited by the max.partition.num and max.bucket.num parameters configured on the Fluss cluster.

The default value of max.partition.num is 1000. This represents the maximum number of partitions that can be created for each partitioned table.
The default value of max.bucket.num is 128000. This represents the maximum total number of buckets supported per table, which is calculated as (number of partitions) × (number of buckets per partition).

Client options

Parameter	Type	Required	Default	Description
client.writer.dynamic-create-partition.enabled	Boolean	No	true	Specifies whether to enable dynamic partition creation for the client writer. If enabled, the client automatically creates a new partition if it does not exist at write time.

Dynamic partitioning example

Step 1: Create a sample table

In the left navigation bar, choose Data Development > Data Query, create a new, blank data query script, and enter the following SQL to create a partitioned table with the dt column as the partition key.

CREATE TABLE IF NOT EXISTS `fluss-demo`.fluss.table_enable_dynamic_partition
(
    id    BIGINT
    ,name STRING
    ,dt   STRING
)
PARTITIONED BY (dt)
WITH (
    'client.writer.dynamic-create-partition.enabled' = 'true'
)
;

Verify the current partitions of the table:

SHOW PARTITIONS `fluss-demo`.fluss.table_enable_dynamic_partition;

The query result indicates that the table's partition list is empty.

Step 2: Dynamically insert data

Run the following SQL statement to insert data into a non-existent partition:

INSERT INTO `fluss-demo`.fluss.table_enable_dynamic_partition values(1, 'hello', '20250701');

Step 3: Verify partition creation

Run the SHOW PARTITIONS command again to verify that the partition has been dynamically created.

The query result shows dt=20250701 in the partition name column, confirming that the partition was dynamically created.