This topic describes the bucketing policies for Fluss tables.
Bucketing
Bucketing is a data distribution technique that divides table data into multiple data units and distributes them across multiple nodes. This process enables high-performance data storage and computation. When you create a Fluss table, you can set the `'bucket.num'` property to specify the number of buckets. For more information, see Manage tables.
Fluss supports three bucketing policies: hash bucketing, sticky bucketing, and round-robin bucketing.
Primary-key tables can only use hash bucketing.
Log tables use sticky bucketing by default, but can also use the other two bucketing policies.
Hash bucketing
Hash bucketing assigns each record to a bucket based on the hash of its bucketing key values, so records with the same key always land in the same bucket. This evenly distributes data across multiple nodes to leverage the benefits of distributed computing. It offers excellent scalability by supporting dynamic adjustments to the number of buckets or the cluster size. This policy is ideal for scenarios that involve processing large amounts of data.
To use this policy, set the `'bucket.key'` property for the table to specify the bucketing keys, for example `'bucket.key' = 'col1, col2'`. By default, primary-key tables use the primary keys, excluding partition keys, as the bucketing keys.
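The idea behind hash bucketing can be sketched as follows. This is a minimal illustration, not the Fluss implementation: the function name `assign_bucket`, the use of CRC32 as the hash function, and the way key values are serialized are all assumptions made for the example.

```python
import zlib


def assign_bucket(row, bucket_keys, num_buckets):
    """Map a row to a bucket index derived from its bucketing key values."""
    # Serialize the key values in a fixed order so that equal keys
    # always produce the same bytes.
    key_bytes = "\x00".join(str(row[k]) for k in bucket_keys).encode("utf-8")
    # CRC32 is only a stand-in hash for this sketch (assumption).
    return zlib.crc32(key_bytes) % num_buckets


rows = [
    {"col1": "a", "col2": 1, "value": 10},
    {"col1": "a", "col2": 1, "value": 20},  # same bucketing key as the first row
    {"col1": "b", "col2": 2, "value": 30},
]

# Bucket index for each row, with 'col1, col2' as bucketing keys and 4 buckets.
buckets = [assign_bucket(r, ["col1", "col2"], 4) for r in rows]
```

In this sketch the first two rows share the same bucketing key values and therefore map to the same bucket, regardless of their other columns.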
Sticky bucketing
Sticky bucketing reduces write latency by writing data to log tables in larger batches. Each time a data batch is sent, the policy automatically switches the target bucket. Over time, data records are gradually and evenly distributed across all buckets. Sticky bucketing is the default policy for log tables. Log tables use the underlying Apache Arrow data format, which is highly efficient at processing large data batches, and this policy maximizes this performance advantage.
To enable this policy, set the client property `'client.writer.bucket.no-key-assigner' = 'sticky'`. This policy is not supported for primary-key tables.
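The batching behavior described above can be sketched as follows. This is a simplified model rather than the Fluss client code: the names `StickyAssigner` and `write`, the fixed `batch_size`, and the choice of a random different bucket after each batch are assumptions for illustration.

```python
import random


class StickyAssigner:
    """Keeps returning the same bucket until a batch has been sent."""

    def __init__(self, num_buckets, seed=None):
        self.num_buckets = num_buckets
        self._rng = random.Random(seed)
        self.current = self._rng.randrange(num_buckets)

    def bucket(self):
        return self.current

    def on_batch_sent(self):
        # Switch to a different bucket once the current batch is sent.
        # Picking the next bucket at random is an assumption of this sketch.
        if self.num_buckets == 1:
            return
        candidates = [b for b in range(self.num_buckets) if b != self.current]
        self.current = self._rng.choice(candidates)


def write(records, batch_size, assigner):
    """Group records into batches and return a list of (bucket, batch) pairs."""
    sent, batch = [], []
    for rec in records:
        batch.append(rec)
        if len(batch) == batch_size:
            sent.append((assigner.bucket(), batch))
            assigner.on_batch_sent()
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        sent.append((assigner.bucket(), batch))
    return sent


sent = write(list(range(10)), batch_size=4, assigner=StickyAssigner(3, seed=0))
```

Here ten records are written as three batches of sizes 4, 4, and 2, and each batch goes to a different bucket than the one before it.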
Round-robin bucketing
Round-robin bucketing is a simple policy that selects a bucket for each record individually before the record is written, rotating through the buckets so that records are spread evenly. This policy is suitable for scenarios where data is distributed evenly and has no significant data skew.
To enable this policy, set the client property `'client.writer.bucket.no-key-assigner' = 'round_robin'`. This policy is not supported for primary-key tables.
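For comparison with sticky bucketing, per-record assignment in the conventional round-robin sense can be sketched as follows. The class name `RoundRobinAssigner` and the `start` parameter are hypothetical, and how Fluss chooses the first bucket or implements the rotation internally is not covered here.

```python
class RoundRobinAssigner:
    """Returns buckets in rotation, one per record."""

    def __init__(self, num_buckets, start=0):
        self.num_buckets = num_buckets
        # Starting at bucket 0 is an assumption of this sketch.
        self._next = start % num_buckets

    def bucket(self):
        b = self._next
        # Advance to the next bucket, wrapping around at the end.
        self._next = (self._next + 1) % self.num_buckets
        return b


assigner = RoundRobinAssigner(num_buckets=3)
assignments = [assigner.bucket() for _ in range(7)]
# assignments == [0, 1, 2, 0, 1, 2, 0]
```

Because the bucket changes on every record, no bucket accumulates a large batch, which is the trade-off against sticky bucketing.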