Cluster capacity planning

This document describes how to estimate and plan the resource capacity for a Fluss stream storage cluster based on your business requirements.

Resource assessment

Select a CU specification

Fluss on the public cloud uses standardized compute units (CUs) with a fixed vCPU-to-memory ratio: each CU provides 1 vCPU and 4 GB of memory. Choose a specification based on your business scale and the size of the associated Flink resources.

| CU specification | Configuration | Scenarios |
| --- | --- | --- |
| 4 CU | 4 vCPU / 16 GB | Getting Started/Development: Small-scale testing or development environments. Low-Throughput Production: Production workloads with low data traffic. Paired with Flink: Resource size < 200 CU. |
| 8 CU | 8 vCPU / 32 GB | General Purpose (Recommended): A standard specification that balances performance and flexibility. Medium-Scale Production: Capable of handling mainstream business traffic. Paired with Flink: Resource size from 200 CU to 1,000 CU. |
| 16 CU | 16 vCPU / 64 GB | High Performance/Large Storage: For use cases requiring extremely high throughput or a larger per-node storage limit. Massive-Scale Production: For mission-critical business pipelines. Paired with Flink: Resource size > 1,000 CU. |
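The Flink pairing guidance above can be expressed as a small lookup. The following is a minimal sketch, not an official tool: the function name `recommend_cu_spec` is hypothetical, and it considers only the Flink resource size, not throughput or per-node storage needs.

```python
def recommend_cu_spec(flink_cu: float) -> int:
    """Return the CU specification (4, 8, or 16) suggested by the Flink pairing guidance.

    Hypothetical helper: it only encodes the thresholds
    < 200 CU, 200 to 1,000 CU, and > 1,000 CU.
    """
    if flink_cu < 0:
        raise ValueError("flink_cu must be non-negative")
    if flink_cu < 200:
        return 4       # getting started, development, low-throughput production
    if flink_cu <= 1000:
        return 8       # general purpose, medium-scale production
    return 16          # high performance, massive-scale production


if __name__ == "__main__":
    for size in (50, 200, 1000, 2500):
        print(f"Flink {size} CU -> {recommend_cu_spec(size)} CU specification")
```

Treat the result as a starting point; a workload with unusually high throughput or large per-node storage may still call for a larger specification.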

Calculate Tablet Server count

Cluster size depends on your total throughput requirements. Use the following formula to estimate the total number of CUs required, then divide that total by the CU specification you selected to get the number of Tablet Server nodes:

Required CUs = (Write throughput / Write benchmark per CU + Read throughput / Read benchmark per CU) × (1 + Redundancy)

Key metrics

  • Throughput unit: Use either Rows/s (rows per second) or MB/s (megabytes per second), and use the same unit for both the throughput and the per-CU benchmark.

  • Performance benchmark per CU:

    • Write capacity: Approximately 50,000 Rows/s (~46 MB/s).

      Affected by data complexity and primary key update logic.
    • Read capacity: Approximately 50,000 Rows/s (~46 MB/s).

      Affected by column pruning and query filter conditions.
  • Redundancy buffer (recommended): Reserve a 20% to 30% resource buffer to keep the cluster stable during traffic peaks. In the formula, this corresponds to a Redundancy value of 0.2 to 0.3.

Calculation example:
If the formula yields 64 CU and you choose the 8 CU specification, you need 64 / 8 = 8 Tablet Server nodes. If the division does not produce a whole number, round up.
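The sizing formula and the node-count step can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the function and constant names are hypothetical, the per-CU benchmarks are the approximate 50,000 Rows/s figures listed under Key metrics, and rounding up to whole nodes is an assumption about how a fractional result should be handled.

```python
import math

# Approximate per-CU benchmarks from "Key metrics"; real values vary with workload.
WRITE_ROWS_PER_CU = 50_000
READ_ROWS_PER_CU = 50_000


def required_cus(write_rows_per_s: float, read_rows_per_s: float,
                 redundancy: float = 0.3) -> float:
    """Required CUs = (write / write benchmark + read / read benchmark) * (1 + redundancy)."""
    base = write_rows_per_s / WRITE_ROWS_PER_CU + read_rows_per_s / READ_ROWS_PER_CU
    return base * (1 + redundancy)


def tablet_server_count(total_cus: float, cu_per_node: int) -> int:
    """Number of nodes of the chosen specification, rounded up to a whole node."""
    # round() guards against floating-point noise such as 3.0000000000000004.
    return math.ceil(round(total_cus / cu_per_node, 9))


if __name__ == "__main__":
    cus = required_cus(500_000, 700_000, redundancy=0.3)
    print(f"Required CUs: {cus:.1f}")
    print(f"8 CU nodes needed: {tablet_server_count(cus, 8)}")
```

Throughput values must be in the same unit as the benchmarks (Rows/s here); to work in MB/s, replace both constants with the MB/s figures.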

Local storage planning

Local storage usage is closely related to the table type (log table or primary key table).

  • Configuration recommendation: We recommend using the default configuration initially.

  • Scaling rules: You can independently scale up disk capacity as your business grows, or scale out storage capacity by adding nodes.

  • Important limitation: Disks can be scaled up, but not down. Plan carefully to avoid over-provisioning.

Sample configurations

| Scenario | Write throughput | Read throughput | Table type | Columns | Total CUs | Node configuration |
| --- | --- | --- | --- | --- | --- | --- |
| Low-Throughput Stream Processing | 250,000 Rows/s | 250,000 Rows/s | log table | 20 | 12 CU | 3 × 4 CU |
| Medium-Throughput Real-Time Analytics | 500,000 Rows/s | 700,000 Rows/s | primary key table | 50 | 32 CU | 4 × 8 CU |
| High-Throughput Real-Time Data Warehouse | 2,200,000 Rows/s | 2,500,000 Rows/s | primary key table | 100 | 128 CU | 8 × 16 CU |
| Massive-Scale Stream Processing | 5,000,000 Rows/s | 5,000,000 Rows/s | log table | 30 | 256 CU | 16 × 16 CU |
| Dimension Table Query Service | 200,000 Rows/s | 300,000 Rows/s | primary key table | 30 | 12 CU | 3 × 4 CU |
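As a rough cross-check, the headroom implied by each sample configuration can be derived from its throughput figures. The sketch below assumes the approximate benchmark of 50,000 Rows/s per CU for both reads and writes; the name `implied_headroom` is hypothetical, and the calculation ignores table type and column count, which also affect real capacity.

```python
ROWS_PER_CU = 50_000  # assumed benchmark, applied to both reads and writes

# (scenario, write Rows/s, read Rows/s, total CUs) taken from the sample configurations
SAMPLES = [
    ("Low-Throughput Stream Processing", 250_000, 250_000, 12),
    ("Medium-Throughput Real-Time Analytics", 500_000, 700_000, 32),
    ("High-Throughput Real-Time Data Warehouse", 2_200_000, 2_500_000, 128),
    ("Massive-Scale Stream Processing", 5_000_000, 5_000_000, 256),
    ("Dimension Table Query Service", 200_000, 300_000, 12),
]


def implied_headroom(write_rows: float, read_rows: float, total_cus: float) -> float:
    """Fraction of capacity left above the raw (zero-redundancy) CU demand."""
    base_cus = (write_rows + read_rows) / ROWS_PER_CU
    return total_cus / base_cus - 1


if __name__ == "__main__":
    for name, write, read, total in SAMPLES:
        print(f"{name}: {implied_headroom(write, read, total):.0%} headroom")
```

Under these assumptions, the second and third scenarios come out slightly above 30%, which is consistent with the CU total being rounded up to a whole number of nodes.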

FAQ

Q: What is the difference between a cluster of ten 8 CU nodes and a cluster of five 16 CU nodes? Both configurations total 80 CU.

A: While the total compute performance is nearly identical, the main differences are storage limits and operational flexibility:

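  • Storage limits: Each node has a maximum local disk capacity, for example, 2 TB. For the same total CU count, a cluster with fewer, larger nodes (16 CU) therefore has a lower maximum total disk capacity than one with more, smaller nodes (8 CU). With a 2 TB per-node limit, five nodes can hold up to 10 TB, while ten nodes can hold up to 20 TB. If your business generates a large volume of data, more nodes typically mean more total storage space.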

  • Scaling granularity: The 8 CU specification offers finer granularity. You can adjust resources in smaller increments, which provides more flexibility and improves cost control.
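The trade-off in this answer can be made concrete with a short comparison. This is only a sketch: `cluster_layout` is a hypothetical helper, and the 2 TB per-node disk limit is the illustrative figure used above, not a confirmed product limit.

```python
def cluster_layout(total_cus: int, cu_per_node: int,
                   max_disk_tb_per_node: float = 2.0) -> dict:
    """Summarize node count, maximum local storage, and scaling step for a layout.

    max_disk_tb_per_node defaults to the 2 TB example value; check the actual limit.
    """
    if total_cus % cu_per_node != 0:
        raise ValueError("total_cus must be a multiple of cu_per_node")
    nodes = total_cus // cu_per_node
    return {
        "nodes": nodes,
        "max_local_storage_tb": nodes * max_disk_tb_per_node,
        "scaling_step_cu": cu_per_node,  # smallest increment when adding a node
    }


if __name__ == "__main__":
    for spec in (8, 16):
        print(f"{spec} CU nodes:", cluster_layout(80, spec))
```

Both layouts provide 80 CU of compute; they differ in the storage ceiling and in how small a scaling step is.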