Cluster capacity planning - Realtime Compute for Apache Flink - Alibaba Cloud Help Center

Resource assessment

Select a CU specification

Fluss on the public cloud uses standardized compute units (CUs) with a fixed vCPU-to-memory ratio: 1 CU provides 1 vCPU and 4 GB of memory. Choose a node specification based on your business scale and the resource size of the Flink deployment that the cluster is paired with.

CU specification	Configuration	Scenarios
4 CU	4 vCPU / 16 GB	Getting Started/Development: Small-scale testing or development environments. Low-Throughput Production: Production workloads with low data traffic. Paired with Flink: Resource size < 200 CU.
8 CU	8 vCPU / 32 GB	General Purpose (Recommended): A standard specification that balances performance and flexibility. Medium-Scale Production: Capable of handling mainstream business traffic. Paired with Flink: Resource size from 200 CU to 1,000 CU.
16 CU	16 vCPU / 64 GB	High Performance/Large Storage: For use cases requiring extremely high throughput or a larger per-node storage limit. Massive-Scale Production: For mission-critical business pipelines. Paired with Flink: Resource size > 1,000 CU.
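
The Flink pairing guidance in the table can be sketched as a small lookup. This is a minimal illustration only: the function name `recommend_cu_spec` is hypothetical, and the sketch assumes both ends of the "200 CU to 1,000 CU" range map to the 8 CU specification. It ignores the other factors the table mentions, such as per-node storage limits.

```python
def recommend_cu_spec(flink_cus: float) -> int:
    """Suggest a node specification (in CUs) from the paired Flink resource size.

    Thresholds follow the specification table; treating exactly 200 CU and
    exactly 1,000 CU as part of the 8 CU tier is an assumption.
    """
    if flink_cus < 0:
        raise ValueError("flink_cus must be non-negative")
    if flink_cus < 200:
        return 4
    if flink_cus <= 1000:
        return 8
    return 16


for size in (50, 200, 1000, 1500):
    print(f"Flink {size} CU -> {recommend_cu_spec(size)} CU nodes")
```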

Calculate Tablet Server count

Cluster size depends on your total throughput requirements. Use the following formula to estimate the total CUs required, then divide the result by the CU specification of a single node to get the number of nodes:

Required CUs = (Write throughput ÷ Write benchmark per CU + Read throughput ÷ Read benchmark per CU) × (1 + Redundancy)

Key metrics

Throughput unit: Use one unit consistently throughout the calculation, either Rows/s (rows per second) or MB/s (megabytes per second).
Performance benchmark per CU:
- Write capacity: Approximately 50,000 Rows/s (~46 MB/s). The actual value depends on data complexity and primary key update logic.
- Read capacity: Approximately 50,000 Rows/s (~46 MB/s). The actual value depends on column pruning and query filter conditions.
Redundancy buffer: Reserve a 20% to 30% resource buffer (Redundancy = 0.2 to 0.3 in the formula) to keep the cluster stable during traffic peaks.
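
As a rough sketch, the formula and the benchmark values above translate into a few lines of Python. The function and constant names are hypothetical, the throughput unit is Rows/s, and the per-CU benchmarks are the approximate figures quoted above, so treat the result as an estimate rather than a guarantee.

```python
# Approximate per-CU benchmarks quoted in this section (Rows/s).
WRITE_ROWS_PER_CU = 50_000
READ_ROWS_PER_CU = 50_000


def required_cus(write_rows_per_s: float,
                 read_rows_per_s: float,
                 redundancy: float = 0.2) -> float:
    """Estimate total CUs: (write / benchmark + read / benchmark) * (1 + redundancy)."""
    if write_rows_per_s < 0 or read_rows_per_s < 0 or redundancy < 0:
        raise ValueError("inputs must be non-negative")
    base = (write_rows_per_s / WRITE_ROWS_PER_CU
            + read_rows_per_s / READ_ROWS_PER_CU)
    return base * (1 + redundancy)


# Illustrative workload: 250,000 Rows/s written and read, 20% buffer.
estimate = required_cus(250_000, 250_000, redundancy=0.2)
print(f"Estimated requirement: {estimate:.1f} CU")
```

With the inputs shown, the base load is 10 CU and the 20% buffer brings the estimate to about 12 CU.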

Calculation example:
If your calculation yields 64 CU and you choose the 8 CU specification, you need 64 ÷ 8 = 8 Tablet Server nodes. If the result is not a whole number, round up.
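
The node count step can be sketched as a ceiling division. The helper name `tablet_server_count` is hypothetical, and rounding up to whole nodes is an assumption that follows from nodes being indivisible; the source does not state a minimum node count, so none is enforced here.

```python
import math

VALID_SPECS = (4, 8, 16)  # CU specifications listed in this document


def tablet_server_count(required_cus: float, cus_per_node: int) -> int:
    """Return the number of nodes needed to cover required_cus, rounding up."""
    if cus_per_node not in VALID_SPECS:
        raise ValueError(f"cus_per_node must be one of {VALID_SPECS}")
    if required_cus < 0:
        raise ValueError("required_cus must be non-negative")
    # Round first so floating-point noise (for example 8.000000000001)
    # does not push the result up by an extra node.
    return math.ceil(round(required_cus / cus_per_node, 9))


print(tablet_server_count(64, 8))    # the example from the text
print(tablet_server_count(31.2, 8))  # a fractional requirement rounds up
```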

Local storage planning

Local storage usage is closely related to the table type (log table or primary key table).

Configuration recommendation: We recommend using the default configuration initially.
Scaling rules: You can independently scale up disk capacity as your business grows, or scale out storage capacity by adding nodes.
Important limitation: Disks can be scaled up, but not down. Plan carefully to avoid over-provisioning.

Sample configurations

Scenario	Write throughput	Read throughput	Table type	Columns	Total CUs	Node configuration
Low-Throughput Stream Processing	250,000 Rows/s	250,000 Rows/s	log table	20	12 CU	3 × 4 CU
Medium-Throughput Real-Time Analytics	500,000 Rows/s	700,000 Rows/s	primary key table	50	32 CU	4 × 8 CU
High-Throughput Real-Time Data Warehouse	2,200,000 Rows/s	2,500,000 Rows/s	primary key table	100	128 CU	8 × 16 CU
Massive-Scale Stream Processing	5,000,000 Rows/s	5,000,000 Rows/s	log table	30	256 CU	16 × 16 CU
Dimension Table Query Service	200,000 Rows/s	300,000 Rows/s	primary key table	30	12 CU	3 × 4 CU
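
One way to read the sample table is to work backwards from each row to the buffer it implies. The sketch below assumes every row was sized with the approximate 50,000 Rows/s per-CU benchmark for both reads and writes; it ignores table type and column count, which the document says also affect throughput. The names `SAMPLES` and `implied_buffer` are hypothetical.

```python
ROWS_PER_CU = 50_000  # approximate benchmark from this document, reads and writes

# (scenario, write Rows/s, read Rows/s, provisioned CUs) copied from the table.
SAMPLES = [
    ("Low-Throughput Stream Processing", 250_000, 250_000, 12),
    ("Medium-Throughput Real-Time Analytics", 500_000, 700_000, 32),
    ("High-Throughput Real-Time Data Warehouse", 2_200_000, 2_500_000, 128),
    ("Massive-Scale Stream Processing", 5_000_000, 5_000_000, 256),
    ("Dimension table query service", 200_000, 300_000, 12),
]


def implied_buffer(write: int, read: int, total_cus: int) -> float:
    """Return the fraction of headroom over the unbuffered CU requirement."""
    base_cus = (write + read) / ROWS_PER_CU
    return total_cus / base_cus - 1


for name, write, read, total in SAMPLES:
    base = (write + read) / ROWS_PER_CU
    buffer = implied_buffer(write, read, total)
    print(f"{name}: base {base:.0f} CU, provisioned {total} CU, buffer {buffer:.0%}")
```

Under these assumptions, some rows land above the 20% to 30% range. That may reflect rounding up to whole nodes or extra headroom for primary key tables; the source does not say which.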

FAQ

Q: What is the difference between a cluster with ten 8 CU nodes and a cluster with five 16 CU nodes? Both configurations total 80 CU.

A: While the total compute performance is nearly identical, the main differences are storage limits and operational flexibility:

Storage limits: Each node has a maximum local disk capacity, for example, 2 TB. Because the limit applies per node, a cluster with fewer, larger nodes (16 CU) has a lower total disk ceiling than one with more, smaller nodes (8 CU). If your business generates a large volume of data, more nodes typically mean more total storage space.
Scaling granularity: The 8 CU specification offers finer granularity. You can adjust resources in smaller increments, which provides more flexibility and improves cost control.
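
The storage point can be illustrated with simple arithmetic. The sketch below assumes the same per-node disk limit for both specifications and uses the 2 TB figure purely because the answer above cites it as an example; actual limits may differ by specification. The function name is hypothetical.

```python
def max_cluster_storage_tb(node_count: int, per_node_limit_tb: float) -> float:
    """Return the total local disk ceiling for a cluster, in TB."""
    if node_count < 0 or per_node_limit_tb < 0:
        raise ValueError("inputs must be non-negative")
    return node_count * per_node_limit_tb


EXAMPLE_LIMIT_TB = 2.0  # illustrative value only, taken from the FAQ example

ten_small = max_cluster_storage_tb(10, EXAMPLE_LIMIT_TB)  # ten 8 CU nodes
five_large = max_cluster_storage_tb(5, EXAMPLE_LIMIT_TB)  # five 16 CU nodes

print(f"10 x 8 CU: up to {ten_small:.0f} TB of local disk")
print(f"5 x 16 CU: up to {five_large:.0f} TB of local disk")
```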