Instance type planning and recommendations - E-MapReduce (EMR) - Alibaba Cloud Help Center

This document provides planning and recommendations for StarRocks instance types. It describes the compute-storage integrated and compute-storage separated instance types.

Compute-storage integrated edition

In the compute-storage integrated edition, an instance contains only frontend (FE) and backend (BE) nodes. This section provides recommendations for the specifications of both node types.

Estimate the number of CUs for BE nodes

In the compute-storage integrated edition, backend (BE) nodes are responsible for data storage and computation tasks.

Estimation formula
```
Total CUs = Total rows to scan / CPU processing capability / Expected response time * QPS (Queries Per Second)
```
The parameters are described as follows:
- Total rows to scan: The expected number of rows that each SQL query scans. This is not the total number of rows in a single table, but only the number of rows that need to be scanned.
- CPU processing capability: The number of rows that can be processed per second. This value changes dynamically based on the complexity of the SQL query and typically ranges from 10 million to 100 million rows per second. The more complex the SQL query is, the fewer rows are processed per second.
- Expected response time: The expected running time of an SQL query. For example, the query should return a result within 1 second.
- Queries Per Second (QPS): The number of concurrent SQL queries submitted per second. For example, 30 queries per second.
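
The formula above can be sketched as a small helper. This is a minimal illustration, not an official sizing tool: the function name `estimate_total_cus` and its parameter names are hypothetical, and the inputs are illustrative values chosen within the ranges described above.

```python
import math


def estimate_total_cus(rows_to_scan, cpu_rows_per_second, response_time_s, qps):
    """Total CUs = rows to scan / CPU processing capability / response time * QPS."""
    if min(rows_to_scan, cpu_rows_per_second, response_time_s, qps) <= 0:
        raise ValueError("all inputs must be positive")
    return rows_to_scan / cpu_rows_per_second / response_time_s * qps


# Illustrative inputs: 50 million rows scanned per query, 20 million rows/s
# (a complex query), a 2-second response target, and 50 queries per second.
raw_estimate = estimate_total_cus(50_000_000, 20_000_000, 2, 50)
print(raw_estimate)             # 62.5
print(math.ceil(raw_estimate))  # 63
```

Rounding up is a conservative choice here. The result is only a starting point for a stress test, because the CPU processing capability is itself an estimate.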

Sample data

Important

The formula provides an estimate that may not be completely accurate because performance varies with SQL complexity. In a production environment, you must evaluate the final required resources based on stress test results for your specific business.

Total rows to scan	SQL complexity	Estimated CPU processing capability (rows/s)	Expected response time (s)	QPS (queries/s)	Estimated total CUs	Estimated BE specifications
50 million rows	High	20 million rows	2	50	63	16 CUs × 4 nodes
50 million rows	Medium	50 million rows	1.5	100	67	16 CUs × 5 nodes
50 million rows	Low	100 million rows	1	200	100	32 CUs × 3 nodes
1 billion rows	High	20 million rows	5	20	200	32 CUs × 7 nodes
1 billion rows	Medium	50 million rows	3	50	333	64 CUs × 6 nodes
1 billion rows	Low	100 million rows	1	80	800	64 CUs × 13 nodes
30 billion rows	High	20 million rows	30	10	500	64 CUs × 8 nodes
30 billion rows	Medium	50 million rows	15	20	800	64 CUs × 13 nodes
30 billion rows	Low	100 million rows	15	20	400	64 CUs × 6 nodes
500 billion rows	High	20 million rows	60	5	2083	64 CUs × 33 nodes
500 billion rows	Medium	50 million rows	45	10	2222	64 CUs × 35 nodes
500 billion rows	Low	100 million rows	45	10	1111	64 CUs × 18 nodes

Estimate BE node storage

The total storage space required for a StarRocks instance depends on the raw data size, the number of replicas, and the compression ratio of the compression algorithm used.

Estimation formula
```
Total storage space required = Raw data size * Number of replicas / Compression ratio
```
The parameters are described as follows:
- Raw data size: Size of a single row × Total number of rows.
- Number of replicas: In a compute-storage integrated architecture, this is typically set to 3.
- Compression ratio: StarRocks supports four compression algorithms: zlib, Zstandard (or zstd), LZ4, and Snappy, listed in descending order of compression ratio. These algorithms provide compression ratios from 3:1 to 5:1.

Sample data

Size of a single row (KB)	Number of row records	Number of replicas	Compression ratio	Estimated data size (GB)
50	100,000,000	3	3	4,768.37

Note

The values in the table are only recommendations. In a production environment, you must evaluate the final required resources based on stress test results for your specific business.
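
As a rough sketch, the storage formula can be written as follows. The function name `estimate_be_storage_gb` is hypothetical, and the conversion of 1 GB = 1024 × 1024 KB is an assumption inferred from the sample value of 4,768.37 GB.

```python
KB_PER_GB = 1024 ** 2  # assumption: binary units, as implied by the sample value


def estimate_be_storage_gb(row_size_kb, total_rows, replicas=3, compression_ratio=3.0):
    """Total storage = raw data size * number of replicas / compression ratio."""
    if row_size_kb <= 0 or total_rows <= 0 or replicas < 1 or compression_ratio <= 0:
        raise ValueError("inputs must be positive and replicas must be at least 1")
    raw_data_gb = row_size_kb * total_rows / KB_PER_GB
    return raw_data_gb * replicas / compression_ratio


# Sample data: 50 KB per row, 100,000,000 rows, 3 replicas, compression ratio 3.
print(f"{estimate_be_storage_gb(50, 100_000_000, 3, 3):.2f}")  # 4768.37
```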

BE node disk planning

The estimation formula is as follows.

Total disk size per BE node = Total storage space / Disk utilization / Number of BE nodes

The parameters are described as follows:

- Total storage space: The total storage space calculated for the BE nodes.
- Disk utilization: A utilization rate of 80% is recommended to reserve the remaining 20% of the space for computation.
- Number of BE nodes: The number of BE nodes determined from the CU estimation.

For example, if the total storage space is 4768 GB, disk utilization is 80%, and there are 11 BE nodes, the calculation is 4768 GB / 80% / 11 ≈ 541.8 GB. Therefore, each BE node needs a total disk size of about 542 GB.
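
The per-node calculation can be sketched in the same way. The helper name `disk_gb_per_be_node` is hypothetical, and the 80% default simply mirrors the recommendation above.

```python
import math


def disk_gb_per_be_node(total_storage_gb, be_nodes, disk_utilization=0.8):
    """Total disk size per BE node = total storage / disk utilization / BE nodes."""
    if total_storage_gb <= 0 or be_nodes < 1 or not 0 < disk_utilization <= 1:
        raise ValueError("invalid input")
    return total_storage_gb / disk_utilization / be_nodes


# The example above: 4768 GB of data, 80% utilization, 11 BE nodes.
size_gb = disk_gb_per_be_node(4768, 11)
print(f"{size_gb:.2f}")    # 541.82
print(math.ceil(size_gb))  # 542
```

Rounding up to a whole number of gigabytes keeps the provisioned disk from falling below the estimate.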

Disk quantity selection

The number of disks to select depends on the performance of enterprise SSDs (ESSDs) and the total disk capacity of a single node. To optimize single-disk performance, you can split ESSD PL1 disks as shown in the following table.

Total disk size per BE node	Disk type	Recommended number of disks
<= 500 GB	ESSD PL1	1
500 GB to 1 TB	ESSD PL1	1 to 2
1 TB to 1.5 TB	ESSD PL1	2 to 3
1.5 TB to 2 TB	ESSD PL1	3 to 4
2 TB to 2.5 TB	ESSD PL1	4 to 5
2.5 TB to 3 TB	ESSD PL1	5 to 6
3 TB to 3.5 TB	ESSD PL1	6 to 7
3.5 TB to 4 TB	ESSD PL1	7 to 8
> 4 TB	ESSD PL1	8
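
The table can be read as a simple range lookup, sketched below. Two details are assumptions rather than statements from the table: 1 TB is taken as 1024 GB, and a size that falls exactly on a boundary is assigned to the lower band. The function name is hypothetical.

```python
import bisect

# Upper bound of each size band in GB (assumption: 1 TB = 1024 GB).
_BAND_UPPER_BOUNDS_GB = [500, 1024, 1536, 2048, 2560, 3072, 3584, 4096]
# (minimum, maximum) recommended number of ESSD PL1 disks for each band;
# the last entry covers every size above 4 TB.
_DISK_COUNT_RANGES = [(1, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 8)]


def recommended_pl1_disk_count(disk_gb_per_node):
    """Return the (min, max) recommended number of ESSD PL1 disks for one BE node."""
    if disk_gb_per_node <= 0:
        raise ValueError("disk size must be positive")
    band = bisect.bisect_left(_BAND_UPPER_BOUNDS_GB, disk_gb_per_node)
    return _DISK_COUNT_RANGES[band]


print(recommended_pl1_disk_count(500))   # (1, 1)
print(recommended_pl1_disk_count(542))   # (1, 2)
print(recommended_pl1_disk_count(5000))  # (8, 8)
```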

The capacity at which each ESSD performance level reaches its maximum IOPS is as follows:

ESSD PL0: Reaches the maximum disk IOPS at 320 GB.
ESSD PL1: Reaches the maximum disk IOPS at 460 GB.
ESSD PL2: Reaches the maximum disk IOPS at 1260 GB.
ESSD PL3: Reaches the maximum disk IOPS at 7760 GB.

To optimize performance, refer to the disk splitting recommendations for ESSD PL1 and adjust the number of disks for other ESSD types accordingly.

Estimate FE node specifications

Frontend (FE) nodes are mainly responsible for metadata management, client connection management, query planning, and query scheduling.

You can roughly estimate the FE specifications based on the total number of CUs for BE nodes. The following table provides specific recommendations. The data disk for an FE node typically requires only 100 GB. If the storage space becomes insufficient, you can expand it separately.

Total BE CUs	Scenario type	Recommended FE specifications
< 120 CUs	Normal scenario	8 CUs × 3
120 CUs to 1000 CUs	Normal scenario	16 CUs × 3
1000 CUs to 3000 CUs	Normal scenario	32 CUs × 3
>= 3000 CUs	Normal scenario	64 CUs × 3

Note

The values in the table are only recommendations. In a production environment, you must evaluate the final required resources based on stress test results for your specific business.
For high-concurrency point query scenarios, consider increasing the number of FE nodes. For example, you can increase the number from three to five.
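
The FE recommendations above can also be expressed as a threshold lookup. This is only a sketch: the function name is hypothetical, and because the table does not say which band a boundary value such as 1000 CUs belongs to, the code assumes each band includes its lower bound, following the `>= 3000 CUs` row.

```python
import bisect

# Lower bounds (total BE CUs) at which the recommended FE size steps up.
_BE_CU_THRESHOLDS = [120, 1000, 3000]
# Recommended CUs per FE node for each band, smallest band first.
_FE_CUS_PER_NODE = [8, 16, 32, 64]


def recommended_fe_spec(total_be_cus, fe_nodes=3):
    """Return (CUs per FE node, number of FE nodes) for a normal scenario."""
    if total_be_cus <= 0:
        raise ValueError("total BE CUs must be positive")
    band = bisect.bisect_right(_BE_CU_THRESHOLDS, total_be_cus)
    return _FE_CUS_PER_NODE[band], fe_nodes


print(recommended_fe_spec(63))               # (8, 3)
print(recommended_fe_spec(800))              # (16, 3)
print(recommended_fe_spec(3000))             # (64, 3)
print(recommended_fe_spec(800, fe_nodes=5))  # (16, 5), for example with high-concurrency point queries
```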

Compute-storage separated edition

In the compute-storage separated edition, an instance contains only FE and compute nodes (CNs).

Estimate the number of CUs for CNs

For more information, see Estimate the number of CUs for BE nodes.

Estimate CN storage

The storage for CNs is mainly used for cached data.

Estimation formula
```
Total storage space required = Raw data size / Compression ratio * Hot data percentage
```
The parameters are as follows:
- Raw data size: Size of a single row × Total number of rows.
- Compression ratio: StarRocks supports four compression algorithms: zlib, Zstandard (or zstd), LZ4, and Snappy, listed in descending order of compression ratio. These algorithms provide compression ratios from 3:1 to 5:1.
- Hot data percentage: Estimate the percentage of frequently queried data (hot data) based on your business needs. For example, you can set this value to 50%. If you are unsure about the specific percentage but want to ensure sufficient query performance for the compute-storage separated instance, set this value to 100%. This is equivalent to the size of one replica. Because the primary key index also uses cache disk space, a 20% buffer is recommended. Therefore, the recommended setting is 120%.

Sample data

Size of a single row (KB)	Number of row records	Compression ratio	Hot data percentage	Estimated data size (GB)
50	100,000,000	3	120%	1,907.35

Note

The values in the table are only recommendations. In a production environment, you must evaluate the final required resources based on stress test results for your specific business.
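
A minimal sketch of the CN cache estimate follows. The function name `estimate_cn_cache_gb` is hypothetical, the 1 GB = 1024 × 1024 KB conversion is an assumption inferred from the sample value, and the 1.2 default reflects the 120% setting recommended above.

```python
KB_PER_GB = 1024 ** 2  # assumption: binary units, as implied by the sample value


def estimate_cn_cache_gb(row_size_kb, total_rows, compression_ratio=3.0, hot_data_ratio=1.2):
    """Total storage = raw data size / compression ratio * hot data percentage."""
    if row_size_kb <= 0 or total_rows <= 0 or compression_ratio <= 0 or hot_data_ratio <= 0:
        raise ValueError("all inputs must be positive")
    raw_data_gb = row_size_kb * total_rows / KB_PER_GB
    return raw_data_gb / compression_ratio * hot_data_ratio


# Sample data: 50 KB per row, 100,000,000 rows, compression ratio 3, 120% hot data.
print(f"{estimate_cn_cache_gb(50, 100_000_000, 3, 1.2):.2f}")  # 1907.35
```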

For information about the disk size and quantity for a single CN, see BE node disk planning.

Estimate FE node specifications

For more information, see Estimate FE node specifications.
