This document provides sizing recommendations for the two StarRocks instance types: compute-storage integrated and compute-storage separated.
Compute-storage integrated edition
In the compute-storage integrated edition, an instance contains only frontend (FE) and backend (BE) nodes. This section provides recommendations for the specifications of both node types.
Estimate the number of CUs for BE nodes
In the compute-storage integrated edition, backend (BE) nodes are responsible for data storage and computation tasks.
Estimation formula
Total CUs = Total rows to scan / CPU processing capability / Expected response time * QPS (Queries Per Second)
The parameters are described as follows:
Total rows to scan: The expected number of rows that each SQL query scans. This is not the total number of rows in a single table, but only the number of rows that need to be scanned.
CPU processing capability: The number of rows processed per second. This value changes dynamically based on the complexity of the SQL query and typically ranges from 10 million to 100 million rows per second. The more complex the SQL query is, the fewer rows are processed per second.
Expected response time: The expected running time of an SQL query. For example, the query should return a result within 1 second.
Queries Per Second (QPS): The number of concurrent SQL queries submitted per second. For example, 30 queries per second.
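The formula can be sketched in Python as follows. This is a minimal illustration, not an official sizing tool: the function name and parameter names are hypothetical, and rounding the result up to a whole CU is an assumption.

```python
import math


def estimate_total_cus(rows_to_scan: float,
                       cpu_rows_per_second: float,
                       response_time_s: float,
                       qps: float) -> int:
    """Total CUs = rows to scan / CPU processing capability / response time * QPS."""
    if min(rows_to_scan, cpu_rows_per_second, response_time_s, qps) <= 0:
        raise ValueError("all inputs must be positive")
    raw = rows_to_scan / cpu_rows_per_second / response_time_s * qps
    # Assumption: round up, because a fractional CU cannot be provisioned.
    return math.ceil(raw)


# 50 million rows per query, complex SQL (20 million rows/s),
# 2 s expected response time, 50 QPS.
print(estimate_total_cus(50e6, 20e6, 2, 50))  # 62.5 rounded up to 63
```

Treat the result as a starting point for stress testing, because the CPU processing capability is itself only a rough range.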
Sample data
Important: The formula provides an estimate that may not be completely accurate because performance varies with SQL complexity. In a production environment, you must evaluate the final required resources based on stress test results for your specific business.
Total rows to scan | SQL complexity | Estimated CPU processing capability (rows/s) | Expected response time (s) | QPS (queries/s) | Estimated total CUs | Estimated BE specifications |
50 million rows | High | 20 million rows | 2 | 50 | 63 | 16 CUs × 4 nodes |
50 million rows | Medium | 50 million rows | 1.5 | 100 | 67 | 16 CUs × 5 nodes |
50 million rows | Low | 100 million rows | 1 | 200 | 100 | 32 CUs × 3 nodes |
1 billion rows | High | 20 million rows | 5 | 20 | 200 | 32 CUs × 7 nodes |
1 billion rows | Medium | 50 million rows | 3 | 50 | 333 | 64 CUs × 6 nodes |
1 billion rows | Low | 100 million rows | 1 | 80 | 800 | 64 CUs × 13 nodes |
30 billion rows | High | 20 million rows | 30 | 10 | 500 | 64 CUs × 8 nodes |
30 billion rows | Medium | 50 million rows | 15 | 20 | 800 | 64 CUs × 13 nodes |
30 billion rows | Low | 100 million rows | 15 | 20 | 400 | 64 CUs × 6 nodes |
300 billion rows | High | 20 million rows | 60 | 5 | 2083 | 64 CUs × 33 nodes |
300 billion rows | Medium | 50 million rows | 45 | 10 | 2222 | 64 CUs × 35 nodes |
300 billion rows | Low | 100 million rows | 45 | 10 | 1111 | 64 CUs × 18 nodes |
Estimate BE node storage
The total storage space required for a StarRocks instance depends on the raw data size, the number of replicas, and the compression ratio of the compression algorithm used.
Estimation formula
Total storage space required = Raw data size * Number of replicas / Compression ratio
The parameters are described as follows:
Raw data size: Size of a single row × Total number of rows.
Number of replicas: In a compute-storage integrated architecture, this is typically set to 3.
Compression ratio: StarRocks supports four compression algorithms: zlib, Zstandard (or zstd), LZ4, and Snappy, listed in descending order of compression ratio. These algorithms provide compression ratios from 3:1 to 5:1.
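As a rough sketch, the storage formula can be expressed in Python. The function name, the default values, and the conversion of 1 GB = 1024 × 1024 KB are assumptions made for illustration.

```python
KB_PER_GB = 1024 * 1024  # assumption: binary units (1 GB = 1024 * 1024 KB)


def estimate_storage_gb(row_size_kb: float,
                        row_count: int,
                        replicas: int = 3,
                        compression_ratio: float = 3.0) -> float:
    """Total storage = raw data size * number of replicas / compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    raw_kb = row_size_kb * row_count  # raw data size = row size * row count
    return raw_kb * replicas / compression_ratio / KB_PER_GB


# 50 KB rows, 100 million rows, 3 replicas, 3:1 compression
print(round(estimate_storage_gb(50, 100_000_000, 3, 3), 2))  # 4768.37
```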
Sample data
Size of a single row (KB) | Number of rows | Number of replicas | Compression ratio | Estimated data size (GB) |
50 | 100,000,000 | 3 | 3 | 4,768.37 |
Note: The values in the table are only recommendations. In a production environment, you must evaluate the final required resources based on stress test results for your specific business.
BE node disk planning
The estimation formula is as follows.
Total disk size per BE node = Total storage space / Disk utilization / Number of BE nodes
The parameters are described as follows:
Total storage space: The total storage space calculated for the BE nodes.
Disk utilization: A utilization rate of 80% is recommended to reserve the remaining 20% of the space for computation.
Number of BE nodes: The number of BE nodes determined from the CU estimation.
For example, if the total storage space is 4768 GB, disk utilization is 80%, and there are 11 BE nodes, the calculation is 4768 GB / 80% / 11 ≈ 542 GB. Therefore, the total disk size for a single BE node is approximately 542 GB.
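The per-node calculation can be sketched as a small Python helper. The function name and the input validation are illustrative assumptions. The 80% default follows the recommendation above.

```python
def disk_size_per_be_gb(total_storage_gb: float,
                        be_nodes: int,
                        disk_utilization: float = 0.8) -> float:
    """Disk size per BE node = total storage / disk utilization / number of BE nodes."""
    if not 0 < disk_utilization <= 1:
        raise ValueError("disk_utilization must be in (0, 1]")
    if be_nodes < 1:
        raise ValueError("be_nodes must be at least 1")
    return total_storage_gb / disk_utilization / be_nodes


# 4768 GB of total storage, 80% utilization, 11 BE nodes
print(round(disk_size_per_be_gb(4768, 11), 1))  # 541.8
```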
Disk quantity selection
The number of disks to select depends on the performance of enterprise SSDs (ESSDs) and the total disk capacity of a single node. To optimize single-disk performance, you can split ESSD PL1 disks as shown in the following table.
Total disk size per BE node | Disk type | Recommended number of disks |
<= 500 GB | ESSD PL1 | 1 |
500 GB to 1 TB | ESSD PL1 | 1 to 2 |
1 TB to 1.5 TB | ESSD PL1 | 2 to 3 |
1.5 TB to 2 TB | ESSD PL1 | 3 to 4 |
2 TB to 2.5 TB | ESSD PL1 | 4 to 5 |
2.5 TB to 3 TB | ESSD PL1 | 5 to 6 |
3 TB to 3.5 TB | ESSD PL1 | 6 to 7 |
3.5 TB to 4 TB | ESSD PL1 | 7 to 8 |
> 4 TB | ESSD PL1 | 8 |
Each ESSD performance level reaches its maximum disk IOPS at a different capacity:
ESSD PL0: Reaches the maximum disk IOPS at 320 GB.
ESSD PL1: Reaches the maximum disk IOPS at 460 GB.
ESSD PL2: Reaches the maximum disk IOPS at 1260 GB.
ESSD PL3: Reaches the maximum disk IOPS at 7760 GB.
To optimize performance, refer to the disk splitting recommendations for ESSD PL1 and adjust the number of disks for other ESSD types accordingly.
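The ESSD PL1 recommendations can be encoded as a simple lookup. This is a sketch under stated assumptions: 1 TB is taken as 1024 GB, a size that falls exactly on a boundary is assigned to the lower band, and the names `PL1_BANDS` and `recommended_pl1_disks` are hypothetical.

```python
from typing import Tuple

# (upper bound in GB, (min disks, max disks)) for ESSD PL1.
# Assumption: 1 TB = 1024 GB; a boundary value belongs to the lower band.
PL1_BANDS = [
    (500, (1, 1)),
    (1024, (1, 2)),
    (1536, (2, 3)),
    (2048, (3, 4)),
    (2560, (4, 5)),
    (3072, (5, 6)),
    (3584, (6, 7)),
    (4096, (7, 8)),
]


def recommended_pl1_disks(disk_size_gb: float) -> Tuple[int, int]:
    """Return the recommended (min, max) number of ESSD PL1 disks for one node."""
    if disk_size_gb <= 0:
        raise ValueError("disk_size_gb must be positive")
    for upper_gb, disk_range in PL1_BANDS:
        if disk_size_gb <= upper_gb:
            return disk_range
    return (8, 8)  # more than 4 TB: 8 disks


print(recommended_pl1_disks(541.8))  # (1, 2)
```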
Estimate FE node specifications
Frontend (FE) nodes are mainly responsible for metadata management, client connection management, query planning, and query scheduling.
You can roughly estimate the FE specifications based on the total number of CUs for BE nodes. The following table provides specific recommendations. The data disk for an FE node typically requires only 100 GB. If the storage space becomes insufficient, you can scale it out separately.
Total BE CUs | Scenario type | Recommended FE specifications |
< 120 CUs | Normal scenario | 8 CUs × 3 |
120 CUs to 1000 CUs | Normal scenario | 16 CUs × 3 |
1000 CUs to 3000 CUs | Normal scenario | 32 CUs × 3 |
>= 3000 CUs | Normal scenario | 64 CUs × 3 |
The values in the table are only recommendations. In a production environment, you must evaluate the final required resources based on stress test results for your specific business.
For high-concurrency point query scenarios, consider increasing the number of frontend nodes. For example, you can increase the number to five.
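The FE recommendations can be sketched as a lookup on the total number of BE CUs. The function name and the `high_concurrency_point_queries` flag are hypothetical. The sketch assumes that a value on a range boundary (for example, exactly 1000 CUs) belongs to the higher tier, and it uses five nodes for point query scenarios only because that is the example given above.

```python
from typing import Tuple


def recommend_fe_spec(total_be_cus: int,
                      high_concurrency_point_queries: bool = False) -> Tuple[int, int]:
    """Return (CUs per FE node, number of FE nodes) for a given total of BE CUs."""
    if total_be_cus <= 0:
        raise ValueError("total_be_cus must be positive")
    if total_be_cus < 120:
        cus_per_fe = 8
    elif total_be_cus < 1000:
        cus_per_fe = 16
    elif total_be_cus < 3000:
        cus_per_fe = 32
    else:
        cus_per_fe = 64
    # Assumption: 5 nodes for high-concurrency point queries, otherwise 3.
    fe_nodes = 5 if high_concurrency_point_queries else 3
    return cus_per_fe, fe_nodes


print(recommend_fe_spec(63))         # (8, 3)
print(recommend_fe_spec(800, True))  # (16, 5)
```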
Compute-storage separated edition
In the compute-storage separated edition, an instance contains only FE and compute nodes (CNs).
Estimate the number of CUs for CNs
For more information, see Estimate the number of CUs for BE nodes.
Estimate CN storage
The storage for CNs is mainly used for cached data.
Estimation formula
Total storage space required = Raw data size / Compression ratio * Hot data percentage
The parameters are as follows:
Raw data size: Size of a single row × Total number of rows.
Compression ratio: StarRocks supports four compression algorithms: zlib, Zstandard (or zstd), LZ4, and Snappy, listed in descending order of compression ratio. These algorithms provide compression ratios from 3:1 to 5:1.
Hot data percentage: Estimate the percentage of frequently queried data (hot data) based on your business needs. For example, you can set this value to 50%. If you are unsure about the specific percentage but want to ensure sufficient query performance for the compute-storage separated instance, set this value to 100%. This is equivalent to the size of one replica. Because the primary key index also uses cache disk space, a 20% buffer is recommended. Therefore, the recommended setting is 120%.
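A minimal Python sketch of the cache estimate follows. The function name is hypothetical, the conversion assumes 1 GB = 1024 × 1024 KB, and the default of 1.2 reflects the 120% setting recommended above.

```python
KB_PER_GB = 1024 * 1024  # assumption: binary units (1 GB = 1024 * 1024 KB)


def estimate_cn_cache_gb(row_size_kb: float,
                         row_count: int,
                         compression_ratio: float = 3.0,
                         hot_data_fraction: float = 1.2) -> float:
    """Cache storage = raw data size / compression ratio * hot data percentage."""
    if compression_ratio <= 0 or hot_data_fraction <= 0:
        raise ValueError("compression_ratio and hot_data_fraction must be positive")
    raw_kb = row_size_kb * row_count  # raw data size = row size * row count
    return raw_kb / compression_ratio * hot_data_fraction / KB_PER_GB


# 50 KB rows, 100 million rows, 3:1 compression, 120% hot data setting
print(round(estimate_cn_cache_gb(50, 100_000_000, 3, 1.2), 2))  # 1907.35
```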
Sample data
Size of a single row (KB) | Number of rows | Compression ratio | Hot data percentage | Estimated data size (GB) |
50 | 100,000,000 | 3 | 120% | 1,907.35 |
Note: The values in the table are only recommendations. In a production environment, you must evaluate the final required resources based on stress test results for your specific business.
For information about the disk size and quantity for a single CN, see BE node disk planning.
Estimate FE node specifications
For more information, see Estimate FE node specifications in the Compute-storage integrated edition section.