Instance type planning and recommendations

This document provides planning and recommendations for StarRocks instance types. It describes the compute-storage integrated and compute-storage separated instance types.

Compute-storage integrated edition

In the compute-storage integrated edition, an instance contains only frontend (FE) and backend (BE) nodes. This section provides recommendations for the specifications of both node types.

Estimate the number of CUs for BE nodes

In the compute-storage integrated edition, backend (BE) nodes are responsible for data storage and computation tasks.

  • Estimation formula

    Total CUs = Total rows to scan / CPU processing capability / Expected response time * QPS (Queries Per Second)

    The parameters are described as follows:

    • Total rows to scan: The expected number of rows that each SQL query scans. This is not the total number of rows in a single table, but only the number of rows that need to be scanned.

    • CPU processing capability: The number of rows that can be processed per second. This value changes dynamically based on the complexity of the SQL query and typically ranges from 10 million to 100 million rows per second. The more complex the SQL query is, the fewer rows are processed per second.

    • Expected response time: The expected running time of an SQL query. For example, the query should return a result within 1 second.

    • Queries Per Second (QPS): The number of concurrent SQL queries submitted per second. For example, 30 queries per second.

  • Sample data

    Important

    The formula provides an estimate that may not be completely accurate because performance varies with SQL complexity. In a production environment, you must evaluate the final required resources based on stress test results for your specific business.

    | Total rows to scan | SQL complexity | Estimated CPU processing capability (rows/s) | Expected response time (s) | QPS (queries/s) | Estimated total CUs | Estimated BE specifications |
    |---|---|---|---|---|---|---|
    | 50 million | High | 20 million | 2 | 50 | 63 | 16 CUs × 4 nodes |
    | 50 million | Medium | 50 million | 1.5 | 100 | 67 | 16 CUs × 5 nodes |
    | 50 million | Low | 100 million | 1 | 200 | 100 | 32 CUs × 3 nodes |
    | 1 billion | High | 20 million | 5 | 20 | 200 | 32 CUs × 7 nodes |
    | 1 billion | Medium | 50 million | 3 | 50 | 333 | 64 CUs × 6 nodes |
    | 1 billion | Low | 100 million | 1 | 80 | 800 | 64 CUs × 13 nodes |
    | 30 billion | High | 20 million | 30 | 10 | 500 | 64 CUs × 8 nodes |
    | 30 billion | Medium | 50 million | 15 | 20 | 800 | 64 CUs × 13 nodes |
    | 30 billion | Low | 100 million | 15 | 20 | 400 | 64 CUs × 6 nodes |
    | 500 billion | High | 20 million | 60 | 5 | 2083 | 64 CUs × 33 nodes |
    | 500 billion | Medium | 50 million | 45 | 10 | 2222 | 64 CUs × 35 nodes |
    | 500 billion | Low | 100 million | 45 | 10 | 1111 | 64 CUs × 18 nodes |
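
The CU estimation formula above can be sketched in a few lines of Python. This is only an illustration: the function names `estimate_total_cus` and `be_nodes_needed` are hypothetical, and rounding the node count up to a whole node is an assumption, so the recommended specifications in the sample data need not match it exactly.

```python
import math


def estimate_total_cus(rows_to_scan, rows_per_second, response_time_s, qps):
    """Total CUs = rows to scan / CPU processing capability / response time * QPS."""
    if rows_per_second <= 0 or response_time_s <= 0:
        raise ValueError("processing capability and response time must be positive")
    return rows_to_scan / rows_per_second / response_time_s * qps


def be_nodes_needed(total_cus, cus_per_node):
    """Assumption: round up so the provisioned CUs cover the estimate."""
    return math.ceil(total_cus / cus_per_node)


# 1 billion rows, high complexity (20 million rows/s), 5 s response time, 20 QPS
cus = estimate_total_cus(1_000_000_000, 20_000_000, 5, 20)
print(cus)                       # 200.0
print(be_nodes_needed(cus, 32))  # 7
```

Treat the result as a starting point for a stress test rather than a final sizing.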

Estimate BE node storage

The total storage space required for a StarRocks instance depends on the raw data size, the number of replicas, and the compression ratio of the compression algorithm used.

  • Estimation formula

    Total storage space required = Raw data size * Number of replicas / Compression ratio

    The parameters are described as follows:

    • Raw data size: Size of a single row × Total number of rows.

    • Number of replicas: In a compute-storage integrated architecture, this is typically set to 3.

    • Compression ratio: StarRocks supports four compression algorithms: zlib, Zstandard (or zstd), LZ4, and Snappy, listed in descending order of compression ratio. These algorithms provide compression ratios from 3:1 to 5:1.

  • Sample data

    | Size of a single row (KB) | Number of rows | Number of replicas | Compression ratio | Estimated data size (GB) |
    |---|---|---|---|---|
    | 50 | 100,000,000 | 3 | 3 | 4,768.37 |

    Note

    The values in the table are only recommendations. In a production environment, you must evaluate the final required resources based on stress test results for your specific business.
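
As a quick check of the storage formula, the following Python sketch reproduces the sample row. The function name `estimate_storage_gb` is hypothetical, and the conversion assumes 1 GB = 1024 × 1024 KB, which is what the sample figure implies.

```python
KB_PER_GB = 1024 * 1024  # assumption: binary units, consistent with the sample row


def estimate_storage_gb(row_size_kb, row_count, replicas, compression_ratio):
    """Total storage = raw data size * number of replicas / compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    raw_size_kb = row_size_kb * row_count
    return raw_size_kb * replicas / compression_ratio / KB_PER_GB


# 50 KB per row, 100 million rows, 3 replicas, 3:1 compression
print(round(estimate_storage_gb(50, 100_000_000, 3, 3), 2))  # 4768.37
```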

BE node disk planning

The estimation formula is as follows.

Total disk size per BE node = Total storage space / Disk utilization / Number of BE nodes

The parameters are described as follows:

  • Total storage space: The total storage space calculated for the BE nodes.

  • Disk utilization: A utilization rate of 80% is recommended to reserve the remaining 20% of the space for computation.

  • Number of BE nodes: The number of BE nodes determined from the CU estimation.

For example, if the total storage space is 4768 GB, disk utilization is 80%, and there are 11 BE nodes, the calculation is 4768 GB / 80% / 11 ≈ 541.8 GB. Round up and provision about 542 GB of total disk space for each BE node.
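
The per-node disk calculation can be expressed as a small Python helper. The name `disk_size_per_be_gb` is hypothetical, and the 80% default simply mirrors the recommended utilization above.

```python
def disk_size_per_be_gb(total_storage_gb, be_nodes, disk_utilization=0.8):
    """Total disk size per BE node = total storage / disk utilization / number of BE nodes."""
    if not 0 < disk_utilization <= 1:
        raise ValueError("disk utilization must be in (0, 1]")
    if be_nodes < 1:
        raise ValueError("at least one BE node is required")
    return total_storage_gb / disk_utilization / be_nodes


size_gb = disk_size_per_be_gb(4768, 11)
print(round(size_gb, 2))  # 541.82
```

In practice you would round the result up to the next disk size that you can provision.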

Disk quantity selection

The number of disks to select depends on the performance of enterprise SSDs (ESSDs) and the total disk capacity of a single node. To optimize single-disk performance, you can split the capacity across multiple ESSD PL1 disks as shown in the following table.

| Total disk size per BE node | Disk type | Recommended number of disks |
|---|---|---|
| <= 500 GB | ESSD PL1 | 1 |
| 500 GB to 1 TB | ESSD PL1 | 1 to 2 |
| 1 TB to 1.5 TB | ESSD PL1 | 2 to 3 |
| 1.5 TB to 2 TB | ESSD PL1 | 3 to 4 |
| 2 TB to 2.5 TB | ESSD PL1 | 4 to 5 |
| 2.5 TB to 3 TB | ESSD PL1 | 5 to 6 |
| 3 TB to 3.5 TB | ESSD PL1 | 6 to 7 |
| 3.5 TB to 4 TB | ESSD PL1 | 7 to 8 |
| > 4 TB | ESSD PL1 | 8 |

Each ESSD performance level reaches its maximum disk IOPS at the following capacity:

  • ESSD PL0: Reaches the maximum disk IOPS at 320 GB.

  • ESSD PL1: Reaches the maximum disk IOPS at 460 GB.

  • ESSD PL2: Reaches the maximum disk IOPS at 1260 GB.

  • ESSD PL3: Reaches the maximum disk IOPS at 7760 GB.

To optimize performance, refer to the disk splitting recommendations for ESSD PL1 and adjust the number of disks for other ESSD types accordingly.
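
The ESSD PL1 table can be encoded as a simple lookup, sketched below in Python. The function name `recommended_pl1_disks` is hypothetical. Taking the size in TB and treating each range as inclusive of its upper bound are assumptions, because the table does not say how boundary values are handled.

```python
# (upper bound in TB, (minimum disks, maximum disks)), copied from the ESSD PL1 table
_PL1_DISK_RANGES = [
    (0.5, (1, 1)),
    (1.0, (1, 2)),
    (1.5, (2, 3)),
    (2.0, (3, 4)),
    (2.5, (4, 5)),
    (3.0, (5, 6)),
    (3.5, (6, 7)),
    (4.0, (7, 8)),
]


def recommended_pl1_disks(size_tb):
    """Return the (min, max) recommended number of ESSD PL1 disks for one node."""
    if size_tb <= 0:
        raise ValueError("disk size must be positive")
    for upper_bound_tb, disk_range in _PL1_DISK_RANGES:
        if size_tb <= upper_bound_tb:  # assumption: upper bound is inclusive
            return disk_range
    return (8, 8)  # more than 4 TB


print(recommended_pl1_disks(0.53))  # (1, 2)
```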

Estimate FE node specifications

Frontend (FE) nodes are mainly responsible for metadata management, client connection management, query planning, and query scheduling.

You can roughly estimate the FE specifications based on the total number of CUs for BE nodes. The following table provides specific recommendations. The data disk for an FE node typically requires only 100 GB. If the storage space becomes insufficient, you can expand the disk separately.

| Total BE CUs | Scenario type | Recommended FE specifications |
|---|---|---|
| < 120 CUs | Normal scenario | 8 CUs × 3 |
| 120 CUs to 1000 CUs | Normal scenario | 16 CUs × 3 |
| 1000 CUs to 3000 CUs | Normal scenario | 32 CUs × 3 |
| >= 3000 CUs | Normal scenario | 64 CUs × 3 |

Note
  • The values in the table are only recommendations. In a production environment, you must evaluate the final required resources based on stress test results for your specific business.

  • For high-concurrency point query scenarios, consider increasing the number of frontend nodes. For example, you can increase the number to five.
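
The FE recommendations for normal scenarios can be sketched as a lookup in Python. The function name `recommended_fe_spec` is hypothetical, and assigning exactly 1000 CUs to the 32-CU tier is an assumption, because the table lists 1000 CUs in two adjacent ranges.

```python
def recommended_fe_spec(total_be_cus):
    """Return (CUs per FE node, number of FE nodes) for a normal scenario."""
    if total_be_cus <= 0:
        raise ValueError("total BE CUs must be positive")
    if total_be_cus < 120:
        cus_per_fe = 8
    elif total_be_cus < 1000:  # assumption: 1000 CUs falls into the next tier
        cus_per_fe = 16
    elif total_be_cus < 3000:
        cus_per_fe = 32
    else:
        cus_per_fe = 64
    return cus_per_fe, 3


print(recommended_fe_spec(200))  # (16, 3)
```

The sketch does not cover high-concurrency point query scenarios, where more FE nodes may be appropriate.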

Compute-storage separated edition

In the compute-storage separated edition, an instance contains only FE and compute nodes (CNs).

Estimate the number of CUs for CNs

The estimation method is the same as for BE nodes. For more information, see Estimate the number of CUs for BE nodes.

Estimate CN storage

The storage for CNs is mainly used for cached data.

  • Estimation formula

    Total storage space required = Raw data size / Compression ratio * Hot data percentage

    The parameters are as follows:

    • Raw data size: Size of a single row × Total number of rows.

    • Compression ratio: StarRocks supports four compression algorithms: zlib, Zstandard (or zstd), LZ4, and Snappy, listed in descending order of compression ratio. These algorithms provide compression ratios from 3:1 to 5:1.

    • Hot data percentage: Estimate the percentage of frequently queried data (hot data) based on your business needs. For example, you can set this value to 50%. If you are unsure about the specific percentage but want to ensure sufficient query performance for the compute-storage separated instance, set this value to 100%. This is equivalent to the size of one replica. Because the primary key index also uses cache disk space, a 20% buffer is recommended. Therefore, the recommended setting is 120%.

  • Sample data

    | Size of a single row (KB) | Number of rows | Compression ratio | Hot data percentage | Estimated data size (GB) |
    |---|---|---|---|---|
    | 50 | 100,000,000 | 3 | 120% | 1,907.35 |

    Note

    The values in the table are only recommendations. In a production environment, you must evaluate the final required resources based on stress test results for your specific business.
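
The CN cache estimate can be checked with a short Python sketch. The function name `estimate_cn_cache_gb` is hypothetical, the default of 1.2 mirrors the recommended 120% setting, and the conversion assumes 1 GB = 1024 × 1024 KB, which is what the sample figure implies.

```python
KB_PER_GB = 1024 * 1024  # assumption: binary units, consistent with the sample row


def estimate_cn_cache_gb(row_size_kb, row_count, compression_ratio, hot_data_fraction=1.2):
    """Total storage = raw data size / compression ratio * hot data percentage."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    raw_size_kb = row_size_kb * row_count
    return raw_size_kb / compression_ratio * hot_data_fraction / KB_PER_GB


# 50 KB per row, 100 million rows, 3:1 compression, 120% hot data percentage
print(round(estimate_cn_cache_gb(50, 100_000_000, 3), 2))  # 1907.35
```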

For information about the disk size and quantity for a single CN, see BE node disk planning.

Estimate FE node specifications

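The estimation method is the same as for the compute-storage integrated edition. For more information, see Estimate FE node specifications in the Compute-storage integrated edition section.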