CU selection guide

Alibaba Cloud Elasticsearch Serverless retrieval-augmented applications (Version 8.17) are billed based on Compute Units (CUs). One CU provides performance equivalent to one vCPU core and 4 GB of memory. Unlike self-managed servers, which may have low utilization due to inefficient resource allocation, CUs are designed for 100% utilization to prevent resource waste. This topic helps you estimate your CU usage and set a reasonable CU quota as needed.

Usage notes

Before you select CUs for a retrieval-augmented application (Version 8.17), you must understand the following CU types.

CU type (by purpose)

Description

Quota (Unit: CU/s)

Fixed CU

A pre-configured baseline of CU resources to handle regular workloads and ensure stable service traffic processing.

You can select 2 CU/s, 4 CU/s, 6 CU/s, 8 CU/s, 16 CU/s, or 24 CU/s. The default value is 2 CU/s. Read and write CU resources are allocated at a 1:1 ratio.

Elastic CU

Used to handle traffic bursts, such as flash sales or log peaks. When CU usage exceeds the fixed CU quota, the system automatically schedules elastic resources.

Note

You can enable this feature as needed. If this feature is disabled and CU usage exceeds the fixed CU quota, the system rejects the excess requests and returns a 429 error.

The maximum total CU quota that you can scale to varies based on the fixed CU quota. For more information, see CU allocation rules and limits. By default, elastic read and write CU resources are also allocated at a 1:1 ratio.

Read CU

The amount of CUs consumed by query operations, such as search, aggregation, and data retrieval.

Write CU

The amount of CUs consumed by write operations, such as data indexing, updates, and deletions.

Total CU

The total amount of CUs consumed by the system. This is the sum of read CUs and write CUs.

CU allocation rules and limits

Retrieval-augmented applications (Version 8.17) use a read/write splitting architecture. By default, read and write CUs are allocated at a 1:1 ratio. The following tables show the relationships and limits among fixed CUs, total CUs, read CUs, and write CUs. You can use this information to estimate the required fixed CU quota and the minimum total CU specification based on the maximum usage of your read and write CUs.

  • Elastic computing disabled

    Note

    If elastic computing is disabled and CU usage exceeds the fixed CU quota, the system rejects the excess requests and returns a 429 error. To ensure service stability and uninterrupted access during traffic spikes, enable the elastic computing feature.

    | Fixed CU quota (Unit: CU/s) | Total CU limit (Unit: CU/s) | Read CU limit (Unit: CU/s) | Write CU limit (Unit: CU/s) |
    | --- | --- | --- | --- |
    | 2 | 2 | 1 | 1 |
    | 4 | 4 | 2 | 2 |
    | 6 | 6 | 3 | 3 |
    | 8 | 8 | 4 | 4 |
    | 16 | 16 | 8 | 8 |
    | 24 | 24 | 12 | 12 |

  • Elastic computing enabled

    Note

    In practice, the elastic CU limit may fluctuate slightly due to resource scheduling and load variations.

    | Fixed CU quota (Unit: CU/s) | Elastic CU limit (maximum CUs you can scale to, Unit: CU/s) | Read CU limit (Unit: CU/s) | Write CU limit (Unit: CU/s) |
    | --- | --- | --- | --- |
    | 2 | 12 | 6 | 6 |
    | 4 | 24 | 12 | 12 |
    | 6 | 24 | 12 | 12 |
    | 8 | 32 | 16 | 16 |
    | 16 | 48 | 24 | 24 |
    | 24 | 72 | 36 | 36 |
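The two tables above can be captured as a small lookup, which is convenient when checking a candidate configuration against expected traffic. The following is a minimal sketch, not an official API: the names `cu_limits` and `classify_usage` are hypothetical, the numbers are transcribed from the tables in this topic, and treating a breach of either the read or the write limit as a rejection is a simplifying assumption.

```python
# CU limits in CU/s, transcribed from the tables in this topic:
# fixed quota -> (total limit, read limit, write limit)
LIMITS_ELASTIC_OFF = {
    2: (2, 1, 1), 4: (4, 2, 2), 6: (6, 3, 3),
    8: (8, 4, 4), 16: (16, 8, 8), 24: (24, 12, 12),
}
LIMITS_ELASTIC_ON = {
    2: (12, 6, 6), 4: (24, 12, 12), 6: (24, 12, 12),
    8: (32, 16, 16), 16: (48, 24, 24), 24: (72, 36, 36),
}


def cu_limits(fixed_quota, elastic_enabled):
    """Return the total/read/write CU limits for a fixed CU quota."""
    table = LIMITS_ELASTIC_ON if elastic_enabled else LIMITS_ELASTIC_OFF
    if fixed_quota not in table:
        raise ValueError(f"unsupported fixed CU quota: {fixed_quota}")
    total, read, write = table[fixed_quota]
    return {"total": total, "read": read, "write": write}


def classify_usage(read_cu, write_cu, fixed_quota, elastic_enabled):
    """Classify a usage sample (hypothetical helper).

    'fixed'    -> usage fits within the fixed CU quota
    'elastic'  -> usage exceeds the fixed quota but fits the elastic limits
    'rejected' -> usage exceeds a limit; excess requests would receive 429
    """
    limits = cu_limits(fixed_quota, elastic_enabled)
    if read_cu > limits["read"] or write_cu > limits["write"]:
        return "rejected"
    return "fixed" if read_cu + write_cu <= fixed_quota else "elastic"


if __name__ == "__main__":
    print(cu_limits(6, elastic_enabled=True))
    print(classify_usage(5, 3, fixed_quota=4, elastic_enabled=True))
```

Because the elastic CU limit may fluctuate slightly in practice, results near a boundary should be read as approximate.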

Prerequisites

You have created a retrieval-augmented application (Version 8.17).

Selection guide

Refer to CU allocation rules and limits and follow the steps in this section to estimate the required number of CUs. If the current CU configuration does not meet your needs, you can modify the CU quota.

Step 1: Estimate the minimum total CU specification based on read and write CU limits

In Elasticsearch, query and write operations are the main system workloads. These operations typically consume a large amount of compute resources.

  • Read CU (query compute resources): Processes query operations, including full-text search and aggregate queries. These operations require sufficient CPU and memory resources to handle concurrent queries.

  • Write CU (write compute resources): Processes write operations, including batch data writes, index creation, and updates. These operations have high requirements for I/O and compute resources (CPU and memory).

The peak usage of read CUs and write CUs directly reflects the system's demand for compute resources during normal and peak periods. Therefore, you can estimate the minimum CU specification required for stable operation under high load based on the historical peak usage of read and write CUs for your services.
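Because read and write CUs are allocated at a 1:1 ratio by default, the busier of the two sides determines the total specification you need. The sketch below illustrates that arithmetic; `min_total_cu_spec` is a hypothetical helper name, and the rule "total must be at least twice the larger peak" is an inference from the default 1:1 split rather than a documented formula.

```python
def min_total_cu_spec(peak_read_cu, peak_write_cu):
    """Estimate the minimum total CU specification (CU/s).

    Assumes the default 1:1 read/write allocation: each side receives
    half of the total, so the total must be at least twice the larger
    of the two historical peaks.
    """
    if peak_read_cu < 0 or peak_write_cu < 0:
        raise ValueError("peak usage must be non-negative")
    return 2 * max(peak_read_cu, peak_write_cu)


if __name__ == "__main__":
    # Hypothetical peaks: reads dominate, so they set the requirement.
    print(min_total_cu_spec(peak_read_cu=5, peak_write_cu=2))
```

A read-heavy workload with a small write peak still needs a total sized for the read side, which is why the two peaks should be reviewed separately rather than summed.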

Step 2: Estimate the fixed CU quota based on sub-peak and daily elastic CU usage

In retrieval-augmented applications (Version 8.17), fixed CUs are less expensive than elastic CUs. You are charged for elastic CUs only when your actual CU usage exceeds the fixed CU quota. To reduce costs, configure a reasonable fixed CU quota based on your service's sub-peak traffic, daily elastic CU usage, and the minimum CU specification estimated in Step 1.

  • Sub-peak traffic: Basing your quota on sub-peak traffic ensures stable system performance most of the time. This approach lets you use elastic resources to handle extreme but infrequent load conditions (absolute peaks), which reduces unnecessary resource waste.

  • Daily elastic CU usage: Analyzing daily elastic CU usage helps you understand temporary resource requirements during peak hours. Setting a reasonable fixed CU quota can reduce the reliance on elastic resources and lower costs.

Note

Elastic CUs have a higher unit price than fixed CUs, so they are most cost-effective when used only as needed and for short periods, such as 2 to 4 hours.
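To compare candidate fixed CU quotas against a day of monitoring data, you can total the elastic CU-hours and count the hours in which elastic CUs would be needed. The following sketch assumes you have exported 24 hourly averages of total CU usage; the function name `elastic_profile` and the sample series are hypothetical.

```python
def elastic_profile(hourly_total_cu, fixed_quota):
    """Return (elastic CU-hours, hours that needed elastic CUs).

    hourly_total_cu: average total CU usage (CU/s) for each hour.
    Usage above the fixed quota is counted as elastic usage.
    """
    overflow = [max(0, usage - fixed_quota) for usage in hourly_total_cu]
    elastic_cu_hours = sum(overflow)
    elastic_hours = sum(1 for extra in overflow if extra > 0)
    return elastic_cu_hours, elastic_hours


if __name__ == "__main__":
    # Hypothetical day: 8 quiet hours, 12 sub-peak hours, 4 peak hours.
    usage = [2] * 8 + [5] * 12 + [9] * 4
    for fixed in (2, 4, 6, 8):
        cu_hours, hours = elastic_profile(usage, fixed)
        print(f"fixed={fixed}: {cu_hours} elastic CU-hours over {hours} h")
```

A quota that keeps elastic usage within a few hours per day is usually the one to cost out in detail.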

Step 3: Adjust the fixed CU quota based on actual CU usage

After you use a retrieval-augmented application (Version 8.17) for a period of time, you can view its CU usage. Then, you can adjust the Fixed CU quota based on the actual usage of read and write CUs.

  • Scenario 1: After you enable elastic computing, if the read CU or write CU usage consistently exceeds 1.5 times the fixed CU quota, this indicates that the current system load is greater than the capacity of the original configuration. To avoid performance issues, you must increase the fixed CU quota.

    Note

    If an application's requests exceed the total CU limit, the system rejects the excess requests and returns a 429 error. You can set up the monitoring and alerting feature based on the actual usage of read and write CUs to be notified of such events.

    • Example 1: If the peak read CU usage reaches 12 CU/s, the fixed CU quota must be at least 8 CU/s (`12 / 1.5`) to ensure the application can handle your access traffic.

    • Example 2: If the fixed CU quota is 2 CU/s and the peak read CU usage is 6 CU/s (which is more than 1.5 times the fixed CU quota) during a specific period, you must increase the fixed CU quota to at least 4 CU/s (`6 / 1.5`) to ensure that query operations can be executed normally.

  • Scenario 2: For scenarios that involve large data volumes, you may need to increase the fixed CU quota.

    By default, for every 1 CU/s increase in the fixed CU quota, the supported data storage capacity increases by 40 GB. If your data volume is large, you can increase the fixed CU quota to increase your storage quota. You can also adjust the maximum storage ratio per CU as needed.
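The two scenarios above reduce to simple arithmetic: divide the sustained peak by 1.5, or divide the data volume by the storage supported per CU, and then round up to the next selectable quota. The sketch below illustrates this under stated assumptions: the helper names are hypothetical, the selectable quotas are the ones listed in this topic, and storage capacity is assumed to scale linearly at 40 GB per fixed CU unless `gb_per_cu` is overridden.

```python
import math

ALLOWED_FIXED_QUOTAS = (2, 4, 6, 8, 16, 24)  # selectable values, CU/s
SUSTAINED_RATIO = 1.5    # threshold from Scenario 1
DEFAULT_GB_PER_CU = 40   # default storage supported per fixed CU


def _round_up_to_allowed(required_cu):
    """Return the smallest selectable quota that covers required_cu."""
    for quota in ALLOWED_FIXED_QUOTAS:
        if quota >= required_cu:
            return quota
    raise ValueError(f"{required_cu} CU/s exceeds the largest selectable quota")


def fixed_quota_for_peak(peak_cu):
    """Scenario 1: keep the sustained peak within 1.5x the fixed quota."""
    return _round_up_to_allowed(peak_cu / SUSTAINED_RATIO)


def fixed_quota_for_storage(data_gb, gb_per_cu=DEFAULT_GB_PER_CU):
    """Scenario 2: provide enough fixed CUs to hold data_gb of data."""
    return _round_up_to_allowed(math.ceil(data_gb / gb_per_cu))


if __name__ == "__main__":
    print(fixed_quota_for_peak(12))      # Example 1 in this topic
    print(fixed_quota_for_peak(6))       # Example 2 in this topic
    print(fixed_quota_for_storage(300))  # hypothetical 300 GB data set
```

When both constraints apply, the larger of the two results is the quota to request.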

Selection example

This example sizes a retrieval-augmented application (Version 8.17) for a search service that currently runs on self-managed servers. The service has a total of 20 cores of compute resources for operations such as index building and query processing. The following table shows the CPU utilization and the actual CUs required during different time periods.

| Time period | Duration (hours) | CPU utilization (%) | Actual CUs required (peak) |
| --- | --- | --- | --- |
| Peak hours | 7 | 50% | 20 cores × 50% = 10 CUs |
| Sub-peak hours | 8 | 30% | 20 cores × 30% = 6 CUs |
| Off-peak hours | 9 | <20% | 20 cores × 20% = 4 CUs |

Follow these steps to determine the optimal CU configuration:

  1. Calculate the minimum fixed CU quota.

    • Minimum total CU specification: The compute resources required during peak hours are 20 cores × 50% = 10 CUs.

    • Minimum fixed CU quota: 10 / 3 ≈ 3.3. Round up to 4 CUs.

      Note

      To handle traffic bursts, retrieval-augmented applications (Version 8.17) provide elastic CUs. The maximum number of elastic CUs is twice the number of fixed CUs, and the maximum total CUs is three times the number of fixed CUs.

  2. Plan elastic resources based on sub-peak hours.

    CUs required during peak hours: 20 cores × 50% = 10 CUs. CUs required during sub-peak hours: 20 cores × 30% = 6 CUs.

    • If the fixed CU quota is 4, you need 10 - 4 = 6 elastic CUs during peak hours and 6 - 4 = 2 elastic CUs during sub-peak hours.

    • If the fixed CU quota is 6, you need 10 - 6 = 4 elastic CUs during peak hours and no elastic CUs during sub-peak hours.

    • If the fixed CU quota is 8, you need 10 - 8 = 2 elastic CUs during peak hours and no elastic CUs during sub-peak hours.

  3. Compare costs.

    When the fixed CU quota is greater than 2, the unit price of a fixed CU is CNY 0.2600/CU/hour, and the unit price of an elastic CU is CNY 0.4450/CU/hour. The following tables show the costs for different fixed CU quotas.

    • Plan 1: Fixed CU quota of 4 CUs

      | Time period | Fixed CU cost (CNY) | Elastic CU cost (CNY) | Total cost (CNY) |
      | --- | --- | --- | --- |
      | Peak hours | 4 × 0.2600 × 7 = 7.28 | 6 × 0.4450 × 7 = 18.69 | 7.28 + 18.69 = 25.97 |
      | Sub-peak hours | 4 × 0.2600 × 8 = 8.32 | 2 × 0.4450 × 8 = 7.12 | 8.32 + 7.12 = 15.44 |
      | Off-peak hours | 4 × 0.2600 × 9 = 9.36 | No elastic resource cost | 9.36 |

      Total daily cost: 25.97 + 15.44 + 9.36 = CNY 50.77

    • Plan 2: Fixed CU quota of 6 CUs

      | Time period | Fixed CU cost (CNY) | Elastic CU cost (CNY) | Total cost (CNY) |
      | --- | --- | --- | --- |
      | Peak hours | 6 × 0.2600 × 7 = 10.92 | 4 × 0.4450 × 7 = 12.46 | 10.92 + 12.46 = 23.38 |
      | Sub-peak hours | 6 × 0.2600 × 8 = 12.48 | No elastic resource cost | 12.48 |
      | Off-peak hours | 6 × 0.2600 × 9 = 14.04 | No elastic resource cost | 14.04 |

      Total daily cost: 23.38 + 12.48 + 14.04 = CNY 49.90

    • Plan 3: Fixed CU quota of 8 CUs

      | Time period | Fixed CU cost (CNY) | Elastic CU cost (CNY) | Total cost (CNY) |
      | --- | --- | --- | --- |
      | Peak hours | 8 × 0.2600 × 7 = 14.56 | 2 × 0.4450 × 7 = 6.23 | 14.56 + 6.23 = 20.79 |
      | Sub-peak hours | 8 × 0.2600 × 8 = 16.64 | No elastic resource cost | 16.64 |
      | Off-peak hours | 8 × 0.2600 × 9 = 18.72 | No elastic resource cost | 18.72 |

      Total daily cost: 20.79 + 16.64 + 18.72 = CNY 56.15

Conclusion:

| Fixed CU quota | Elastic resource usage duration | Total cost (per day) |
| --- | --- | --- |
| 4 CUs | 15 hours | CNY 50.77 |
| 6 CUs | 7 hours | CNY 49.90 |
| 8 CUs | 7 hours | CNY 56.15 |

A fixed CU quota of 6 CUs provides the lowest cost.
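The comparison in this example can be reproduced with a short script, which makes it easy to rerun the numbers with your own utilization profile. This is a sketch under the example's assumptions: 20 cores, the three time periods and unit prices quoted above, fixed CUs billed on the full quota for every hour, and elastic CUs billed only for usage above the quota. The names `PERIODS` and `daily_cost` are illustrative.

```python
FIXED_PRICE = 0.2600    # CNY per CU per hour (fixed quota greater than 2)
ELASTIC_PRICE = 0.4450  # CNY per CU per hour
TOTAL_CORES = 20

# (period name, duration in hours, CPU utilization in percent)
PERIODS = [
    ("Peak hours", 7, 50),
    ("Sub-peak hours", 8, 30),
    ("Off-peak hours", 9, 20),
]


def daily_cost(fixed_quota):
    """Return the daily cost in CNY for a given fixed CU quota."""
    total = 0.0
    for _name, hours, utilization_pct in PERIODS:
        required_cu = TOTAL_CORES * utilization_pct / 100
        elastic_cu = max(0.0, required_cu - fixed_quota)
        hourly = fixed_quota * FIXED_PRICE + elastic_cu * ELASTIC_PRICE
        total += hourly * hours
    return round(total, 2)


if __name__ == "__main__":
    costs = {quota: daily_cost(quota) for quota in (4, 6, 8)}
    for quota, cost in costs.items():
        print(f"Fixed CU quota {quota}: CNY {cost:.2f} per day")
    print("Lowest cost at fixed CU quota:", min(costs, key=costs.get))
```

Running it with quotas of 4, 6, and 8 yields the same daily totals as the tables above, with 6 CUs as the lowest-cost option.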

Go to the application details page

  1. Go to the Application Management page and select the region where your application resides.

  2. Click the name of your application to go to its details page.

    From the application details page, you can view the CU usage of the application and modify the CU quota as needed.

View CU usage

Follow these steps to view the CU usage of your application within a specified time period.

  1. Go to the application details page.

  2. In the navigation pane on the left, click Application Management > Monitoring Center to go to the Monitoring Center page.

    On this page, you can view the usage details of read CUs (query compute resources), write CUs (write compute resources), and total CUs (query compute resources + write compute resources). For more information about how to view monitoring metrics, see Application monitoring and log query.

    • Item 1: You can view the total query compute resources and write compute resources used by all indexes in the application for the current day. You can also see the day-over-day percentage change, which is the increase or decrease compared to the previous day's data.

    • Item 2: You can view the usage details of CU-related metrics for a specified time period, which are described in the following table.

      Note
      • The default data granularity is 1 minute, which means the time interval between two metric points in the graph is 1 minute.

      • The time interval between data points varies based on the selected time range. The actual interval is displayed on the UI.

      Metric

      Description

      Diagram

      Query Compute Resource

      Each metric point represents the average number of read CUs consumed per second within the current time interval.

      Write Compute Resource

      Each metric point represents the average number of write CUs consumed per second within the current time interval.

      Minimum Compute Resource for Storage

      The minimum number of CUs required to ensure data storage, access, and maintenance.

      Note
      • To ensure efficiency, the system dynamically adjusts the number of CUs required for storage based on current storage usage. By default, each CU can support a maximum of 40 GB of data storage. You can also use the max_storage_per_cu configuration item to adjust the maximum storage capacity per CU.

      • The value for the minimum compute resources required for storage cannot exceed the Fixed CU Quota.

      CU Usage

      The total consumption of read CUs and write CUs at a specific point in time.

      Note

      The blue line represents the number of fixed CUs, and the green line represents the total number of CUs actually consumed.

      • Regardless of whether elastic computing is enabled, you are charged based on the billing standard for fixed CUs when the total CU usage does not exceed the fixed CU amount.

      • If elastic computing is enabled and the total CU usage exceeds the fixed CU amount, the portion within the limit is charged based on the fixed CU billing standard, and the excess portion is charged based on the elastic CU billing standard.

      For more information about billing, see Billing details.

Modify the CU quota

If the current CU configuration does not meet your business requirements, you can adjust the relevant quotas.

Modify the fixed CU quota

Go to the application details page. In the Billing Quota section, click Modify to adjust the fixed CU quota.

Modify the storage limit per CU

You can modify the max_storage_per_cu quota to adjust the maximum storage capacity supported by each CU. For more information about how to modify quotas, see Adjust quotas.

  1. Go to the application details page.

  2. In the navigation pane on the left, click Application Management > Service Quotas to go to the Service Quotas page.

    On the Quota Overview tab, search for max_storage_per_cu and click Modify Quota. After you specify the new quota value, click Submit Modification and follow the on-screen instructions to submit a modification request.

    Note

    This quota can be modified only by submitting a request. Make sure to request resources based on your actual business requirements. After you submit the request, you can view its details on the Service Quotas > Request History tab.