Auto scaling
During peak business hours, such as high-volume data writes during the day, a Milvus cluster may experience high CPU and memory usage on its Data Nodes and Index Nodes. This can lead to slow index building, compaction delays, or even node restarts caused by out-of-memory (OOM) errors. Conversely, during off-peak hours or at night, these nodes have low resource utilization, which wastes computing resources.
Schedule-based Auto Scaling allows you to configure multiple time-based scaling plans for your Data Nodes and Index Nodes. This feature automatically scales nodes in or out at specified times, reducing the need for manual adjustments and helping you optimize resource costs.
Limitations
- Auto Scaling based on load or metrics, such as CPU utilization or QPS, is not supported.
- Each instance supports only a single time zone for execution. Cross-time-zone conversion is not performed.
- Each node type supports a maximum of 10 rules.
- For high-availability instances across two availability zones, the scaling step size is 2.
- During a scaling activity, the instance temporarily enters a Scaling Out or Scaling In state. During this period, you cannot make other configuration changes to the instance.
- Scale-out details: A scale-out allocates and initializes new nodes, typically taking a few minutes. The instance then automatically rebalances data, which can take several more minutes depending on the workload.
- Scale-in details: A scale-in operation first triggers a Decommission process to ensure data on the nodes being removed is safely migrated to the remaining nodes before the resources are released. The time required for this data migration depends on the cluster load. The default timeout for the Decommission process is 1 hour. If data migration does not complete within this time, the system forcibly completes the scale-in.
Version and specification limits
- Auto Scaling is available only for Standard Edition instances. It is not supported for Basic Edition instances.
- The instance engine version must meet the following minimum requirements:

| Engine version | Minimum version requirement |
| --- | --- |
| 2.6 | All 2.6 versions are supported |
| 2.5 | 2.5.12-0.6.2_3.8.0 or later |
| 2.4 | 2.4.23-0.4.3_3.8.0 or later |

If your instance version is earlier than the minimum requirement, you can upgrade the version and then enable Auto Scaling.
- Components that support Auto Scaling
The following table lists the components that support Auto Scaling for different Milvus versions. N/A means the component does not apply to that version.

| Component | 2.4 | 2.5 | 2.6 |
| --- | --- | --- | --- |
| Data Node | Supported | Supported | Supported |
| Index Node | Supported | Supported | N/A |
| Streaming Node | N/A | N/A | Not supported |
| Proxy | Not supported | Not supported | Not supported |
| Query Node | Not supported | Not supported | Not supported |
| Metadata service | Not supported | Not supported | Not supported |
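The version and component limits above can be checked before you open the console. The following Python sketch is illustrative only: the helper names `version_key` and `supports_auto_scaling` are hypothetical, and it assumes that engine versions can be ordered by comparing their numeric fields from left to right, which this page does not state.

```python
import re

# Minimum engine versions, copied from the table above.
# None means every release of that minor version is supported.
MIN_VERSIONS = {
    "2.4": "2.4.23-0.4.3_3.8.0",
    "2.5": "2.5.12-0.6.2_3.8.0",
    "2.6": None,
}

# Components that support Auto Scaling per engine version, from the table above.
SCALABLE_COMPONENTS = {
    "2.4": {"Data Node", "Index Node"},
    "2.5": {"Data Node", "Index Node"},
    "2.6": {"Data Node"},
}


def version_key(version: str) -> tuple:
    """Turn '2.5.12-0.6.2_3.8.0' into (2, 5, 12, 0, 6, 2, 3, 8, 0).

    Assumption: versions order by their numeric fields, left to right.
    """
    return tuple(int(part) for part in re.findall(r"\d+", version))


def supports_auto_scaling(version: str, component: str) -> bool:
    """Hypothetical helper: True if `component` can be scaled on `version`."""
    minor = ".".join(version.split(".")[:2])  # e.g. "2.5"
    if minor not in MIN_VERSIONS:
        return False
    minimum = MIN_VERSIONS[minor]
    if minimum is not None and version_key(version) < version_key(minimum):
        return False
    return component in SCALABLE_COMPONENTS[minor]


if __name__ == "__main__":
    print(supports_auto_scaling("2.5.12-0.6.2_3.8.0", "Data Node"))
    print(supports_auto_scaling("2.6.0", "Index Node"))
```

The check covers only the engine version and component. The instance must also be Standard Edition and in the Running state.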
Billing
- Subscription instances: The total instance resources = base resources + elastic resources. The base resources (subscription) are not involved in Auto Scaling. Elastic resources are billed on a pay-as-you-go basis. You must manually enable the pay-as-you-go resource group first.
- Pay-as-you-go instances: The pay-as-you-go resource group is enabled by default. There is no distinction between base resources and elastic resources. All resources can be considered as resources within the pay-as-you-go resource group.
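As a rough illustration of the subscription formula above, the following Python sketch adds base and elastic nodes and estimates the daily elastic node-hours between a scale-out time and a scale-in time. The function names and the sample numbers are hypothetical. The sketch also assumes that elastic nodes are in use for exactly the window between the two times, ignoring provisioning and decommission time, so treat the result as an estimate rather than a billing figure.

```python
from datetime import datetime


def total_nodes(base_nodes: int, elastic_nodes: int) -> int:
    """Total instance resources = base resources + elastic resources."""
    return base_nodes + elastic_nodes


def elastic_node_hours(elastic_nodes: int, scale_out_at: str, scale_in_at: str) -> float:
    """Estimate daily elastic node-hours between two 'HH:MM' times.

    Assumption: nodes are used for the whole window and nothing else.
    """
    fmt = "%H:%M"
    start = datetime.strptime(scale_out_at, fmt)
    end = datetime.strptime(scale_in_at, fmt)
    hours = (end - start).total_seconds() / 3600
    if hours < 0:
        hours += 24  # the window crosses midnight
    return elastic_nodes * hours


if __name__ == "__main__":
    # Hypothetical instance: 3 subscription nodes plus 2 elastic nodes by day.
    print(total_nodes(3, 2))
    print(elastic_node_hours(2, "09:00", "22:00"))
```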
Prerequisites
- The instance is in the Running state.
- The pay-as-you-go resource group is enabled for subscription instances.
- Your account has the permission to modify instance configurations.
Step 1: Go to the Auto Scaling page
- Log on to the Alibaba Cloud Milvus console.
- In the left-side navigation pane, click Milvus Instances.
- In the top navigation bar, select the target region.
- In the instance list, click the name of the target instance.
- On the instance details page, click the Auto Scaling tab.
If the Auto Scaling tab is not displayed, the current instance version does not support schedule-based Auto Scaling.
Step 2: (Optional) Enable the pay-as-you-go resource group
If you use a pay-as-you-go instance, skip this step.
Subscription instances must have the pay-as-you-go resource group enabled before you can configure Auto Scaling rules.
- On the Auto Scaling page, check the status of the pay-as-you-go resource group.
- If it is not enabled, follow the on-screen instructions to click Enable PASG Resource Group.
- After the pay-as-you-go resource group is enabled, you can manually scale it out or in:
  - Scale-out: The maximum number of nodes in the pay-as-you-go resource group is 50. The step size is 2 for high-availability instances across two availability zones. During a scale-out, you cannot perform other operations such as scale-in, upgrade, downgrade, configuration change, or version upgrade. The node specifications of the pay-as-you-go resource group are the same as those of the base resources and cannot be modified.
  - Scale-in: The minimum number of nodes in the pay-as-you-go resource group is 0. The step size is 2 for high-availability instances across two availability zones. During a scale-in, you cannot perform other operations such as scale-out, upgrade, downgrade, configuration change, or version upgrade.
Step 3: Create an Auto Scaling rule
- On the Auto Scaling page, click Create Rule.
- In the Create Rule dialog box, configure the following parameters.

| Parameter | Description |
| --- | --- |
| Rule Name | Required. The name must be 1 to 64 characters in length and cannot contain special characters. Rule names must be unique within the same instance. |
| Execution Interval | Required. The execution interval. Select Everyday to execute the rule once a day at the specified time. Select the execution time in 24-hour format, accurate to the minute. You can also select Use cron expression to customize the execution plan with a standard 5-field cron expression for more flexible scheduling scenarios. |
| Node Configuration | Configure the target node information when the scheduled scaling is triggered. The fields are described after this table. |

Node Configuration fields:
  - Node Type: Single-select, required. Select the component type to scale (such as Data Node or Index Node).
  - Node Specifications: The specifications of the elastic nodes.
  - Elastic resource (node count): When the rule takes effect, the pay-as-you-go resource count of the selected node is adjusted to this target value.
    - Subscription instances: The minimum is 0 and the maximum is 50.
    - Pay-as-you-go instances: The minimum is the minimum node count (Data Node: 1, Index Node: 1) and the maximum is 50.
    The scaling step size is 2 for high-availability instances across two availability zones.

Note:
  - If the number of Auto Scaling rules for a single node type has reached the maximum (10), you cannot create more rules. Delete some existing rules before creating new ones.
  - You cannot configure rules for the same node type at the same execution time.
- Click OK to create the rule.
After the rule is created, it is displayed in the Rule List.
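The parameter constraints in this step can be summarized as a pre-check. The following Python sketch is a hypothetical model, not the console's implementation: the `Rule` class and `validate_rule` function are invented for illustration. It assumes that a step size of 2 means an even target count, which this page does not spell out. It also skips the "special characters" check because the allowed character set is not listed here.

```python
from dataclasses import dataclass

MAX_RULES_PER_NODE_TYPE = 10  # from the Note in this step
MAX_ELASTIC_NODES = 50        # maximum elastic node count


@dataclass(frozen=True)
class Rule:
    name: str
    node_type: str       # "Data Node" or "Index Node"
    execution_time: str  # "HH:MM", 24-hour format
    target_nodes: int    # elastic (pay-as-you-go) node count


def validate_rule(rule, existing, *, pay_as_you_go_instance=False, multi_az=False):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not 1 <= len(rule.name) <= 64:
        problems.append("name must be 1 to 64 characters")
    if any(r.name == rule.name for r in existing):
        problems.append("name must be unique within the instance")

    same_type = [r for r in existing if r.node_type == rule.node_type]
    if len(same_type) >= MAX_RULES_PER_NODE_TYPE:
        problems.append("node type already has 10 rules")
    if any(r.execution_time == rule.execution_time for r in same_type):
        problems.append("node type already has a rule at this execution time")

    # Subscription instances may scale elastic nodes down to 0;
    # pay-as-you-go instances keep at least 1 Data Node or Index Node.
    minimum = 1 if pay_as_you_go_instance else 0
    if not minimum <= rule.target_nodes <= MAX_ELASTIC_NODES:
        problems.append(f"target must be between {minimum} and {MAX_ELASTIC_NODES}")

    # Assumption: step size 2 is modeled as "target count is even".
    if multi_az and rule.target_nodes % 2 != 0:
        problems.append("target must follow the step size of 2")
    return problems


if __name__ == "__main__":
    existing = [Rule("day-scale-out", "Data Node", "09:00", 2)]
    print(validate_rule(Rule("dup-time", "Data Node", "09:00", 4), existing))
```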
Rule configuration examples
Example 1: Scale out during working hours
Automatically scale out Data Nodes at 9:00 AM every day to handle daytime data write peaks:
| Parameter | Value |
| --- | --- |
| Rule name | day-scale-out |
| Execution interval | Everyday |
| Execution time | 09:00 |
| Node type | Data |
| Elastic resource count | 2 |
Example 2: Scale in at night
Automatically scale in Data Nodes at 10:00 PM every day to release resources during off-peak hours:
| Parameter | Value |
| --- | --- |
| Rule name | night-scale-in |
| Execution interval | Everyday |
| Execution time | 22:00 |
| Node type | Data |
| Elastic resource count | 0 |
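If you select Use cron expression instead of Everyday, the two examples above can be written as 5-field cron expressions. The Python sketch below builds them from an HH:MM execution time. The helper `cron_at` is hypothetical, and the sketch assumes the conventional field order (minute, hour, day of month, month, day of week). Whether the console accepts ranges such as `1-5` in the day-of-week field is also an assumption, so verify the expression in the Create Rule dialog box.

```python
def cron_at(execution_time: str, day_of_week: str = "*") -> str:
    """Convert 'HH:MM' (24-hour) into a 5-field cron expression.

    Fields: minute hour day-of-month month day-of-week.
    """
    hour_text, minute_text = execution_time.split(":")
    hour, minute = int(hour_text), int(minute_text)
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"invalid execution time: {execution_time}")
    return f"{minute} {hour} * * {day_of_week}"


if __name__ == "__main__":
    print(cron_at("09:00"))         # Example 1: every day at 09:00
    print(cron_at("22:00"))         # Example 2: every day at 22:00
    print(cron_at("09:00", "1-5"))  # Monday to Friday only (assumed to be accepted)
```

A weekday-only schedule like the last line is one case where a cron expression is more flexible than the Everyday option.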
Manage Auto Scaling rules
In the Rule List on the Auto Scaling page, you can view, edit, disable, enable, or delete Auto Scaling rules.
View the rule list
The rule list displays the following fields:
- Rule ID / Rule name
- Node type (Data Node / Index Node)
- Target resource count (node count)
- Execution cycle (everyday, execution time)
- Actions
Edit an Auto Scaling rule
- On the Auto Scaling tab, find the target rule.
- In the Actions column, click Edit.
- Modify the rule parameters and click OK.
  - The node type cannot be modified. All other fields can be adjusted.
  - If the rule is currently being executed, it cannot be edited.
Delete an Auto Scaling rule
- On the Auto Scaling tab, find the target rule.
- In the Actions column, click Delete.
- In the confirmation dialog box, click OK.
  - After a rule is deleted, it no longer triggers scaling activities. Scaling activities that have already been executed are not affected.
  - If the rule is currently being executed, it cannot be deleted.
Enable or disable an Auto Scaling rule
You can toggle the enabled or disabled state of a rule to control whether it takes effect without deleting it.
- On the Auto Scaling tab, find the target rule.
- In the Actions column, click Enable or Disable.
  - After a rule is disabled, it no longer triggers scaling activities. Scaling activities that have already been executed are not affected.
  - If the rule is currently being executed, it cannot be modified.
Change the time zone
- On the Auto Scaling tab, click Switch Time Zone.
- Select the desired time zone and click OK.
You can change the time zone only when no rules exist for the instance.
Auto Scaling overview
In the Auto Scaling overview section on the Auto Scaling tab, you can view historical scaling activity records:
| Field | Description |
| --- | --- |
| Rule name | The name of the rule that triggered this scaling activity |
| Component type | The component involved in this scaling activity |
| Status | The execution status of the scaling activity, including: Pending, Executing, Succeeded, Failed, Rejected, and Canceled |
| Start time | The start time of the scaling activity |
| End time | The end time of the scaling activity |
| Description | Detailed description of the scaling activity |
FAQ
Does auto scaling affect running queries and writes?
Auto Scaling is designed as a smooth operation that does not interrupt running query and write pipelines, ensuring business continuity.
Is data safe during a scale-in?
Scale-in operations use a graceful decommission mechanism. The system first safely migrates data from the nodes being removed to the remaining nodes. Node resources are released only after the data migration is complete and no data loss is confirmed. The default timeout for the decommission process is 1 hour. If the migration does not complete within this time, the system forcibly completes the scale-in.
Why does my instance not support auto scaling?
Check the following conditions:
- The instance must be Standard Edition. Basic Edition does not support Auto Scaling.
- The engine version must meet the minimum version requirement (2.4 requires >= 2.4.23-0.4.3_3.8.0, 2.5 requires >= 2.5.12-0.6.2_3.8.0, and all 2.6 versions are supported).
- The instance must be in the Running state.
How are elastically scaled-out nodes billed?
Elastically scaled-out nodes are billed on a pay-as-you-go basis based on actual usage duration. Billing stops after the nodes are released during a scale-in.
How many auto scaling rules can be configured?
Each component supports a maximum of 10 scaling rules.