ECI supports spot instances. You can use spot instances for short-lived Jobs and certain stateless applications with high scalability and fault tolerance to reduce instance costs. This topic describes how to create a spot ECI pod in a Kubernetes cluster.
Background information
A preemptible instance (also called a spot instance in this topic) is a low-cost, bid-based instance. You can bid on idle Alibaba Cloud compute resources to run your containers. The instance runs until the market price exceeds your bid or the resource inventory is insufficient, at which point the resources are reclaimed.
Preemptible instances are ideal for short-running jobs and stateless applications that require high scalability and fault tolerance, such as elastically scalable web services, image rendering, big data analytics, and large-scale parallel computing. The more distributed, scalable, and fault-tolerant your application is, the more you can benefit from using preemptible instances to save costs and increase throughput. For more information, see What is a preemptible instance?.
Key concepts
Before you create a preemptible instance, understand the following:
- Billing
  The market price of a preemptible instance fluctuates with supply and demand. When you create a preemptible instance, you must specify a bidding mode. If the real-time market price for the specified instance type is lower than your bid and there is sufficient inventory, the instance is created successfully. During the protection period (1 hour by default), you are billed at the market price at the time of creation. After this period, you are billed at the real-time market price.
  Note: Preemptible instances are offered at a discount compared to pay-as-you-go instances. The actual price fluctuates with supply and demand, and you are charged for the actual usage duration. For more information, see Preemptible instance billing.
- Reclamation mechanism
  After the protection period ends, the system checks the market price and inventory of the instance type every 5 minutes. If the market price exceeds your bid or the inventory for the instance type is insufficient at any of these checks, the system releases the preemptible instance.
  Note:
  - About 3 minutes before resource reclamation, the system generates a release event.
  - After resource reclamation, the instance is no longer billed, but its information is retained, and its status changes to Expired.
Usage notes
When you use preemptible instances, consider the following:
- Select a suitable instance type and a reasonable bid.
  You can use ECS OpenAPI operations to query information about preemptible instances over the last 30 days to help you select an instance type and bid. The relevant API operations are:
  - DescribeSpotPriceHistory: Queries historical instance prices.
  - DescribeSpotAdvice: Queries information such as the average reclamation rate and average discount rate for instances.
  Important: Your bid should be high enough to account for market price fluctuations and should align with your business expectations. A sufficient bid increases the chances of successfully creating a preemptible instance and helps prevent it from being released due to price changes, so you can meet business needs while saving costs.
- Save important data on storage media that are not affected by instance releases, such as a Cloud Disk with the release-with-instance option disabled, or File Storage NAS.
Creation methods
You can create a preemptible ECI instance by specifying an ECS instance type or by specifying vCPUs and memory:
- Specify an ECS instance type
  Billing is based on the pay-as-you-go market price and real-time discount for the specified instance type.
- Specify vCPU and memory
  This method is equivalent to specifying an ECS instance type. The system automatically matches an ECS instance type that meets the specified resource and price requirements. The market price of the matched instance type serves as the billing base price. This means the discount is applied to the market price of the matched ECS instance type, not the standard pay-as-you-go price for an ECI instance with equivalent vCPU and memory.
  This method supports only instance types with 2 or more vCPUs. The following table lists the supported vCPU and memory specifications. If you specify an unsupported combination, the system automatically rounds it up to the next supported specification.
| vCPU | Memory (GiB) |
| --- | --- |
| 2 | 2, 4, 8, 16 |
| 4 | 4, 8, 16, 32 |
| 8 | 8, 16, 32, 64 |
| 12 | 12, 24, 48, 96 |
| 16 | 16, 32, 64, 128 |
| 24 | 24, 48, 96, 192 |
| 32 | 32, 64, 128, 256 |
| 52 | 96, 192, 384 |
| 64 | 128, 256, 512 |
Configuration
You can create a spot instance by adding annotations to the pod metadata. The following table describes the relevant annotations.
| Annotation | Example value | Required | Description |
| --- | --- | --- | --- |
| k8s.aliyun.com/eci-spot-strategy | SpotAsPriceGo | Yes | The bidding strategy for the spot instance. Valid values: SpotAsPriceGo (the system bids automatically at the current market price) and SpotWithPriceLimit (you set a maximum price). |
| k8s.aliyun.com/eci-spot-price-limit | "0.5" | No | The maximum hourly price for the spot instance. You can specify a value with up to three decimal places. This annotation is valid only when k8s.aliyun.com/eci-spot-strategy is set to SpotWithPriceLimit. |
| k8s.aliyun.com/eci-spot-duration | "0" | No | The protection period for the spot instance, in hours. The default value is 1. A value of 0 means no protection period. |
| k8s.aliyun.com/eci-spot-fallback | "true" | No | Specifies whether to create a pay-as-you-go instance if a spot instance cannot be created due to insufficient inventory. The default value is false. |
- Add annotations under the pod's metadata. For example, when you create a Job, add the annotations under spec > template > metadata.
- Elastic Container Instance-related annotations are applied only when a pod is created. Adding or modifying these annotations on an existing pod has no effect.
Example 1: Specify an ECS instance type and use SpotWithPriceLimit
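A minimal sketch of this configuration, not a definitive manifest. The spot annotations come from the table above. The Deployment name, image, instance type, price limit, the k8s.aliyun.com/eci-use-specs annotation, and the alibabacloud.com/eci label are not defined in this topic and are assumptions that you should verify against your cluster.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-demo                # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        alibabacloud.com/eci: "true"    # assumption: label that schedules the pod to ECI
      annotations:
        k8s.aliyun.com/eci-use-specs: "ecs.c6.large"           # assumption: annotation that selects an ECS instance type
        k8s.aliyun.com/eci-spot-strategy: "SpotWithPriceLimit" # bid with a user-defined price cap
        k8s.aliyun.com/eci-spot-price-limit: "0.25"            # illustrative maximum hourly price
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2      # placeholder image
        ports:
        - containerPort: 80
```

Because the annotations sit under spec > template > metadata, they apply to every pod that the Deployment creates.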
Example 2: Specify vCPU and memory and use SpotAsPriceGo
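A minimal sketch of this configuration, written as a Job to show where the annotation goes in a workload template. It assumes that the pod's vCPU and memory are taken from the container resource settings, and uses a supported combination (2 vCPUs, 4 GiB) from the table in Creation methods. The Job name, image, command, and the alibabacloud.com/eci label are assumptions for illustration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi                       # hypothetical name
spec:
  backoffLimit: 4                # allow retries if the spot pod is reclaimed
  template:
    metadata:
      labels:
        alibabacloud.com/eci: "true"    # assumption: label that schedules the pod to ECI
      annotations:
        k8s.aliyun.com/eci-spot-strategy: "SpotAsPriceGo"  # follow the market price automatically
    spec:
      restartPolicy: Never
      containers:
      - name: pi
        image: perl:5.34.0       # placeholder image
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
        resources:
          limits:
            cpu: "2"             # 2 vCPUs
            memory: "4Gi"        # 4 GiB, a supported pairing for 2 vCPUs
```

No price limit is set here because k8s.aliyun.com/eci-spot-price-limit applies only to the SpotWithPriceLimit strategy.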
Example 3: Set no protection period
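A minimal sketch of this configuration, assuming a standalone pod. Setting k8s.aliyun.com/eci-spot-duration to "0" removes the default 1-hour protection period, so the instance can be reclaimed at any time after creation. The pod name, image, resource values, and the alibabacloud.com/eci label are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spot-no-protection       # hypothetical name
  labels:
    alibabacloud.com/eci: "true" # assumption: label that schedules the pod to ECI
  annotations:
    k8s.aliyun.com/eci-spot-strategy: "SpotAsPriceGo"
    k8s.aliyun.com/eci-spot-duration: "0"   # no protection period
spec:
  containers:
  - name: app
    image: nginx:1.14.2          # placeholder image
    resources:
      limits:
        cpu: "2"
        memory: "4Gi"
```

Annotation values must be quoted so that Kubernetes treats them as strings rather than numbers.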
Example 4: Fall back to a pay-as-you-go instance
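A minimal sketch of this configuration, assuming a standalone pod. With k8s.aliyun.com/eci-spot-fallback set to "true", a pay-as-you-go instance is created when a spot instance cannot be created due to insufficient inventory. The pod name, image, resource values, and the alibabacloud.com/eci label are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spot-with-fallback       # hypothetical name
  labels:
    alibabacloud.com/eci: "true" # assumption: label that schedules the pod to ECI
  annotations:
    k8s.aliyun.com/eci-spot-strategy: "SpotAsPriceGo"
    k8s.aliyun.com/eci-spot-fallback: "true"  # fall back to pay-as-you-go if spot inventory is insufficient
spec:
  containers:
  - name: app
    image: nginx:1.14.2          # placeholder image
    resources:
      limits:
        cpu: "2"
        memory: "4Gi"
```

A fallback instance is billed as pay-as-you-go, so this option trades some cost savings for a higher chance that the pod starts.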
Reclamation details
After a spot instance is created, it runs normally during its protection period. After the protection period expires, the spot instance is reclaimed if the market price exceeds your bid or if resource inventory is insufficient. This section describes the events and pod statuses related to spot instance reclamation.
- Pre-release event
  Approximately three minutes before a spot instance is reclaimed, a SpotToBeReleased event is generated.
  Important: ECI notifies you through Kubernetes Events that the spot instance will be released. During this time, you can take action to prevent business disruption from the instance reclamation. For more information, see Graceful termination.
  - Run the kubectl describe command to view detailed information about the pod. You can see the pre-release event in the Events section of the output. The following is an example:

        Events:
          Type     Reason            Age    From          Message
          ----     ------            ----   ----          -------
          Warning  SpotToBeReleased  3m32s  kubelet, eci  Spot ECI will be released in 3 minutes

  - Run the kubectl get events command to view event information. You can see the pre-release event in the output. The following is an example:

        LAST SEEN   TYPE      REASON             OBJECT         MESSAGE
        3m39s       Warning   SpotToBeReleased   pod/pi-frmr8   Spot ECI will be released in 3 minutes

- Pod status after reclamation
  After a spot instance is reclaimed, its information is retained, but its status changes to Failed, and the reason is BidFailed.
  - Run the kubectl get pod command to view pod information. You can see that the pod status has changed in the output. The following is an example:

        NAME       READY   STATUS      RESTARTS   AGE
        pi-frmr8   1/1     BidFailed   0          3h5m

  - Run the kubectl describe command to view detailed information about the pod. You can see the pod status information in the output. The following is an example:

        Status:   Failed
        Reason:   BidFailed
        Message:  The pod is spot instance, and have been released at 2020-04-08T12:36Z

Graceful termination
Approximately three minutes before a spot instance is reclaimed, a SpotToBeReleased event is generated, and the ContainerInstanceExpired field in the pod's conditions is set to true. Use these notification mechanisms to implement graceful termination and pod rotation, which minimizes business disruption from spot instance reclamation.
Virtual Node supports graceful termination for ECI spot instances. You can add the k8s.aliyun.com/eci-spot-release-strategy: api-evict annotation to your ECI pod. When the virtual node receives a SpotToBeReleased event, it calls the Eviction API to evict the spot instance.
To support interruption notifications through pod conditions and eviction through the Eviction API, you must upgrade ACK Virtual Node to v2.11.0 or later. For more information, see ACK Virtual Node.
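A minimal sketch of a workload that opts in to eviction through the Eviction API, assuming ACK Virtual Node v2.11.0 or later. The k8s.aliyun.com/eci-spot-release-strategy annotation comes from the text above. The Deployment name, labels, image, and the 60-second grace period are assumptions for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
        alibabacloud.com/eci: "true"    # assumption: label that schedules the pod to ECI
      annotations:
        k8s.aliyun.com/eci-spot-strategy: "SpotAsPriceGo"
        k8s.aliyun.com/eci-spot-release-strategy: "api-evict"  # evict through the Eviction API on SpotToBeReleased
    spec:
      terminationGracePeriodSeconds: 60  # illustrative; keep well under the roughly 3-minute notice
      containers:
      - name: web
        image: nginx:1.14.2      # placeholder image
```

Keeping the grace period shorter than the notice window gives the containers time to shut down before the instance is reclaimed.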
An API-initiated eviction respects your PodDisruptionBudget (PDB) and terminationGracePeriodSeconds configurations. Creating an Eviction object by using the API is similar to performing a policy-controlled DELETE operation on a pod. The process is as follows:
- API request
  The virtual node receives a SpotToBeReleased event and calls the Eviction API.
- PDB check
  The API server validates the PodDisruptionBudget associated with the target pod.
- Eviction execution
  If the API server allows the eviction, the pod is deleted as follows:
  - The pod resource in the API server is updated with a deletion timestamp, after which the API server considers the pod to be terminating. The pod resource is also marked with the configured grace period.
  - The kubelet on the node where the pod is running notices that the pod resource is marked for termination and begins to gracefully shut down the local pod.
  - While the kubelet is shutting down the pod, the control plane removes the pod from Endpoint and EndpointSlice objects. As a result, controllers no longer consider the pod a valid object.
  - After the pod's grace period expires, the kubelet forcibly terminates the local pod.
  - The kubelet informs the API server to delete the pod resource.
  - The API server deletes the pod resource.
- Workload reconciliation
  If the target pod is managed by a controller (such as a ReplicaSet, a StatefulSet, or a fault-tolerant Job, SparkApplication, or Workflow), the controller typically creates a new pod to replace the evicted one.
If the PodDisruptionBudget is misconfigured, or if there are many pods not in the Ready state when the Eviction API is called, the eviction process may be blocked. If the eviction is not completed before the spot instance expires, the instance is reclaimed immediately.
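A minimal sketch of a PodDisruptionBudget that leaves room for evictions, assuming a workload with three replicas whose pods carry the hypothetical label app: web. The name and values are assumptions for illustration.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                  # hypothetical name
spec:
  minAvailable: 2                # with 3 replicas, one pod at a time can be evicted
  selector:
    matchLabels:
      app: web                   # must match the labels of the spot pods
```

With three healthy replicas, this budget permits one eviction at a time. Setting minAvailable equal to the replica count would reject every eviction request, which is one way an eviction ends up blocked until the instance is reclaimed.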