Create a spot instance

Elastic Container Instance (ECI) supports spot instances. You can use spot instances for short-lived Jobs and for stateless applications with high scalability and fault tolerance to reduce instance costs. This topic describes how to create a spot ECI pod in a Kubernetes cluster.

Background information

A preemptible instance, also called a spot instance, is a low-cost, bid-based instance. You can bid on idle Alibaba Cloud compute resources to run your containers. The instance runs until the market price rises above your bid or the resource inventory becomes insufficient, at which point the resources are reclaimed.

Preemptible instances are ideal for short-running jobs and stateless applications that require high scalability and fault tolerance, such as elastically scalable web services, image rendering, big data analytics, and large-scale parallel computing. The more distributed, scalable, and fault-tolerant your application is, the more you can benefit from using preemptible instances to save costs and increase throughput. For more information, see "What is a preemptible instance?"

Key concepts

Before you create a preemptible instance, understand the following:

  • Billing

    The market price of a preemptible instance fluctuates with supply and demand. When you create a preemptible instance, you must specify a bidding mode. If the real-time market price for the specified instance type is lower than your bid and there is sufficient inventory, the instance is created. During the protection period (1 hour by default), you are billed at the market price at the time of creation. After the protection period ends, you are billed at the real-time market price.

    Note

    Preemptible instances are offered at a discount compared to pay-as-you-go instances. The actual price fluctuates with supply and demand, and you are charged for the actual usage duration. For more information, see Preemptible instance billing.

  • Reclamation mechanism

    After the protection period ends, the system automatically checks the market price and inventory of the instance type every 5 minutes. If the market price at any point exceeds your bid or the inventory for the instance type is insufficient, the system releases the preemptible instance.

    Note
    • About 3 minutes before resource reclamation, the system generates a release event.

    • After resource reclamation, the instance is no longer billed, but its information is retained, and its status changes to Expired.

Usage notes

When you use preemptible instances, consider the following:

  • Select a suitable instance type and a reasonable bid.

    You can use ECS OpenAPI operations to query information about preemptible instances over the last 30 days to help you select an instance type and bid.

    Important

    Your bid should be high enough to account for market price fluctuations and align with your business expectations. This increases the chances of successfully creating a preemptible instance and helps prevent it from being released due to price changes, allowing you to meet business needs while saving costs.

  • Save important data on storage media that are not affected by instance releases, such as a Cloud Disk with the release-with-instance option disabled, or File Storage NAS.
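As an illustration of keeping results off instance-local storage, the following is a minimal sketch of a spot Job that writes its output to an NFS-backed volume, which is one way a NAS file system can be mounted. The Job name, the mount target address, and the mount path are placeholders introduced for this sketch, and it assumes the NAS mount target is reachable from the pod's network.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: test-nas                      # Hypothetical name for this sketch.
spec:
  template:
    metadata:
      labels:
        app: perl
        alibabacloud.com/eci: "true"
      annotations:
        k8s.aliyun.com/eci-spot-strategy: "SpotAsPriceGo"
    spec:
      containers:
      - name: pi
        image: registry.cn-shanghai.aliyuncs.com/eci_open/perl:5
        # Write the result to the mounted volume instead of the container's local file system.
        command: ["sh", "-c", "perl -Mbignum=bpi -wle 'print bpi(2000)' > /data/pi.txt"]
        volumeMounts:
        - name: results
          mountPath: /data
      volumes:
      - name: results
        nfs:
          server: your-nas-mount-target.example.com   # Placeholder: replace with your NAS mount target.
          path: /
      restartPolicy: Never
```

If the spot instance is reclaimed after the file is written, the data should remain on the NAS file system because it is not stored on the instance itself.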

Creation methods

You can create a preemptible ECI instance by specifying an ECS instance type or by specifying vCPUs and memory:

  • Specify an ECS instance type

    Billing is based on the pay-as-you-go market price and real-time discount for the specified instance type.

  • Specify vCPU and memory

    This method is equivalent to specifying an ECS instance type. The system automatically matches an ECS instance type that meets the specified resource and price requirements. The market price of the matched instance type serves as the billing base price. This means the discount is applied to the market price of the matched ECS instance type, not the standard pay-as-you-go price for an ECI instance with equivalent vCPU and memory.

    This method supports only instance types with 2 or more vCPUs. The following table lists the supported vCPU and memory specifications. If you specify an unsupported combination, the system automatically rounds it up to the next supported specification.

    vCPU    Memory (GiB)
    2       2, 4, 8, 16
    4       4, 8, 16, 32
    8       8, 16, 32, 64
    12      12, 24, 48, 96
    16      16, 32, 64, 128
    24      24, 48, 96, 192
    32      32, 64, 128, 256
    52      96, 192, 384
    64      128, 256, 512
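To make the rounding rule concrete, here is a hedged sketch of a Job whose container requests 2 vCPUs and 3 GiB of memory, a combination that is not listed in the table. Based on the rule above, the request would be expected to be rounded up to 2 vCPUs and 4 GiB. The actual matching is performed by the system, so treat the comment in the manifest as an expectation, not a guarantee. The Job name is a placeholder, and the annotation used is the one described in the Configuration section.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: test-roundup                  # Hypothetical name for this sketch.
spec:
  template:
    metadata:
      labels:
        app: perl
        alibabacloud.com/eci: "true"
      annotations:
        k8s.aliyun.com/eci-spot-strategy: "SpotAsPriceGo"
    spec:
      containers:
      - name: pi
        image: registry.cn-shanghai.aliyuncs.com/eci_open/perl:5
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
        resources:
          limits:
            cpu: "2"
            memory: 3Gi               # Not in the table; expected to be rounded up to 2 vCPUs and 4 GiB.
      restartPolicy: Never
```

Because billing is based on the matched instance type, an unsupported combination can cost more than the requested resources suggest. Requesting a supported combination directly avoids that surprise.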

Configuration

You can create a spot instance by adding annotations to the pod metadata. The relevant annotations are described below.

Annotation: k8s.aliyun.com/eci-spot-strategy
Example value: SpotAsPriceGo
Required: Yes
Description: The bidding strategy for the spot instance. Valid values:

  • SpotWithPriceLimit: Sets a maximum hourly price for the spot instance. If you use this strategy, you must also specify the price limit by using the k8s.aliyun.com/eci-spot-price-limit annotation.

  • SpotAsPriceGo: The system automatically bids at the current market price.

    Important

    If you use the SpotAsPriceGo policy and the resources for the specified instance type are in high demand in the corresponding zone, the bid price may reach the pay-as-you-go price.

Annotation: k8s.aliyun.com/eci-spot-price-limit
Example value: "0.5"
Required: No
Description: The maximum hourly price for the spot instance. You can specify a value with up to three decimal places. This annotation is valid only when k8s.aliyun.com/eci-spot-strategy is set to SpotWithPriceLimit.

Annotation: k8s.aliyun.com/eci-spot-duration
Example value: "0"
Required: No
Description: The protection period for the spot instance, in hours. The default value is 1. A value of 0 means no protection period.

Annotation: k8s.aliyun.com/eci-spot-fallback
Example value: "true"
Required: No
Description: Specifies whether to create a pay-as-you-go instance if a spot instance cannot be created due to insufficient inventory. The default value is false.

Important
  • Add annotations under the pod's metadata. For example, when you create a Job, add the annotations under spec.template.metadata, not under the Job's own top-level metadata.

  • Elastic Container Instance-related annotations are only applied when a pod is created. Adding or modifying these annotations on an existing pod will have no effect.
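The same placement rule applies to other workload types. As a minimal sketch, the following Deployment for a stateless web service puts the spot annotations under spec.template.metadata so that they land on each pod. The Deployment name, the app label value, the replica count, and the nginx image reference are assumptions made for this sketch. Only the annotation keys and the alibabacloud.com/eci label come from this topic.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-web                      # Hypothetical name. Annotations placed here would NOT reach the pods.
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spot-web
  template:
    metadata:
      labels:
        app: spot-web
        alibabacloud.com/eci: "true"
      annotations:                    # Pod-level metadata: this is where the spot annotations belong.
        k8s.aliyun.com/eci-spot-strategy: "SpotAsPriceGo"
        k8s.aliyun.com/eci-spot-fallback: "true"
    spec:
      containers:
      - name: web
        image: registry.cn-shanghai.aliyuncs.com/eci_open/nginx:1.14.2   # Assumed image; replace with your own.
        resources:
          limits:
            cpu: "2"
            memory: 4Gi
```

Because the annotations take effect only at pod creation, changing them in the pod template affects only the pods that the Deployment creates afterward, for example during a rollout.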

Example 1: Specify an ECS instance type and use SpotWithPriceLimit

apiVersion: batch/v1
kind: Job
metadata:
  name: test
spec:
  template:
    metadata:
      labels:
        app: perl
        alibabacloud.com/eci: "true" 
      annotations:
        k8s.aliyun.com/eci-use-specs: "ecs.c6.large"            # Specify an ECS instance type.
        k8s.aliyun.com/eci-spot-strategy: "SpotWithPriceLimit"  # Use a bidding strategy with a custom price limit.
        k8s.aliyun.com/eci-spot-price-limit: "0.25"            # Set the maximum hourly price.
    spec:
      containers:
      - name: pi
        image: registry.cn-shanghai.aliyuncs.com/eci_open/perl:5
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never

The preceding YAML example creates a spot instance of the ecs.c6.large instance type.

  • At the time of creation, if no inventory meets the instance type and price limit requirements, creation fails.

  • After creation, the instance has a 1-hour protection period. After the 1-hour protection period expires, the spot instance is reclaimed if the market price exceeds your bid or the inventory for the instance type is insufficient.

Example 2: Specify vCPU and memory and use SpotAsPriceGo

  • Specify vCPU and memory by using the container's resources field (spec.containers[].resources)

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: test
    spec:
      template:
        metadata:
          labels:
            app: perl
            alibabacloud.com/eci: "true" 
          annotations:
            k8s.aliyun.com/eci-spot-strategy: "SpotAsPriceGo"  # Use the system's automatic bidding, which follows the real-time market price.
        spec:
          containers:
          - name: pi
            image: registry.cn-shanghai.aliyuncs.com/eci_open/perl:5
            command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
            resources:
              limits:              # Specify 2 vCPUs and 4 GiB of memory for the pi container.
                cpu: 2000m
                memory: 4096Mi
          restartPolicy: Never
  • Specify vCPU and memory by using an annotation

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: test
    spec:
      template:
        metadata:
          labels:
            app: perl
            alibabacloud.com/eci: "true" 
          annotations:
            k8s.aliyun.com/eci-use-specs: "2-4Gi"              # Specify vCPU and memory. Instances with 2 vCPUs or more are supported.
            k8s.aliyun.com/eci-spot-strategy: "SpotAsPriceGo"  # Use the system's automatic bidding, which follows the real-time market price.
        spec:
          containers:
          - name: pi
            image: registry.cn-shanghai.aliyuncs.com/eci_open/perl:5
            command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          restartPolicy: Never

The preceding YAML examples create a spot instance with 2 vCPUs and 4 GiB of memory.

  • At the time of creation, if no inventory meets the resource requirements, creation fails.

  • After creation, the instance has a 1-hour protection period. After the 1-hour protection period expires, the spot instance is reclaimed if the market price exceeds your bid or the inventory for the instance type is insufficient.

Example 3: Set no protection period

apiVersion: batch/v1
kind: Job
metadata:
  name: test
spec:
  template:
    metadata:
      labels:
        app: perl
        alibabacloud.com/eci: "true" 
      annotations:
        k8s.aliyun.com/eci-use-specs: "2-4Gi"              # Specify vCPU and memory. Instances with 2 vCPUs or more are supported.
        k8s.aliyun.com/eci-spot-strategy: "SpotAsPriceGo"  # Use the system's automatic bidding, which follows the real-time market price.
        k8s.aliyun.com/eci-spot-duration: "0"              # Set no protection period.
    spec:
      containers:
      - name: pi
        image: registry.cn-shanghai.aliyuncs.com/eci_open/perl:5
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never

The preceding YAML example creates a spot instance with 2 vCPUs and 4 GiB of memory.

  • At the time of creation, if no inventory meets the resource requirements, creation fails.

  • After creation, there is no protection period. The spot instance is reclaimed as soon as the market price exceeds the bid or the inventory for the instance type is insufficient.

Example 4: Fall back to a pay-as-you-go instance

apiVersion: batch/v1
kind: Job
metadata:
  name: test
spec:
  template:
    metadata:
      labels:
        app: perl
        alibabacloud.com/eci: "true" 
      annotations:
        k8s.aliyun.com/eci-use-specs: "ecs.c6.large"            # Specify an ECS instance type.
        k8s.aliyun.com/eci-spot-strategy: "SpotWithPriceLimit"  # Use a bidding strategy with a custom price limit.
        k8s.aliyun.com/eci-spot-price-limit: "0.05"            # Set the maximum hourly price.
        k8s.aliyun.com/eci-spot-fallback: "true"                # Automatically convert to a pay-as-you-go instance if spot inventory is unavailable.
    spec:
      containers:
      - name: pi
        image: registry.cn-shanghai.aliyuncs.com/eci_open/perl:5
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never

The preceding YAML example requests a spot instance of the ecs.c6.large instance type and allows fallback to a pay-as-you-go instance.

  • At the time of creation, if inventory that meets the instance type and price limit requirements is available, a spot instance is created. After creation, the instance has a 1-hour protection period. After the 1-hour protection period expires, the spot instance is reclaimed if the market price exceeds your bid or the inventory for the instance type is insufficient.

  • At the time of creation, if no inventory meets the instance type and price limit requirements, a pay-as-you-go instance is created. After creation, the system does not automatically reclaim the instance.

    After the instance is created, you can run the kubectl describe pod command to check the pod's events and confirm whether it has fallen back to a pay-as-you-go instance. A SpotDegraded event indicates that the instance has fallen back to a pay-as-you-go instance.

    Events:
      Type     Reason                  Age   From               Message
      ----     ------                  ---   ----               -------
      Warning  MissingClusterDNS       32m   virtual-kubelet    pod: "default/test4-dbrw6(3ef7c65d-3908-40e3-ab1f-39ff9562a36b)". virtual-kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
      Warning  SpotDegraded            32m   EciService         [eci.containergroup]Spot[SpotStrategy:SpotWithPriceLimit,SpotPriceLimit:0.001,SpotDuration:1] will be degraded because the specified instance is out of stock
      Normal   SuccessfulHitImageCache 32m   EciService         [eci.imagecache]Successfully hit image cache imc-2ze7udxttnd(xxx), eci will be scheduled with this image cache.
      Normal   Pulled                  32m   kubelet            Container image "registry-vpc.cn-beijing.aliyuncs.com/eci_open/perl:5" already present on machine
      Normal   Created                 32m   kubelet            Created container pi
      Normal   Started                 32m   kubelet            Started container pi

Reclamation details

After a spot instance is created, it runs normally during its protection period. After the protection period expires, the spot instance is reclaimed if the market price exceeds your bid or if resource inventory is insufficient. This section describes the events and pod statuses related to spot instance reclamation.

  • Pre-release event

    Approximately three minutes before a spot instance is reclaimed, a SpotToBeReleased event is generated.

    Important

    ECI notifies you through Kubernetes Events that the spot instance will be released. During this time, you can take action to prevent business disruption from the instance reclamation. For more information, see Graceful termination.

    • Run the kubectl describe command to view detailed information about the pod. You can see the pre-release event in the Events section of the output. The following is an example:

      Events:
        Type     Reason            Age    From          Message
        ----     ------            ----   ----          -------
        Warning  SpotToBeReleased  3m32s  kubelet, eci  Spot ECI will be released in 3 minutes
    • Run the kubectl get events command to view event information. You can see the pre-release event in the output. The following is an example:

      LAST SEEN   TYPE      REASON             OBJECT         MESSAGE
      3m39s       Warning   SpotToBeReleased   pod/pi-frmr8   Spot ECI will be released in 3 minutes
  • Pod status after reclamation

    After a spot instance is reclaimed, its information is retained, but its status changes to Failed, and the reason is BidFailed.

    • Run the kubectl get pod command to view pod information. You can see that the pod status has changed in the output. The following is an example:

      NAME       READY   STATUS      RESTARTS   AGE
      pi-frmr8   1/1     BidFailed   0          3h5m
    • Run the kubectl describe command to view detailed information about the pod. You can see the pod status information in the output. The following is an example:

      Status:             Failed
      Reason:             BidFailed
      Message:            The pod is spot instance, and have been released at 2020-04-08T12:36Z

Graceful termination

Approximately three minutes before a spot instance is reclaimed, a SpotToBeReleased event is generated, and the ContainerInstanceExpired field in the pod's conditions is set to true. Use these notification mechanisms to implement graceful termination and pod rotation, which minimizes business disruption from spot instance reclamation.

Virtual Node supports graceful termination for ECI spot instances. You can add the k8s.aliyun.com/eci-spot-release-strategy: api-evict annotation to your ECI pod. When the virtual node receives a SpotToBeReleased event, it calls the Eviction API to evict the spot instance.

Important

To support interruption notifications through pod conditions and eviction through the Eviction API, you must upgrade ACK Virtual Node to v2.11.0 or later. For more information, see ACK Virtual Node.

An API-initiated eviction respects your PodDisruptionBudget (PDB) and terminationGracePeriodSeconds configurations. Creating an Eviction object by using the API is similar to performing a policy-controlled DELETE operation on a pod. The process is as follows:

  1. API request

    The virtual node receives a SpotToBeReleased event and calls the Eviction API.

  2. PDB check

    The API server validates the PodDisruptionBudget associated with the target pod.

  3. Eviction execution

    If the API server allows the eviction, the pod is deleted as follows:

    1. The pod resource in the API server is updated with a deletion timestamp, after which the API server considers the pod to be terminating. The pod resource is also marked with the configured grace period.

    2. The kubelet on the node where the pod is running notices that the pod resource is marked for termination and begins to gracefully shut down the local pod.

    3. While the kubelet is shutting down the pod, the control plane removes the pod from Endpoints and EndpointSlice objects. As a result, controllers no longer consider the pod a valid object.

    4. After the pod's grace period expires, the kubelet forcibly terminates the local pod.

    5. The kubelet informs the API server to delete the pod resource.

    6. The API server deletes the pod resource.

  4. Workload reconciliation

    If the target pod is managed by a controller, such as a ReplicaSet or StatefulSet, or by a fault-tolerant Job, SparkApplication, or Workflow, the controller typically creates a new pod to replace the evicted one.

Note

If the PodDisruptionBudget is misconfigured, or if there are many pods not in the Ready state when the Eviction API is called, the eviction process may be blocked. If the eviction is not completed before the spot instance expires, the instance is reclaimed immediately.
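The pieces described in this section can be combined as in the following sketch: a Deployment whose pods opt in to API-initiated eviction, together with a PodDisruptionBudget that limits how many pods may be disrupted at once. The resource names, the app label value, the replica count, the image reference, and the specific values for maxUnavailable and terminationGracePeriodSeconds are assumptions chosen for illustration. The grace period is kept well below the roughly three-minute notice so that shutdown can finish before reclamation.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: spot-api-pdb                  # Hypothetical name for this sketch.
spec:
  maxUnavailable: 1                   # Allow at most one pod to be evicted at a time.
  selector:
    matchLabels:
      app: spot-api
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-api                      # Hypothetical name for this sketch.
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spot-api
  template:
    metadata:
      labels:
        app: spot-api
        alibabacloud.com/eci: "true"
      annotations:
        k8s.aliyun.com/eci-spot-strategy: "SpotAsPriceGo"
        k8s.aliyun.com/eci-spot-release-strategy: "api-evict"   # Evict through the Eviction API on SpotToBeReleased.
    spec:
      terminationGracePeriodSeconds: 60   # Assumed value; must fit within the pre-release notice window.
      containers:
      - name: web
        image: registry.cn-shanghai.aliyuncs.com/eci_open/nginx:1.14.2   # Assumed image; replace with your own.
        resources:
          limits:
            cpu: "2"
            memory: 4Gi
```

A budget that is too strict, such as maxUnavailable: 0, would block every eviction, and the instance would then be reclaimed without a graceful shutdown, as the preceding note describes.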