GPU pod capacity reservation


Pod-based capacity reservation provides resource certainty for elastic workloads. A GPU pod capacity reservation does not need to be bound directly to a cluster. You only need to specify attributes such as pod specifications, an availability zone, and a reservation duration at the time of purchase. This guarantees that you can launch pods with matching specifications on demand within minutes. GPU pod capacity reservation provides resource certainty at a lower price than pay-as-you-go pods. This topic describes the features of GPU pod capacity reservation.

Features

  • Resource certainty: During the effective period of a GPU pod capacity reservation, the system guarantees that pods matching the reservation can be launched successfully.

  • Cost savings: After a pod is launched, it is billed at the pay-as-you-go rate. When the pod is not running, you are billed at the lower capacity reservation rate. You can launch and terminate pods according to your business needs.

  • Resource flexibility: You can create GPU pod capacity reservations with different resource specifications to meet various business requirements.

Note
  • GPU pod capacity reservation does not support pods of the BestEffort compute type.

  • GPU pod capacity reservation is compatible with Savings Plans that have matching attributes, such as region and type.

  • Creating a GPU pod capacity reservation is subject to available inventory.

Use cases

  • Periodic resource demands for real-time workloads: Your business experiences predictable "tidal" patterns in resource demand on a daily or weekly basis, and tasks must be executed and completed in real time. A typical example is a real-time inference service.

  • Sudden bursts of resource demand: Your business has unexpected real-time computing needs that require rapid resource delivery and scaling to avoid business impact. A typical example is resource demand triggered by trending events in an internet business.

Usage and billing example

GPU pod capacity reservation uses a pay-as-you-go billing model. During the active period of a capacity reservation, your fees include:

  • Pay-as-you-go fees for the unused portion of the capacity reservation.

  • Pay-as-you-go fees for the launched pods.

This section uses a scenario where you purchase two GPU pod capacity reservations and create two pay-as-you-go pods (Pod1 and Pod2) to illustrate the workflow and billing calculations at different stages, as shown in the following figure.

Stage 1: Purchase and create a capacity reservation

Before you begin, enable GPU capacity reservation.

In the Container Service console, navigate to Capacity Reservations > Create GPU capacity reservation. Configure the parameters and click Create capacity reservation.

| Parameter | Description |
| --- | --- |
| Capacity reservation name | A custom name for the capacity reservation. |
| Reservation type | The GPU type. |
| Region | The region where you want to reserve resources. |
| Availability zone | The availability zone where you want to reserve resources. |
| Resource specification | The specifications for the capacity reservation. You only need to select the number of GPUs. The system automatically matches it with the highest vCPU and memory specifications available for that GPU count. |
| Reservation mode | Pod reservation (cannot be modified). |
| Billing model | Pay-as-you-go (cannot be modified). |
| Quantity | The number of GPU pod capacity reservations for the specified resource specifications. |

The billing calculation for this stage is as follows:

| Stage | Fee | Description |
| --- | --- | --- |
| Stage 1 | None | No capacity reservation has been created. |

Stages 2–6: Active reservation period

During the active period, you can create pods at any time as long as their configurations fit the reserved specifications. The system guarantees that these pods can be launched successfully, and the corresponding capacity reservation is utilized. For a reservation to be utilized, the pod's GPU type and GPU count must match the reservation, and the pod's vCPU count and memory must not exceed the reserved configuration. A successful match fully utilizes the reservation, regardless of how much of the reserved vCPU and memory the pod requests. For example, if you purchase a capacity reservation for one GPU, 10 vCPUs, and 80 GB of memory, and then create a pod with one GPU, one vCPU, and 2 GB of memory, the reservation is fully utilized. When the pod is terminated, the capacity reservation becomes available again.

The billing calculations for these stages are as follows:

| Stage | Fee |
| --- | --- |
| Stage 2 | 2 × unit price of capacity reservation × Duration of Stage 2 |
| Stage 3 | 1 × unit price of capacity reservation × Duration of Stage 3 + pay-as-you-go unit price of Pod1 × Duration of Stage 3 |
| Stage 4 | pay-as-you-go unit price of Pod1 × Duration of Stage 4 + pay-as-you-go unit price of Pod2 × Duration of Stage 4 |
| Stage 5 | 1 × unit price of capacity reservation × Duration of Stage 5 + pay-as-you-go unit price of Pod2 × Duration of Stage 5 |
| Stage 6 | 2 × unit price of capacity reservation × Duration of Stage 6 |

The unit price of a capacity reservation is the pay-as-you-go fee for an unused reservation. The pay-as-you-go unit prices of Pod1 and Pod2 are the standard pay-as-you-go fees for the pods after they are launched.
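The stage formulas above can be checked with a small calculation. The following is a minimal sketch, not an official billing tool: the unit prices and stage durations are invented purely for illustration, and the `Stage` class and `stage_fee` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    hours: float                            # duration of the stage (assumed unit: hours)
    unused_reservations: int                # reservations not utilized by a running pod
    running_pod_prices: Tuple[float, ...]   # pay-as-you-go unit price of each running pod


def stage_fee(stage: Stage, reservation_unit_price: float) -> float:
    """Fee for one stage: unused reservations plus running pods, both scaled by duration."""
    reservation_fee = stage.unused_reservations * reservation_unit_price * stage.hours
    pod_fee = sum(stage.running_pod_prices) * stage.hours
    return reservation_fee + pod_fee


# Hypothetical prices per hour; real prices depend on region and specification.
RESERVATION_PRICE = 2.0
POD1_PRICE = 10.0
POD2_PRICE = 10.0

# Two reservations, with Pod1 running in Stages 3-4 and Pod2 running in Stages 4-5.
stages = [
    Stage("Stage 2", 1.0, 2, ()),
    Stage("Stage 3", 2.0, 1, (POD1_PRICE,)),
    Stage("Stage 4", 3.0, 0, (POD1_PRICE, POD2_PRICE)),
    Stage("Stage 5", 2.0, 1, (POD2_PRICE,)),
    Stage("Stage 6", 1.0, 2, ()),
]

fees = {s.name: stage_fee(s, RESERVATION_PRICE) for s in stages}
total = sum(fees.values())

for name, fee in fees.items():
    print(f"{name}: {fee:.2f}")
print(f"Total: {total:.2f}")
```

With these assumed numbers, the reservation fee is charged only while a reservation is unused, so Stage 4 (both reservations utilized) contains pod fees only.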

Stage 7: Reservation expiration

When the capacity reservation expires, the system automatically releases it.

Specifications

Following the capacity reservation specification upgrade, capacity reservations support the following GPU types and specifications:

| GPU type | GPU | vCPU | Memory (GiB) |
| --- | --- | --- | --- |
| L20 (GN8IS) | 1 (48 GB video memory) | 16 | 128 |
| L20 (GN8IS) | 2 (48 GB × 2 video memory) | 32 | 230 |
| L20 (GN8IS) | 4 (48 GB × 4 video memory) | 64 | 460 |
| L20 (GN8IS) | 8 (48 GB × 8 video memory) | 128 | 920 |
| T4 | 1 (16 GB video memory) | 24 | 90 |
| T4 | 2 (16 GB × 2 video memory) | 48 | 180 |
| A10 | 1 (24 GB video memory) | 16 | 60 |
| A10 | 2 (24 GB × 2 video memory) | 32 | 120 |
| A10 | 4 (24 GB × 4 video memory) | 64 | 240 |
| A10 | 8 (24 GB × 8 video memory) | 128 | 480 |
| P16EN | 1 (96 GB video memory) | 10 | 80 |
| P16EN | 2 (96 GB × 2 video memory) | 22 | 225 |
| P16EN | 4 (96 GB × 4 video memory) | 46 | 450 |
| P16EN | 8 (96 GB × 8 video memory) | 92 | 900 |
| P16EN | 16 (96 GB × 16 video memory) | 184 | 1800 |
| GU8TF | 1 (96 GB video memory) | 16 | 128 |
| GU8TF | 2 (96 GB × 2 video memory) | 46 | 230 |
| GU8TF | 4 (96 GB × 4 video memory) | 92 | 460 |
| GU8TF | 8 (96 GB × 8 video memory) | 184 | 920 |
| GU8TEF | 1 (141 GB video memory) | 22 | 225 |
| GU8TEF | 2 (141 GB × 2 video memory) | 46 | 450 |
| GU8TEF | 4 (141 GB × 4 video memory) | 92 | 900 |
| GU8TEF | 8 (141 GB × 8 video memory) | 184 | 1800 |
| L20X (GX8SF) | 1 (141 GB video memory) | 22 | 225 |
| L20X (GX8SF) | 2 (141 GB × 2 video memory) | 46 | 450 |
| L20X (GX8SF) | 4 (141 GB × 4 video memory) | 92 | 900 |
| L20X (GX8SF) | 8 (141 GB × 8 video memory) | 184 | 1800 |

Utilization rules

For a pod to utilize a capacity reservation, all of the following conditions must be met:

  • The GPU type of the pod must exactly match the reserved GPU type. For example, both the reservation and the pod use the L20 GPU type.

  • The pod's GPU count must exactly match the reserved GPU count. For example, both the reservation and the pod are for one GPU.

  • The pod's vCPU count must be less than or equal to the reserved vCPU count.

  • The pod's memory amount must be less than or equal to the reserved memory amount.
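The four conditions above amount to a simple predicate. The sketch below is illustrative only: the `Spec` class and the `can_utilize` function are hypothetical names, not part of any product API, and memory is treated as a plain number in the same unit on both sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    gpu_type: str
    gpu_count: int
    vcpus: int
    memory: int  # same unit for pods and reservations


def can_utilize(pod: Spec, reservation: Spec) -> bool:
    """Return True if the pod satisfies all four utilization conditions."""
    return (
        pod.gpu_type == reservation.gpu_type        # GPU type must match exactly
        and pod.gpu_count == reservation.gpu_count  # GPU count must match exactly
        and pod.vcpus <= reservation.vcpus          # vCPUs must not exceed the reservation
        and pod.memory <= reservation.memory        # memory must not exceed the reservation
    )


reservation = Spec("L20", 1, 16, 128)

print(can_utilize(Spec("L20", 1, 8, 16), reservation))    # smaller pod, same GPU type and count
print(can_utilize(Spec("L20", 2, 8, 16), reservation))    # GPU count differs
print(can_utilize(Spec("A10", 1, 8, 16), reservation))    # GPU type differs
print(can_utilize(Spec("L20", 1, 32, 16), reservation))   # vCPU count exceeds the reservation
```

Only the first call returns `True`; each of the other three violates exactly one condition.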

The following scenarios assume that the pod's GPU type matches the reserved GPU type:

| Utilization principle | Scenario | Result and description |
| --- | --- | --- |
| Exact match or downward compatibility | Reservation: 1 × (1 GPU, 16 vCPUs, 128 GB). Pod created: 1 × (1 GPU, 8 vCPUs, 16 GB). | Result: Successful utilization. Description: The pod's GPU count matches the reservation, and its vCPUs and memory do not exceed the reserved specifications, resulting in a successful match. This pod fully utilizes the capacity reservation. |
| Smallest specification first | Reservations: 1 × (1 GPU, 10 vCPUs, 80 GB) and 1 × (1 GPU, 16 vCPUs, 128 GB). Pod created: 1 × (1 GPU, 5 vCPUs, 30 GB). | Result: The reservation for 1 GPU, 10 vCPUs, and 80 GB is utilized first. Description: To maximize resource efficiency, the system prioritizes the smallest available reservation that fits the pod's requirements. |
| First-In, First-Out (FIFO) | Reservations: 4 × (1 GPU, 10 vCPUs, 80 GB), created at different times. Pods created: 4 × (1 GPU, 5 vCPUs, 30 GB). | Result: The four pods utilize the four reservations in the order the reservations were created, from earliest to latest. Description: For reservations with identical specifications, the FIFO principle is applied. |
| Atomicity of multi-GPU specifications (indivisible) | Reservation: 1 × (4 GPUs, 46 vCPUs, 450 GB). Pods created: 4 × (1 GPU, 10 vCPUs, 60 GB). | Result: The reservation is not utilized. Description: A multi-GPU reservation is atomic and cannot be split to satisfy multiple smaller pods. These four pods are created as pay-as-you-go instances. |
| Mixed-specification matching | Reservations: 1 × (2 GPUs, 22 vCPUs, 225 GB) and 1 × (4 GPUs, 46 vCPUs, 450 GB). Pods created: 2 × (1 GPU, 12 vCPUs, 60 GB) and 2 × (2 GPUs, 20 vCPUs, 120 GB). | Result: Only one pod with 2 GPUs, 20 vCPUs, and 120 GB successfully utilizes the reservation for 2 GPUs, 22 vCPUs, and 225 GB. Description: The other pods cannot be matched with the remaining 4-GPU reservation and are therefore created as pay-as-you-go instances. |
| Real-time dynamic matching | Existing pay-as-you-go pod: 1 × (1 GPU, 5 vCPUs, 30 GB). New reservation purchased: 1 × (1 GPU, 10 vCPUs, 80 GB). | Result: After the new reservation is successfully created, it immediately and automatically matches and utilizes the existing pay-as-you-go pod. Description: Capacity reservations can be utilized by existing pay-as-you-go pods that meet the matching criteria. |
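The selection principles in these scenarios (smallest specification first, FIFO for identical specifications, and atomic multi-GPU reservations) can be modeled roughly as follows. This is a sketch under stated assumptions, not the scheduler's actual implementation: the class and function names are hypothetical, "smallest" is assumed to mean ordering by vCPU count and then memory, and `created_at` is an invented integer standing in for creation time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reservation:
    name: str
    gpu_type: str
    gpu_count: int
    vcpus: int
    memory_gb: int
    created_at: int                 # hypothetical creation order; smaller means earlier
    used_by: Optional[str] = None   # name of the pod utilizing this reservation, if any


@dataclass(frozen=True)
class Pod:
    name: str
    gpu_type: str
    gpu_count: int
    vcpus: int
    memory_gb: int


def fits(pod: Pod, r: Reservation) -> bool:
    """GPU type and count must match exactly; vCPUs and memory must not exceed."""
    return (
        pod.gpu_type == r.gpu_type
        and pod.gpu_count == r.gpu_count
        and pod.vcpus <= r.vcpus
        and pod.memory_gb <= r.memory_gb
    )


def pick_reservation(pod: Pod, reservations: List[Reservation]) -> Optional[Reservation]:
    """Pick the smallest free reservation that fits; break ties by earliest creation."""
    candidates = [r for r in reservations if r.used_by is None and fits(pod, r)]
    if not candidates:
        return None  # the pod runs as a plain pay-as-you-go instance
    chosen = min(candidates, key=lambda r: (r.vcpus, r.memory_gb, r.created_at))
    chosen.used_by = pod.name  # one pod fully utilizes the whole reservation
    return chosen


# The mixed-specification scenario from the table above (GPU type assumed identical).
reservations = [
    Reservation("r-2gpu", "P16EN", 2, 22, 225, created_at=1),
    Reservation("r-4gpu", "P16EN", 4, 46, 450, created_at=2),
]
pods = [
    Pod("pod-a", "P16EN", 1, 12, 60),
    Pod("pod-b", "P16EN", 1, 12, 60),
    Pod("pod-c", "P16EN", 2, 20, 120),
    Pod("pod-d", "P16EN", 2, 20, 120),
]

assignment = {}
for pod in pods:
    match = pick_reservation(pod, reservations)
    assignment[pod.name] = match.name if match else "pay-as-you-go"

print(assignment)
```

In this model only `pod-c` utilizes a reservation, consistent with the mixed-specification result: the 4-GPU reservation stays unused because no pod requests exactly four GPUs.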