Pod-based capacity reservation provides resource certainty for elastic workloads. A GPU pod capacity reservation does not need to be bound directly to a cluster. You only need to specify attributes such as pod specifications, an availability zone, and a reservation duration at the time of purchase. This guarantees that you can launch pods with matching specifications on demand within minutes. GPU pod capacity reservation provides resource certainty at a lower price than pay-as-you-go pods. This topic describes the features of GPU pod capacity reservation.
Features
- Resource certainty: During the effective period of a GPU pod capacity reservation, the system guarantees that resources can be successfully launched.
- Cost savings: After a pod is launched, it is billed at the pay-as-you-go rate. When the pod is not running, you are billed at the lower capacity reservation rate. You can launch and terminate pods according to your business needs.
- Resource flexibility: You can create GPU pod capacity reservations with different resource specifications to meet various business requirements.
- GPU pod capacity reservation does not support pods of the BestEffort compute type.
- GPU pod capacity reservation is compatible with Savings Plans that have matching attributes, such as region and type.
- Creating a GPU pod capacity reservation is subject to available inventory.
Use cases
- Periodic resource demands for real-time workloads: Your business experiences predictable "tidal" patterns in resource demand on a daily or weekly basis, and tasks must be executed and completed in real time. Real-time inference services are a typical example.
- Sudden bursts of resource demand: Your business has unexpected real-time computing needs that require rapid resource delivery and scaling to avoid business impact. An example is resource demand triggered by trending events in an internet business.
Usage and billing example
GPU pod capacity reservation uses a pay-as-you-go billing model. During the active period of a capacity reservation, your fees include:
- Pay-as-you-go fees for the unused portion of the capacity reservation.
- Pay-as-you-go fees for the launched pods.
This section uses a scenario where you purchase two GPU pod capacity reservations and create two pay-as-you-go pods (Pod1 and Pod2) to illustrate the workflow and billing calculations at different stages, as shown in the following figure.
Stage 1: Purchase and create a capacity reservation
Before you begin, enable GPU capacity reservation.
In the Container Service console, navigate to Capacity Reservations > Create GPU capacity reservation. Configure the parameters and click Create capacity reservation.
| Parameter | Description |
| --- | --- |
| Capacity reservation name | A custom name for the capacity reservation. |
| Reservation type | The GPU type. |
| Region | The region where you want to reserve resources. |
| Availability zone | The availability zone where you want to reserve resources. |
| Resource specification | The specifications for the capacity reservation. You only need to select the number of GPUs. The system automatically matches it with the highest vCPU and memory specifications available for that GPU count. |
| Reservation mode | Pod reservation (cannot be modified). |
| Billing model | Pay-as-you-go (cannot be modified). |
| Quantity | The number of GPU pod capacity reservations for the specified resource specifications. |
The billing calculation for this stage is as follows:
| Stage | Fee | Description |
| --- | --- | --- |
| Stage 1 | None | No capacity reservation has been created. |
Stages 2–6: Active reservation period
During the active period, you can create pod instances at any time as long as their configurations fit the reserved specifications. The system guarantees that these pods can be launched successfully, and the corresponding capacity reservation is utilized. For a reservation to be utilized, the pod's GPU type and GPU count must match the reservation, and its vCPU and memory must not exceed the reserved amounts. A successful match utilizes the entire reservation, regardless of how much vCPU and memory the pod requests. For example, if you purchase a capacity reservation for one GPU, 10 vCPUs, and 80 GB of memory, and then create a pod with one GPU, one vCPU, and 2 GB of memory, the reservation is fully utilized. When the pod is terminated, the capacity reservation becomes available again.
The billing calculations for these stages are as follows:
| Stage | Fee |
| --- | --- |
| Stage 2 | 2 × unit price of capacity reservation × Duration of Stage 2 |
| Stage 3 | 1 × unit price of capacity reservation × Duration of Stage 3 + pay-as-you-go unit price of Pod1 × Duration of Stage 3 |
| Stage 4 | pay-as-you-go unit price of Pod1 × Duration of Stage 4 + pay-as-you-go unit price of Pod2 × Duration of Stage 4 |
| Stage 5 | 1 × unit price of capacity reservation × Duration of Stage 5 + pay-as-you-go unit price of Pod2 × Duration of Stage 5 |
| Stage 6 | 2 × unit price of capacity reservation × Duration of Stage 6 |
The unit price of a capacity reservation is the pay-as-you-go fee for an unused reservation. The pay-as-you-go unit prices of Pod1 and Pod2 are the standard pay-as-you-go fees for the pods after they are launched.
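The stage formulas above follow one rule: in each stage, every unused reservation is billed at the reservation unit price and every running pod is billed at its own pay-as-you-go price. The following Python sketch applies that rule to Stages 2 to 6. The function name `stage_fee`, the prices, and the one-hour durations are hypothetical values chosen for illustration, not published rates.

```python
def stage_fee(unused_reservations, reservation_unit_price, pod_unit_prices, duration_hours):
    """Fee for one stage: unused reservations plus running pay-as-you-go pods."""
    reservation_fee = unused_reservations * reservation_unit_price * duration_hours
    pod_fee = sum(pod_unit_prices) * duration_hours
    return reservation_fee + pod_fee


# Hypothetical hourly prices, for illustration only.
RESERVATION_PRICE = 2.0
POD1_PRICE = 10.0
POD2_PRICE = 10.0

# stage -> (unused reservations, prices of running pods, duration in hours)
stages = {
    2: (2, [], 1.0),                        # both reservations idle
    3: (1, [POD1_PRICE], 1.0),              # Pod1 uses one reservation
    4: (0, [POD1_PRICE, POD2_PRICE], 1.0),  # both reservations in use
    5: (1, [POD2_PRICE], 1.0),              # Pod1 terminated
    6: (2, [], 1.0),                        # Pod2 terminated
}

fees = {
    stage: stage_fee(unused, RESERVATION_PRICE, pods, hours)
    for stage, (unused, pods, hours) in stages.items()
}
total = sum(fees.values())
```

With these assumed numbers, Stage 4 carries no reservation fee because both reservations are utilized, which mirrors the Stage 4 row in the table.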
Stage 7: Reservation expiration
When the capacity reservation expires, the system automatically releases it.
Specifications
After the capacity reservation specification upgrade, capacity reservation supports the following GPU types and specifications:
| GPU type | GPU | vCPU | Memory (GiB) |
| --- | --- | --- | --- |
| L20 (GN8IS) | 1 (48 GB video memory) | 16 | 128 |
| L20 (GN8IS) | 2 (48 GB × 2 video memory) | 32 | 230 |
| L20 (GN8IS) | 4 (48 GB × 4 video memory) | 64 | 460 |
| L20 (GN8IS) | 8 (48 GB × 8 video memory) | 128 | 920 |
| T4 | 1 (16 GB video memory) | 24 | 90 |
| T4 | 2 (16 GB × 2 video memory) | 48 | 180 |
| A10 | 1 (24 GB video memory) | 16 | 60 |
| A10 | 2 (24 GB × 2 video memory) | 32 | 120 |
| A10 | 4 (24 GB × 4 video memory) | 64 | 240 |
| A10 | 8 (24 GB × 8 video memory) | 128 | 480 |
| P16EN | 1 (96 GB video memory) | 10 | 80 |
| P16EN | 2 (96 GB × 2 video memory) | 22 | 225 |
| P16EN | 4 (96 GB × 4 video memory) | 46 | 450 |
| P16EN | 8 (96 GB × 8 video memory) | 92 | 900 |
| P16EN | 16 (96 GB × 16 video memory) | 184 | 1800 |
| GU8TF | 1 (96 GB video memory) | 16 | 128 |
| GU8TF | 2 (96 GB × 2 video memory) | 46 | 230 |
| GU8TF | 4 (96 GB × 4 video memory) | 92 | 460 |
| GU8TF | 8 (96 GB × 8 video memory) | 184 | 920 |
| GU8TEF | 1 (141 GB video memory) | 22 | 225 |
| GU8TEF | 2 (141 GB × 2 video memory) | 46 | 450 |
| GU8TEF | 4 (141 GB × 4 video memory) | 92 | 900 |
| GU8TEF | 8 (141 GB × 8 video memory) | 184 | 1800 |
| L20X (GX8SF) | 1 (141 GB video memory) | 22 | 225 |
| L20X (GX8SF) | 2 (141 GB × 2 video memory) | 46 | 450 |
| L20X (GX8SF) | 4 (141 GB × 4 video memory) | 92 | 900 |
| L20X (GX8SF) | 8 (141 GB × 8 video memory) | 184 | 1800 |
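Because you select only a GPU type and a GPU count, the reserved vCPU and memory values are effectively a lookup into the table above. The following Python sketch shows one way to model that lookup on the client side, for example to check in advance what a reservation will contain. The names `RESERVED_SPECS` and `reserved_spec` are hypothetical, and only the T4 and A10 rows are transcribed here.

```python
# (GPU type, GPU count) -> (vCPUs, memory in GiB).
# Transcribed from the T4 and A10 rows of the specifications table.
RESERVED_SPECS = {
    ("T4", 1): (24, 90),
    ("T4", 2): (48, 180),
    ("A10", 1): (16, 60),
    ("A10", 2): (32, 120),
    ("A10", 4): (64, 240),
    ("A10", 8): (128, 480),
}


def reserved_spec(gpu_type, gpu_count):
    """Return (vCPUs, memory GiB) for a supported GPU type and count."""
    try:
        return RESERVED_SPECS[(gpu_type, gpu_count)]
    except KeyError:
        raise ValueError(
            f"no reservation specification for {gpu_count} x {gpu_type}"
        ) from None


vcpus, memory_gib = reserved_spec("A10", 4)
```

A combination that does not appear in the table, such as eight T4 GPUs, raises `ValueError` in this sketch.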
Utilization rules
For a pod to utilize a capacity reservation, all of the following conditions must be met:
- The GPU type of the pod must exactly match the reserved GPU type. For example, both the reservation and the pod use the L20 GPU type.
- The pod's GPU count must exactly match the reserved GPU count. For example, both the reservation and the pod are for one GPU.
- The pod's vCPU count must be less than or equal to the reserved vCPU count.
- The pod's memory amount must be less than or equal to the reserved memory amount.
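The four conditions above can be expressed as a single predicate. The following Python sketch is an illustration of the rules as stated, not the service's actual implementation. The `Spec` type and the `can_utilize` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    gpu_type: str
    gpu_count: int
    vcpus: int
    memory_gb: int


def can_utilize(pod: Spec, reservation: Spec) -> bool:
    """True if the pod meets all four utilization conditions."""
    return (
        pod.gpu_type == reservation.gpu_type        # GPU type: exact match
        and pod.gpu_count == reservation.gpu_count  # GPU count: exact match
        and pod.vcpus <= reservation.vcpus          # vCPU: at most the reserved count
        and pod.memory_gb <= reservation.memory_gb  # memory: at most the reserved amount
    )


reservation = Spec("L20", 1, 16, 128)
small_pod = Spec("L20", 1, 8, 16)      # fits: same GPU type and count, less vCPU and memory
wrong_type_pod = Spec("A10", 1, 8, 16) # different GPU type, so no match
```

Here `can_utilize(small_pod, reservation)` evaluates to `True`, while `can_utilize(wrong_type_pod, reservation)` evaluates to `False`.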
The following scenarios assume that the pod's GPU type matches the reserved GPU type:
| Utilization principle | Scenario | Result and description |
| --- | --- | --- |
| Exact match or downward compatibility | Reservation: 1 × (1 GPU, 16 vCPUs, 128 GB). Pod created: 1 × (1 GPU, 8 vCPUs, 16 GB). | The resources required by the pod (GPU count, vCPUs, and memory) do not exceed the reserved specifications, resulting in a successful match. This pod fully utilizes the capacity reservation. |
| Smallest specification first | Reservations: […] Pod created: 1 × (1 GPU, 5 vCPUs, 30 GB). | To maximize resource efficiency, the system prioritizes the smallest available reservation that fits the pod's requirements. |
| First-In, First-Out (FIFO) | Reservations: 4 × (1 GPU, 10 vCPUs, 80 GB), created at different times. Pods created: 4 × (1 GPU, 5 vCPUs, 30 GB). | For reservations with identical specifications, the FIFO principle is applied. |
| Atomicity of multi-GPU specifications (indivisible) | Reservation: 1 × (4 GPUs, 46 vCPUs, 450 GB). Pods created: 4 × (1 GPU, 10 vCPUs, 60 GB). | A multi-GPU reservation is atomic and cannot be split to satisfy multiple smaller pods. These four pods are created as pay-as-you-go instances. |
| Mixed-specification matching | Reservations: […] Pods created: […] | The other pods cannot be matched with the remaining 4-GPU reservation and are therefore created as pay-as-you-go instances. |
| Real-time dynamic matching | Existing pay-as-you-go pod: 1 × (1 GPU, 5 vCPUs, 30 GB). New reservation purchased: 1 × (1 GPU, 10 vCPUs, 80 GB). | Capacity reservations can be utilized by existing pay-as-you-go pods that meet the matching criteria. |
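The selection principles in the table (smallest specification first, FIFO among identical specifications, and atomic multi-GPU reservations) can be combined into one selection routine. The following Python sketch is a simplified model under stated assumptions: "smallest" is taken to mean fewest vCPUs and then least memory, creation order is represented by an integer, and all reservation names and sizes are hypothetical. It is not the scheduler's actual algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reservation:
    name: str
    gpu_type: str
    gpu_count: int
    vcpus: int
    memory_gb: int
    created_at: int      # assumed creation order; smaller means earlier
    in_use: bool = False


@dataclass(frozen=True)
class Pod:
    gpu_type: str
    gpu_count: int
    vcpus: int
    memory_gb: int


def pick_reservation(pod: Pod, reservations: List[Reservation]) -> Optional[Reservation]:
    """Pick a free reservation for the pod, or None (pod runs as pay-as-you-go)."""
    candidates = [
        r for r in reservations
        if not r.in_use
        and r.gpu_type == pod.gpu_type
        and r.gpu_count == pod.gpu_count   # exact count: multi-GPU reservations are never split
        and pod.vcpus <= r.vcpus
        and pod.memory_gb <= r.memory_gb
    ]
    if not candidates:
        return None
    # Smallest specification first, then earliest created (FIFO).
    chosen = min(candidates, key=lambda r: (r.vcpus, r.memory_gb, r.created_at))
    chosen.in_use = True
    return chosen


# Hypothetical reservations, for illustration only.
reservations = [
    Reservation("r-big", "L20", 1, 16, 128, created_at=0),
    Reservation("r-old", "L20", 1, 10, 80, created_at=1),
    Reservation("r-new", "L20", 1, 10, 80, created_at=2),
    Reservation("r-quad", "L20", 4, 46, 450, created_at=3),
]
pod = Pod("L20", 1, 5, 30)
picked = [pick_reservation(pod, reservations) for _ in range(4)]
names = [r.name if r else None for r in picked]
```

In this model the first three pods take `r-old`, `r-new`, and `r-big` in that order, and the fourth pod gets `None` because the only remaining reservation is a 4-GPU one that cannot be split.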