GPU pod capacity reservation - Container Compute Service (ACS) - Alibaba Cloud Help Center

Pod-based capacity reservation provides resource certainty for elastic workloads. A GPU pod capacity reservation is not bound to a specific cluster. When you purchase one, you specify attributes such as the pod specification, availability zone, and reservation duration, and the system guarantees that matching pods can be launched on demand within minutes. While no pod is using the reservation, it is billed at a price lower than that of a pay-as-you-go pod.

Features

Resource certainty: During the effective period of a GPU pod capacity reservation, the system guarantees that pods matching the reservation launch successfully.
Cost savings: After a pod is launched, it is billed at the pay-as-you-go rate. While no pod is using the reservation, you are billed at the lower capacity reservation rate. You can launch and terminate pods according to your business needs.
Resource flexibility: You can create GPU pod capacity reservations with different resource specifications to meet various business requirements.

Note

GPU pod capacity reservation does not support pods of the BestEffort compute type.
GPU pod capacity reservation is compatible with Savings Plans that have matching attributes, such as region and type.
Creating a GPU pod capacity reservation is subject to available inventory.

Use cases

Periodic resource demands for real-time workloads: Your business has predictable daily or weekly resource demand patterns, and tasks must complete in real time. Example: real-time inference services.
Sudden bursts of resource demand: Your business has unexpected real-time computing needs that require resources to be delivered and scaled quickly so that the business is not affected. Example: traffic spikes triggered by trending events in an internet business.

Usage and billing example

GPU pod capacity reservation uses a pay-as-you-go billing model. During the active period of a capacity reservation, your fees include:

Pay-as-you-go fees for the unused portion of the capacity reservation.
Pay-as-you-go fees for the launched pods.

The following example illustrates the workflow and billing at different stages, using a scenario with two GPU pod capacity reservations and two pay-as-you-go pods (Pod1 and Pod2).

Stage 1: Purchase and create a capacity reservation

Before you begin, enable GPU capacity reservation.

In the Container Service console, navigate to Capacity Reservations > Create GPU capacity reservation. Configure the parameters and click Create capacity reservation.

Parameter	Description
Capacity reservation name	A custom name for the capacity reservation.
Reservation type	The GPU type.
Region	The region where you want to reserve resources.
Availability zone	The availability zone where you want to reserve resources.
Resource specification	The specifications for the capacity reservation. Select the number of GPUs, and the system automatically matches the highest vCPU and memory specifications available for that GPU count.
Reservation mode	Pod reservation (cannot be modified).
Billing model	Pay-as-you-go (cannot be modified).
Quantity	The number of GPU pod capacity reservations for the specified resource specifications.

The billing calculation for this stage is as follows:

Stage	Fee	Description
Stage 1	None	No capacity reservation has been created.

Stages 2–6: Active reservation period

During the active period, you can create pods at any time, and the system guarantees that pods matching a reservation launch successfully and utilize it. A pod matches a reservation when its GPU type and GPU count are the same as the reserved ones and its vCPU and memory do not exceed the reserved amounts.

A successful match utilizes the entire reservation, regardless of how much of it the pod actually requests. For example, if you purchase a reservation for one GPU, 10 vCPUs, and 80 GB of memory and create a pod with one GPU, one vCPU, and 2 GB of memory, that pod utilizes the whole reservation. When the pod is terminated, the capacity reservation becomes available again.
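The occupy-and-release behavior described above can be modeled with a small state object. This is a hypothetical sketch, not an ACS API: the `Spec` and `Reservation` classes and the `occupy`/`release` methods are illustrative names, the GPU type `P16EN` is assumed only because its 1-GPU row in the Specifications table matches the example, and the fit check follows the Utilization rules later in this topic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Spec:
    gpu_type: str
    gpus: int
    vcpus: int
    memory_gb: int


@dataclass
class Reservation:
    """Hypothetical model of one GPU pod capacity reservation (not an ACS API)."""
    spec: Spec
    pod_name: Optional[str] = None  # pod currently using the reservation, if any

    def fits(self, pod: Spec) -> bool:
        # GPU type and count must match exactly; vCPU and memory may be smaller.
        return (pod.gpu_type == self.spec.gpu_type
                and pod.gpus == self.spec.gpus
                and pod.vcpus <= self.spec.vcpus
                and pod.memory_gb <= self.spec.memory_gb)

    def occupy(self, name: str, pod: Spec) -> bool:
        # A matching pod takes the whole reservation, however small the pod is.
        if self.pod_name is None and self.fits(pod):
            self.pod_name = name
            return True
        return False

    def release(self) -> None:
        # Terminating the pod makes the reservation available again.
        self.pod_name = None


res = Reservation(Spec("P16EN", 1, 10, 80))
tiny = Spec("P16EN", 1, 1, 2)
print(res.occupy("pod-a", tiny))  # True: pod-a utilizes the entire reservation
print(res.occupy("pod-b", tiny))  # False: leftover vCPUs and memory are not shared
res.release()                     # pod-a is terminated
print(res.occupy("pod-b", tiny))  # True: the reservation is available again
```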

The billing calculations for these stages are as follows:

Stage	Pods running	Fee
Stage 2	None	2 × capacity reservation unit price × duration of Stage 2
Stage 3	Pod1	1 × capacity reservation unit price × duration of Stage 3 + Pod1 pay-as-you-go unit price × duration of Stage 3
Stage 4	Pod1, Pod2	Pod1 pay-as-you-go unit price × duration of Stage 4 + Pod2 pay-as-you-go unit price × duration of Stage 4
Stage 5	Pod2	1 × capacity reservation unit price × duration of Stage 5 + Pod2 pay-as-you-go unit price × duration of Stage 5
Stage 6	None	2 × capacity reservation unit price × duration of Stage 6

The unit price of a capacity reservation is the pay-as-you-go fee for an unused reservation. The pay-as-you-go unit prices of Pod1 and Pod2 are the standard pay-as-you-go fees for those pods after launch.
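The stage fees above follow one pattern: unused reservations × reservation unit price × duration, plus each running pod's pay-as-you-go unit price × duration. The sketch below reproduces Stages 2 to 6, assuming (as the fee formulas imply) that Pod1 runs in Stages 3 and 4 and Pod2 in Stages 4 and 5, and that each running pod utilizes one reservation. All prices are hypothetical placeholders, not real ACS rates, and the `Stage` and `stage_fee` names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    name: str
    hours: float                     # duration of the stage
    running_pod_prices: List[float]  # hourly pay-as-you-go price of each running pod


def stage_fee(stage: Stage, reserved: int, reservation_price: float) -> float:
    """Fee for one stage: unused reservations plus running pods.

    Assumes each running pod utilizes one reservation, as Pod1 and Pod2 do in
    the example, so unused reservations = reserved - running pods.
    """
    unused = max(reserved - len(stage.running_pod_prices), 0)
    reservation_part = unused * reservation_price * stage.hours
    pod_part = sum(price * stage.hours for price in stage.running_pod_prices)
    return reservation_part + pod_part


# Hypothetical hourly prices, for illustration only.
RESERVATION_PRICE = 2.0
POD1_PRICE = 10.0
POD2_PRICE = 12.0

stages = [
    Stage("Stage 2", 1.0, []),                        # no pods running
    Stage("Stage 3", 1.0, [POD1_PRICE]),              # Pod1 running
    Stage("Stage 4", 1.0, [POD1_PRICE, POD2_PRICE]),  # Pod1 and Pod2 running
    Stage("Stage 5", 1.0, [POD2_PRICE]),              # Pod2 running
    Stage("Stage 6", 1.0, []),                        # no pods running
]
for s in stages:
    print(s.name, stage_fee(s, reserved=2, reservation_price=RESERVATION_PRICE))
```

With these placeholder prices and one-hour stages, Stage 4 costs only the two pods' pay-as-you-go fees, because both reservations are in use.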

Stage 7: Reservation expiration

When the capacity reservation expires, the system automatically releases it.

Specifications

After the capacity reservation specification upgrade, the following GPU types and specifications are supported:

GPU type	GPU	vCPU	Memory (GiB)
L20 (GN8IS)	1 (48 GB video memory)	16	128
	2 (48 GB × 2 video memory)	32	230
	4 (48 GB × 4 video memory)	64	460
	8 (48 GB × 8 video memory)	128	920
T4	1 (16 GB video memory)	24	90
	2 (16 GB × 2 video memory)	48	180
A10	1 (24 GB video memory)	16	60
	2 (24 GB × 2 video memory)	32	120
	4 (24 GB × 4 video memory)	64	240
	8 (24 GB × 8 video memory)	128	480
P16EN	1 (96 GB video memory)	10	80
	2 (96 GB × 2 video memory)	22	225
	4 (96 GB × 4 video memory)	46	450
	8 (96 GB × 8 video memory)	92	900
	16 (96 GB × 16 video memory)	184	1800
GU8TF	1 (96 GB video memory)	16	128
	2 (96 GB × 2 video memory)	46	230
	4 (96 GB × 4 video memory)	92	460
	8 (96 GB × 8 video memory)	184	920
GU8TEF	1 (141 GB video memory)	22	225
	2 (141 GB × 2 video memory)	46	450
	4 (141 GB × 4 video memory)	92	900
	8 (141 GB × 8 video memory)	184	1800
L20X (GX8SF)	1 (141 GB video memory)	22	225
	2 (141 GB × 2 video memory)	46	450
	4 (141 GB × 4 video memory)	92	900
	8 (141 GB × 8 video memory)	184	1800
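Because you select only the GPU count at purchase and the system fills in the vCPU and memory, the table above acts as a lookup from (GPU type, GPU count) to (vCPU, memory). The sketch below transcribes the table into a dictionary; `SPECS` and `reserved_spec` are hypothetical names for illustration, not part of any ACS SDK.

```python
from typing import Dict, Tuple

# (vCPU, memory GiB) per GPU count, transcribed from the Specifications table.
SPECS: Dict[str, Dict[int, Tuple[int, int]]] = {
    "L20 (GN8IS)": {1: (16, 128), 2: (32, 230), 4: (64, 460), 8: (128, 920)},
    "T4": {1: (24, 90), 2: (48, 180)},
    "A10": {1: (16, 60), 2: (32, 120), 4: (64, 240), 8: (128, 480)},
    "P16EN": {1: (10, 80), 2: (22, 225), 4: (46, 450), 8: (92, 900),
              16: (184, 1800)},
    "GU8TF": {1: (16, 128), 2: (46, 230), 4: (92, 460), 8: (184, 920)},
    "GU8TEF": {1: (22, 225), 2: (46, 450), 4: (92, 900), 8: (184, 1800)},
    "L20X (GX8SF)": {1: (22, 225), 2: (46, 450), 4: (92, 900), 8: (184, 1800)},
}


def reserved_spec(gpu_type: str, gpus: int) -> Tuple[int, int]:
    """Return the (vCPU, memory GiB) reserved for a GPU type and GPU count."""
    counts = SPECS.get(gpu_type)
    if counts is None or gpus not in counts:
        raise ValueError(f"unsupported specification: {gpus} x {gpu_type}")
    return counts[gpus]


print(reserved_spec("P16EN", 4))  # (46, 450)
```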

Utilization rules

For a pod to utilize a capacity reservation, all of the following conditions must be met:

The GPU type of the pod must exactly match the reserved GPU type. For example, both the reservation and the pod use the L20 GPU type.
The pod's GPU count must exactly match the reserved GPU count. For example, both the reservation and the pod are for one GPU.
The pod's vCPU count must be less than or equal to the reserved vCPU count.
The pod's memory amount must be less than or equal to the reserved memory amount.
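The four rules above combine into a single predicate: two equality checks for the GPU and two upper bounds for vCPU and memory. This is a minimal sketch for illustration; `Spec` and `can_utilize` are hypothetical names, and ACS performs this matching internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    gpu_type: str
    gpus: int
    vcpus: int
    memory_gb: int


def can_utilize(pod: Spec, reservation: Spec) -> bool:
    """Apply the four utilization rules to one pod and one reservation."""
    return (
        pod.gpu_type == reservation.gpu_type          # rule 1: same GPU type
        and pod.gpus == reservation.gpus              # rule 2: same GPU count
        and pod.vcpus <= reservation.vcpus            # rule 3: vCPU not above reserved
        and pod.memory_gb <= reservation.memory_gb    # rule 4: memory not above reserved
    )


reservation = Spec("L20", 1, 16, 128)
print(can_utilize(Spec("L20", 1, 8, 16), reservation))  # True
print(can_utilize(Spec("L20", 2, 8, 16), reservation))  # False: GPU count differs
print(can_utilize(Spec("A10", 1, 8, 16), reservation))  # False: GPU type differs
```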

The following scenarios assume that the pod's GPU type matches the reserved GPU type. Each scenario gives the utilization principle, the reservations and pods involved, the result, and a description.

Exact match or downward compatibility

Reservation: 1 × (1 GPU, 16 vCPUs, 128 GB).

Pod created: 1 × (1 GPU, 8 vCPUs, 16 GB).

Result: ✓ Successful utilization.

Description: The pod's resource requirements (GPU count, vCPUs, and memory) do not exceed the reserved specifications, so the match succeeds and the reservation is fully utilized.

Smallest specification first

Reservations:

1 × (1 GPU, 10 vCPUs, 80 GB).
1 × (1 GPU, 16 vCPUs, 128 GB).

Pod created: 1 × (1 GPU, 5 vCPUs, 30 GB).

Result: ✓ The reservation for 1 GPU, 10 vCPUs, and 80 GB is utilized first.

Description: To maximize resource efficiency, the system prioritizes the smallest available reservation that fits the pod's requirements.
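Smallest-first selection can be sketched as filtering the reservations the pod fits and taking the minimum. The topic does not define how "smallest" is ranked, so this sketch assumes ordering by vCPU count and then memory. `GPU-A` is a placeholder GPU type (the scenario only assumes the types match), and `fits`/`pick_smallest` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Spec:
    gpu_type: str
    gpus: int
    vcpus: int
    memory_gb: int


def fits(pod: Spec, res: Spec) -> bool:
    # Utilization rules: same GPU type and count, vCPU and memory not above reserved.
    return (pod.gpu_type == res.gpu_type and pod.gpus == res.gpus
            and pod.vcpus <= res.vcpus and pod.memory_gb <= res.memory_gb)


def pick_smallest(pod: Spec, reservations: List[Spec]) -> Optional[Spec]:
    """Return the smallest reservation the pod fits into, or None."""
    candidates = [r for r in reservations if fits(pod, r)]
    # Assumption: "smallest" is ranked by vCPU count, then memory.
    return min(candidates, key=lambda r: (r.vcpus, r.memory_gb), default=None)


reservations = [Spec("GPU-A", 1, 16, 128), Spec("GPU-A", 1, 10, 80)]
pod = Spec("GPU-A", 1, 5, 30)
print(pick_smallest(pod, reservations))  # the 1 GPU, 10 vCPU, 80 GB reservation
```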

First-In, First-Out (FIFO)

Reservations: 4 × (1 GPU, 10 vCPUs, 80 GB), created at different times.

Pods created: 4 × (1 GPU, 5 vCPUs, 30 GB).

Result: ✓ The four pods utilize the four reservations in the order the reservations were created, from earliest to latest.

Description: For reservations with identical specifications, the FIFO principle is applied.
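For reservations with identical specifications, FIFO assignment amounts to sorting the free reservations by creation time and handing them out in that order. The sketch assumes every reservation fits every pod and that pods are listed in creation order; `Reservation`, `created_at`, and `assign_fifo` are hypothetical names, not ACS fields.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reservation:
    res_id: str
    created_at: int    # creation time, for example a Unix timestamp
    used_by: str = ""  # empty while the reservation is unused


def assign_fifo(pods: List[str], reservations: List[Reservation]) -> Dict[str, str]:
    """Give each pod, in creation order, the oldest unused reservation.

    All reservations are assumed to have identical specifications that fit
    every pod; pods left over when reservations run out stay pay-as-you-go.
    """
    free = sorted((r for r in reservations if not r.used_by),
                  key=lambda r: r.created_at)
    assignment: Dict[str, str] = {}
    for pod, res in zip(pods, free):
        res.used_by = pod
        assignment[pod] = res.res_id
    return assignment


reservations = [Reservation("res-c", 300), Reservation("res-a", 100),
                Reservation("res-d", 400), Reservation("res-b", 200)]
print(assign_fifo(["pod-1", "pod-2", "pod-3", "pod-4"], reservations))
```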

Atomicity of multi-GPU specifications (indivisible)

Reservation: 1 × (4 GPUs, 46 vCPUs, 450 GB).

Pods created: 4 × (1 GPU, 10 vCPUs, 60 GB).

Result: ✗ The reservation is not utilized.

Description: A multi-GPU reservation is atomic and cannot be split to satisfy multiple smaller pods. These four pods are created as pay-as-you-go instances.
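Atomicity follows from the exact GPU-count rule: a 1-GPU pod never matches a 4-GPU reservation, so the reservation cannot be carved up among small pods. The sketch below reports the billing outcome for each pod against one reservation. GPU type is omitted because the scenario assumes it matches, and `plan` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Spec:
    gpus: int
    vcpus: int
    memory_gb: int


def plan(pods: List[Spec], reservation: Spec) -> List[str]:
    """Return the billing outcome for each pod against one multi-GPU reservation.

    The reservation is indivisible: a pod matches only if it requests exactly
    the reserved GPU count, and once matched the reservation is fully used.
    """
    outcomes = []
    used = False
    for pod in pods:
        match = (not used
                 and pod.gpus == reservation.gpus
                 and pod.vcpus <= reservation.vcpus
                 and pod.memory_gb <= reservation.memory_gb)
        if match:
            used = True
            outcomes.append("reservation")
        else:
            outcomes.append("pay-as-you-go")
    return outcomes


reservation = Spec(4, 46, 450)
print(plan([Spec(1, 10, 60)] * 4, reservation))  # all four pods are pay-as-you-go
```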

Mixed-specification matching

Reservations:

1 × (2 GPUs, 22 vCPUs, 225 GB).
1 × (4 GPUs, 46 vCPUs, 450 GB).

Pods created:

2 × (1 GPU, 12 vCPUs, 60 GB).
2 × (2 GPUs, 20 vCPUs, 120 GB).

Result: ✗ Only one pod with 2 GPUs, 20 vCPUs, and 120 GB utilizes a reservation: the one for 2 GPUs, 22 vCPUs, and 225 GB.

Description: The remaining pods request 1 or 2 GPUs, which does not match the GPU count of the remaining 4-GPU reservation, so they are created as pay-as-you-go instances.
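The mixed-specification outcome can be reproduced with a greedy loop that matches each pod, in creation order, to a free reservation it fits. This is an illustrative sketch, not the ACS scheduler: processing pods in list order and ranking candidates by (vCPU, memory) are assumptions, and GPU type is omitted because the scenario assumes it matches.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Spec:
    gpus: int
    vcpus: int
    memory_gb: int


def fits(pod: Spec, res: Spec) -> bool:
    # Exact GPU count; vCPU and memory must not exceed the reservation.
    return pod.gpus == res.gpus and pod.vcpus <= res.vcpus and pod.memory_gb <= res.memory_gb


def match_all(pods: List[Spec], reservations: List[Spec]) -> List[Optional[Spec]]:
    """Match each pod to a free reservation, or None (pay-as-you-go)."""
    free = list(reservations)
    result: List[Optional[Spec]] = []
    for pod in pods:
        candidates = [r for r in free if fits(pod, r)]
        if candidates:
            chosen = min(candidates, key=lambda r: (r.vcpus, r.memory_gb))
            free.remove(chosen)  # each reservation serves at most one pod
            result.append(chosen)
        else:
            result.append(None)
    return result


reservations = [Spec(2, 22, 225), Spec(4, 46, 450)]
pods = [Spec(1, 12, 60), Spec(1, 12, 60), Spec(2, 20, 120), Spec(2, 20, 120)]
for pod, res in zip(pods, match_all(pods, reservations)):
    print(pod, "->", res if res else "pay-as-you-go")
```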

Real-time dynamic matching

Existing pay-as-you-go pod: 1 × (1 GPU, 5 vCPUs, 30 GB).

New reservation purchased: 1 × (1 GPU, 10 vCPUs, 80 GB).

Result: ✓ As soon as the new reservation is created, the existing pay-as-you-go pod is automatically matched to it and utilizes it.

Description: Capacity reservations can be utilized by existing pay-as-you-go pods that meet the matching criteria.
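Real-time matching can be pictured as an event handler that runs when a reservation is created and binds it to an eligible pay-as-you-go pod. The topic does not say which pod is chosen when several qualify, so this sketch assumes the first eligible pod in list order. `Pod`, `reservation_id`, and `on_reservation_created` are hypothetical names, and GPU type is omitted because the scenarios assume it matches.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Spec:
    gpus: int
    vcpus: int
    memory_gb: int


@dataclass
class Pod:
    name: str
    spec: Spec
    reservation_id: Optional[str] = None  # None means billed pay-as-you-go


def on_reservation_created(res_id: str, res: Spec, pods: List[Pod]) -> Optional[str]:
    """Bind a newly created reservation to the first eligible pay-as-you-go pod."""
    for pod in pods:
        if (pod.reservation_id is None
                and pod.spec.gpus == res.gpus
                and pod.spec.vcpus <= res.vcpus
                and pod.spec.memory_gb <= res.memory_gb):
            pod.reservation_id = res_id
            return pod.name
    return None  # no eligible pod: the reservation stays unused


pods = [Pod("existing-pod", Spec(1, 5, 30))]
print(on_reservation_created("res-1", Spec(1, 10, 80), pods))
```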