ACS GPU overview

Alibaba Cloud Container Compute Service (ACS) supports unified scheduling for heterogeneous computing resources. With its serverless resource model, ACS simplifies the management of heterogeneous computing clusters. This document provides an overview of the GPU resources and usage patterns available in ACS.

Typical ACS GPU workflow

ACS provides a highly elastic, cost-effective platform for AI workloads that covers the entire lifecycle, from data preprocessing and model training to inference deployment. It combines the on-demand usage and auto-scaling of a serverless architecture with GPU computing power, so developers and data scientists can focus on business logic and algorithm innovation instead of managing the underlying resources.

  1. Data preprocessing stage: For tasks that involve cleansing, transforming, and augmenting large datasets, you can use the parallel processing power of serverless CPUs. You can launch a large number of CPU instances on demand to accelerate computation. After the tasks are complete, the resources are immediately released, so you do not pay for idle time. This approach is highly efficient for periodic or bursty data batch processing and significantly shortens data preparation cycles.

  2. Model training stage: During the compute-intensive model training phase, serverless GPUs let you flexibly select GPU instances with the required specifications based on your model size and convergence speed requirements. You are billed for the exact duration of your training jobs with per-second precision. This eliminates the cost of idle GPU servers common in traditional setups and is ideal for experimental tuning and iterative training.

    For resource certainty and flexibility in model training workloads, use a GPU-HPN capacity reservation.

  3. Inference deployment stage: After a model is trained, you can deploy it as an online inference service. The serverless architecture of ACS automatically scales GPU instances in seconds based on real-time request traffic and can scale them down to zero, so you incur no resource costs when there is no traffic. This elasticity suits AI applications with highly variable or bursty traffic, such as image recognition and natural language processing services, and keeps the service highly available while keeping costs low.
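
To make the cost effect of per-second billing (stage 2) and scale-to-zero (stage 3) concrete, the following is a toy Python model, not a description of how ACS bills or scales. It assumes a hypothetical flat price per GPU-second (PRICE_PER_GPU_SECOND) and a hypothetical per-replica capacity (REQUESTS_PER_REPLICA); real ACS prices differ by GPU type, and the replica rule below is the generic ceil(load / capacity) rule used by horizontal autoscalers rather than an ACS API.

```python
import math
from typing import List

# Hypothetical values for illustration only; these are NOT real ACS prices or limits.
PRICE_PER_GPU_SECOND = 0.001   # assumed price, arbitrary currency units
REQUESTS_PER_REPLICA = 50.0    # assumed requests/second one replica can serve
GPUS_PER_REPLICA = 1


def desired_replicas(requests_per_second: float) -> int:
    """Generic autoscaler rule: enough replicas to cover the load, zero when idle."""
    if requests_per_second <= 0:
        return 0
    return math.ceil(requests_per_second / REQUESTS_PER_REPLICA)


def billed_cost(traffic_per_second: List[float]) -> float:
    """Per-second billing: charge only for the GPUs running in each second."""
    gpu_seconds = sum(desired_replicas(rps) * GPUS_PER_REPLICA
                      for rps in traffic_per_second)
    return gpu_seconds * PRICE_PER_GPU_SECOND


def fixed_cost(traffic_per_second: List[float]) -> float:
    """Baseline: keep enough GPUs for the peak load running the whole time."""
    peak = max(desired_replicas(rps) for rps in traffic_per_second)
    return peak * GPUS_PER_REPLICA * len(traffic_per_second) * PRICE_PER_GPU_SECOND


if __name__ == "__main__":
    # 10 seconds of bursty traffic: idle, a spike, then idle again.
    traffic = [0, 0, 120, 80, 30, 0, 0, 0, 0, 0]
    print(f"per-second, scale-to-zero: {billed_cost(traffic):.3f}")
    print(f"fixed peak capacity:       {fixed_cost(traffic):.3f}")
```

In this toy trace, the elastic setup pays for 6 GPU-seconds, while a fixed deployment sized for the peak pays for 30; the idle seconds account for the difference.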

With ACS serverless GPUs, you can efficiently manage your entire AI workflow on a single platform, optimizing resource allocation and costs to accelerate the development and deployment of your AI applications.

Supported GPU types

GPU type      Memory per GPU  Supported GPU counts  RDMA support
GU8TF         96 GB           1/2/4/8               Yes
GU8TEF        141 GB          1/2/4/8               Yes
L20 (GN8IS)   48 GB           1/2/4/8               No
L20X (GX8SF)  141 GB          8                     Yes
P16EN         96 GB           1/2/4/8/16            Yes
T4            16 GB           1/2                   No
A10           24 GB           1/2/4/8               No
G28Ti         11 GB           1                     No
G49E          48 GB           1/2/4/8               No
G59           32 GB           1/2/4/8               No
L20N          48 GB           1/2/4/8               No
L20NE         72 GB           1/2/4/8               No

For more information about GPU specifications, see GPU instance families supported by ACS.
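
When choosing a GPU type, the table above can be treated as a simple lookup. The following minimal Python sketch transcribes its rows and filters them by per-GPU memory, GPU count, and RDMA support. The GpuType class and candidates function are hypothetical helpers for illustration, not part of any ACS SDK; the memory column is read as memory per GPU, and "L20 (GN8IS)" and "L20X (GX8SF)" are shortened to L20 and L20X.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GpuType:
    name: str
    memory_gb: int                # memory per GPU, as read from the table
    gpu_counts: Tuple[int, ...]   # supported GPU counts
    rdma: bool                    # RDMA support


# Transcribed from the "Supported GPU types" table.
GPU_TYPES = [
    GpuType("GU8TF", 96, (1, 2, 4, 8), True),
    GpuType("GU8TEF", 141, (1, 2, 4, 8), True),
    GpuType("L20", 48, (1, 2, 4, 8), False),
    GpuType("L20X", 141, (8,), True),
    GpuType("P16EN", 96, (1, 2, 4, 8, 16), True),
    GpuType("T4", 16, (1, 2), False),
    GpuType("A10", 24, (1, 2, 4, 8), False),
    GpuType("G28Ti", 11, (1,), False),
    GpuType("G49E", 48, (1, 2, 4, 8), False),
    GpuType("G59", 32, (1, 2, 4, 8), False),
    GpuType("L20N", 48, (1, 2, 4, 8), False),
    GpuType("L20NE", 72, (1, 2, 4, 8), False),
]


def candidates(min_memory_gb: int, gpu_count: int, need_rdma: bool = False) -> List[str]:
    """Return GPU types meeting a per-GPU memory floor, a GPU count, and an RDMA need."""
    return [
        t.name
        for t in GPU_TYPES
        if t.memory_gb >= min_memory_gb
        and gpu_count in t.gpu_counts
        and (t.rdma or not need_rdma)
    ]


if __name__ == "__main__":
    # Example: an 8-GPU training job that needs at least 96 GB per GPU and RDMA.
    print(candidates(96, 8, need_rdma=True))
```

For the example request, the filter keeps GU8TF, GU8TEF, L20X, and P16EN. Zone availability still has to be checked separately.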

GPU availability zones

Availability zone  Supported GPU types
cn-wulanchabu-a    GU8TF, L20, G49E
cn-wulanchabu-b    G59
cn-wulanchabu-c    P16EN
cn-wulanchabu-d    P16EN, L20NE
cn-beijing-d       GU8TF, GU8TEF, P16EN
cn-beijing-h       G28Ti
cn-beijing-i       A10, G28Ti, L20N
cn-beijing-l       L20, G49E, G59, L20NE
cn-shanghai-e      G59, G28Ti
cn-shanghai-f      GU8TF, GU8TEF, P16EN
cn-shanghai-l      L20, G49E, T4, G28Ti
cn-shanghai-n      L20, L20N
cn-shanghai-o      P16EN
cn-hangzhou-b      GU8TF, L20, G49E, P16EN, G59
cn-hangzhou-i      T4
cn-shenzhen-c      L20
cn-shenzhen-d      GU8TEF, G49E, G59
cn-shenzhen-e      T4
cn-hongkong-d      GU8TEF
ap-southeast-1     GU8TF, L20, L20X
eu-central-1-a     L20
eu-central-1-c     GU8TEF
me-east-1-a        GU8TEF
us-east-1-a        A10, L20
us-east-1-b        A10, L20
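
The zone table is organized by zone, but when planning a deployment you usually start from a GPU type. The following minimal Python sketch transcribes the table and inverts it so that each GPU type maps to the zones that list it, with an optional region-prefix filter. The names ZONE_GPU_TYPES, zones_by_gpu_type, and zones_for are hypothetical helpers for illustration, and the data is only a snapshot of the table above; it is not queried from ACS.

```python
from collections import defaultdict
from typing import Dict, List

# Transcribed from the "GPU availability zones" table: zone -> GPU types.
ZONE_GPU_TYPES: Dict[str, List[str]] = {
    "cn-wulanchabu-a": ["GU8TF", "L20", "G49E"],
    "cn-wulanchabu-b": ["G59"],
    "cn-wulanchabu-c": ["P16EN"],
    "cn-wulanchabu-d": ["P16EN", "L20NE"],
    "cn-beijing-d": ["GU8TF", "GU8TEF", "P16EN"],
    "cn-beijing-h": ["G28Ti"],
    "cn-beijing-i": ["A10", "G28Ti", "L20N"],
    "cn-beijing-l": ["L20", "G49E", "G59", "L20NE"],
    "cn-shanghai-e": ["G59", "G28Ti"],
    "cn-shanghai-f": ["GU8TF", "GU8TEF", "P16EN"],
    "cn-shanghai-l": ["L20", "G49E", "T4", "G28Ti"],
    "cn-shanghai-n": ["L20", "L20N"],
    "cn-shanghai-o": ["P16EN"],
    "cn-hangzhou-b": ["GU8TF", "L20", "G49E", "P16EN", "G59"],
    "cn-hangzhou-i": ["T4"],
    "cn-shenzhen-c": ["L20"],
    "cn-shenzhen-d": ["GU8TEF", "G49E", "G59"],
    "cn-shenzhen-e": ["T4"],
    "cn-hongkong-d": ["GU8TEF"],
    "ap-southeast-1": ["GU8TF", "L20", "L20X"],
    "eu-central-1-a": ["L20"],
    "eu-central-1-c": ["GU8TEF"],
    "me-east-1-a": ["GU8TEF"],
    "us-east-1-a": ["A10", "L20"],
    "us-east-1-b": ["A10", "L20"],
}


def zones_by_gpu_type(zone_map: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the zone table so each GPU type maps to the zones that offer it."""
    result: Dict[str, List[str]] = defaultdict(list)
    for zone, gpu_types in zone_map.items():
        for gpu_type in gpu_types:
            result[gpu_type].append(zone)
    return dict(result)


def zones_for(index: Dict[str, List[str]], gpu_type: str, region_prefix: str = "") -> List[str]:
    """Zones offering gpu_type, optionally restricted to names starting with region_prefix."""
    return [z for z in index.get(gpu_type, []) if z.startswith(region_prefix)]


if __name__ == "__main__":
    index = zones_by_gpu_type(ZONE_GPU_TYPES)
    print(index["T4"])
    print(zones_for(index, "GU8TEF", "cn-"))
```

Because availability can change, treat a lookup like this only as a starting point and confirm current zone support before you create workloads.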

ACS GPU capacity reservation

ACS GPU-enabled cluster types

ACS GPU resource scheduling

ACS GPU monitoring

ACS GPU troubleshooting

Monitor and recover from GPU-HPN node failures