Alibaba Cloud Container Compute Service (ACS) supports unified scheduling for heterogeneous computing resources. With its serverless resource model, ACS simplifies the management of heterogeneous computing clusters. This document provides an overview of the GPU resources and usage patterns available in ACS.
Typical ACS GPU workflow
ACS provides a highly elastic, cost-effective solution for AI workloads that covers the entire lifecycle, from data preprocessing and model training to inference deployment. It combines the on-demand usage and auto-scaling of a serverless architecture with GPU computing power, so developers and data scientists can focus on business logic and algorithm innovation instead of managing underlying resources.
Data preprocessing stage: For tasks that involve cleansing, transforming, and augmenting large datasets, you can use the parallel processing power of serverless CPUs. You can launch a large number of CPU instances on demand to accelerate computation. After the tasks are complete, the resources are immediately released, so you do not pay for idle time. This approach is highly efficient for periodic or bursty data batch processing and significantly shortens data preparation cycles.
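The following Python sketch shows one way to express such a burst of preprocessing workers as a standard Kubernetes `batch/v1` Job. It is a generic illustration, not an ACS-specific template. The job name, image, shard count, and CPU and memory sizes are hypothetical, and any ACS-specific labels (for example, for selecting a compute class) are omitted. The sketch also assumes that each worker reads the shard matching its `JOB_COMPLETION_INDEX` environment variable, which Kubernetes sets for Jobs in Indexed completion mode.

```python
import json


def preprocessing_job(name: str, image: str, shards: int, parallelism: int,
                      cpu: str = "2", memory: str = "4Gi") -> dict:
    """Build a Kubernetes Job manifest that processes `shards` data shards.

    Hypothetical setup: each pod handles the shard whose number equals its
    JOB_COMPLETION_INDEX environment variable (set in Indexed mode).
    """
    if shards < 1 or parallelism < 1:
        raise ValueError("shards and parallelism must be positive")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "completions": shards,                    # one successful pod per shard
            "parallelism": min(parallelism, shards),  # pods allowed to run at once
            "completionMode": "Indexed",              # pods get indexes 0..shards-1
            "ttlSecondsAfterFinished": 300,           # delete the finished Job and its pods after 5 minutes
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "worker",
                        "image": image,
                        "resources": {
                            "requests": {"cpu": cpu, "memory": memory},
                            "limits": {"cpu": cpu, "memory": memory},
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    job = preprocessing_job("prep-demo", "registry.example.com/prep:latest",
                            shards=100, parallelism=20)
    print(json.dumps(job, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed manifest can be saved to a file and applied directly.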
Model training stage: During the compute-intensive model training phase, serverless GPUs let you choose GPU instance specifications that match your model size and required convergence speed. Training jobs are billed per second for their actual duration. This eliminates the cost of idle GPU servers common in traditional setups and suits experimental tuning and iterative training.
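To make the effect of per-second billing concrete, the Python sketch below compares it with billing that rounds usage up to whole hours. It is a simplified model that prices only GPU time; the unit price is a hypothetical placeholder rather than an ACS list price, and rounding partial seconds up is an assumption. See the ACS billing documentation for actual prices and rounding rules.

```python
import math


def per_second_cost(duration_s: float, price_per_gpu_hour: float, gpus: int) -> float:
    """Cost when usage is billed per second (partial seconds rounded up, assumed)."""
    billed_seconds = math.ceil(duration_s)
    return billed_seconds * gpus * price_per_gpu_hour / 3600


def per_hour_cost(duration_s: float, price_per_gpu_hour: float, gpus: int) -> float:
    """Cost when usage is rounded up to whole hours, for comparison."""
    billed_hours = math.ceil(duration_s / 3600)
    return billed_hours * gpus * price_per_gpu_hour


# A hypothetical 70-minute training run on 8 GPUs at 36.0 per GPU-hour.
duration = 70 * 60
print(per_second_cost(duration, 36.0, 8))  # 336.0
print(per_hour_cost(duration, 36.0, 8))    # 576.0
```

In this example, hourly rounding bills 2 full hours for a 70-minute run, while per-second billing charges only for the 4,200 seconds actually used.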
For model training workloads that need both guaranteed resource availability and flexibility, use a GPU-HPN capacity reservation.
Inference deployment stage: After a model is trained, you can deploy it directly as an online inference service. ACS automatically scales GPU instances within seconds based on real-time request traffic and can scale them down to zero, so you incur no resource costs when there is no traffic. This elasticity suits AI applications with highly variable or bursty traffic, such as image recognition and natural language processing services, and it maintains high service availability while minimizing costs.
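Scale-to-zero behavior can be pictured with a concurrency-based rule similar in spirit to the formulas used by the Kubernetes Horizontal Pod Autoscaler and Knative: run enough replicas that in-flight requests per replica stay at or below a target, clamped to a configured range. The Python sketch below is an assumption for illustration, not ACS's documented scaling algorithm. `target_per_replica`, `min_replicas`, and `max_replicas` are hypothetical parameters.

```python
import math


def desired_replicas(in_flight: float, target_per_replica: float,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Replicas needed so each serves at most `target_per_replica` requests.

    With min_replicas=0, zero traffic yields zero replicas (scale to zero).
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("require 0 <= min_replicas <= max_replicas")
    needed = math.ceil(in_flight / target_per_replica)
    return max(min_replicas, min(max_replicas, needed))


for load in (0, 3, 9, 100):
    print(load, desired_replicas(load, target_per_replica=4))
# prints: "0 0", "3 1", "9 3", "100 10" (100 requests would need 25, capped at 10)
```

Setting `min_replicas` to 1 or more trades the zero-cost idle state for avoiding cold starts on the first request.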
With ACS serverless GPUs, you can efficiently manage your entire AI workflow on a single platform, optimizing resource allocation and costs to accelerate the development and deployment of your AI applications.

Supported GPU types
| GPU type | GPU memory | GPU count | RDMA support |
| --- | --- | --- | --- |
|  | 96 GB | 1/2/4/8 | Yes |
|  | 141 GB | 1/2/4/8 | Yes |
|  | 48 GB | 1/2/4/8 | No |
|  | 141 GB | 8 | Yes |
|  | 96 GB | 1/2/4/8/16 | Yes |
|  | 16 GB | 1/2 | No |
|  | 24 GB | 1/2/4/8 | No |
|  | 11 GB | 1 | No |
|  | 48 GB | 1/2/4/8 | No |
|  | 32 GB | 1/2/4/8 | No |
|  | 48 GB | 1/2/4/8 | No |
|  | 72 GB | 1/2/4/8 | No |
For more information about GPU specifications, see GPU instance families supported by ACS.
GPU availability zones
| Availability zone | Supported GPU types |
| --- | --- |
| cn-wulanchabu-a | GU8TF, L20, G49E |
| cn-wulanchabu-b | G59 |
| cn-wulanchabu-c | P16EN |
| cn-wulanchabu-d | P16EN, L20NE |
| cn-beijing-d | GU8TF, GU8TEF, P16EN |
| cn-beijing-h | G28Ti |
| cn-beijing-i | A10, G28Ti, L20N |
| cn-beijing-l | L20, G49E, G59, L20NE |
| cn-shanghai-e | G59, G28Ti |
| cn-shanghai-f | GU8TF, GU8TEF, P16EN |
| cn-shanghai-l | L20, G49E, T4, G28Ti |
| cn-shanghai-n | L20, L20N |
| cn-shanghai-o | P16EN |
| cn-hangzhou-b | GU8TF, L20, G49E, P16EN, G59 |
| cn-hangzhou-i | T4 |
| cn-shenzhen-c | L20 |
| cn-shenzhen-d | GU8TEF, G49E, G59 |
| cn-shenzhen-e | T4 |
| cn-hongkong-d | GU8TEF |
| ap-southeast-1 | GU8TF, L20, L20X |
| eu-central-1-a | L20 |
| eu-central-1-c | GU8TEF |
| me-east-1-a | GU8TEF |
| us-east-1-a | A10, L20 |
| us-east-1-b | A10, L20 |
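To check a GPU type against a zone before submitting a workload, the availability table above can be encoded as a lookup. The Python sketch below copies the zone data from that table. The pod label keys `alibabacloud.com/compute-class` and `alibabacloud.com/gpu-model-series`, the use of the standard Kubernetes `topology.kubernetes.io/zone` node selector, and the `nvidia.com/gpu` resource name are assumptions to verify against the ACS documentation before use. The pod name and image are hypothetical.

```python
import json

# Zone -> GPU types, copied from the availability table above.
ZONE_GPU_TYPES = {
    "cn-wulanchabu-a": {"GU8TF", "L20", "G49E"},
    "cn-wulanchabu-b": {"G59"},
    "cn-wulanchabu-c": {"P16EN"},
    "cn-wulanchabu-d": {"P16EN", "L20NE"},
    "cn-beijing-d": {"GU8TF", "GU8TEF", "P16EN"},
    "cn-beijing-h": {"G28Ti"},
    "cn-beijing-i": {"A10", "G28Ti", "L20N"},
    "cn-beijing-l": {"L20", "G49E", "G59", "L20NE"},
    "cn-shanghai-e": {"G59", "G28Ti"},
    "cn-shanghai-f": {"GU8TF", "GU8TEF", "P16EN"},
    "cn-shanghai-l": {"L20", "G49E", "T4", "G28Ti"},
    "cn-shanghai-n": {"L20", "L20N"},
    "cn-shanghai-o": {"P16EN"},
    "cn-hangzhou-b": {"GU8TF", "L20", "G49E", "P16EN", "G59"},
    "cn-hangzhou-i": {"T4"},
    "cn-shenzhen-c": {"L20"},
    "cn-shenzhen-d": {"GU8TEF", "G49E", "G59"},
    "cn-shenzhen-e": {"T4"},
    "cn-hongkong-d": {"GU8TEF"},
    "ap-southeast-1": {"GU8TF", "L20", "L20X"},
    "eu-central-1-a": {"L20"},
    "eu-central-1-c": {"GU8TEF"},
    "me-east-1-a": {"GU8TEF"},
    "us-east-1-a": {"A10", "L20"},
    "us-east-1-b": {"A10", "L20"},
}


def zones_for(gpu_type: str) -> list:
    """Return the zones that list `gpu_type`, sorted alphabetically."""
    return sorted(zone for zone, types in ZONE_GPU_TYPES.items() if gpu_type in types)


def gpu_pod_manifest(name: str, image: str, gpu_type: str, zone: str,
                     gpu_count: int = 1) -> dict:
    """Build a pod manifest pinned to `zone`; fail fast if the type is not listed there."""
    if gpu_type not in ZONE_GPU_TYPES.get(zone, set()):
        raise ValueError(f"{gpu_type} is not listed for {zone}; try {zones_for(gpu_type)}")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "labels": {
                # Assumed ACS label keys; confirm them in the ACS documentation.
                "alibabacloud.com/compute-class": "gpu",
                "alibabacloud.com/gpu-model-series": gpu_type,
            },
        },
        "spec": {
            # Standard Kubernetes zone label, assumed to be honored by ACS.
            "nodeSelector": {"topology.kubernetes.io/zone": zone},
            "containers": [{
                "name": "main",
                "image": image,
                # Assumed GPU resource name.
                "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
            }],
        },
    }


print(zones_for("T4"))  # ['cn-hangzhou-i', 'cn-shanghai-l', 'cn-shenzhen-e']
print(json.dumps(gpu_pod_manifest("t4-demo", "registry.example.com/infer:latest",
                                  "T4", "cn-hangzhou-i"), indent=2))
```

Validating the type and zone pair locally gives a clear error message up front, rather than leaving a pod pending because no capacity of that type exists in the chosen zone.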