After you enable Auto Mode for a cluster, you can use an Auto Mode node pool to dynamically scale GPU resources. This approach significantly reduces costs for GPU workloads with fluctuating demand, such as online inference services.
Key advantages
When you use an Auto Mode node pool to provision GPU-accelerated instances, the nodes run a GPU-optimized version of ContainerOS by default. This accelerates GPU node startup. The key advantages are as follows:
Pre-installed NVIDIA drivers for faster startup
The GPU-optimized image comes with pre-installed NVIDIA drivers and the required runtime environment. This eliminates additional installation and configuration after node startup.
Automated node management to reduce operational complexity
An Auto Mode node pool simplifies GPU maintenance by automating node pool management, system upgrades, component maintenance, and security patches to provide an out-of-the-box experience for GPU resources.
Streamlined base software stack
Nodes use a more lightweight and secure ContainerOS, which accelerates node startup.
Prerequisites
Nodes run ContainerOS 3.6 or later. To upgrade the ContainerOS version, see Change the operating system.
Step 1: Create a GPU Auto Mode node pool
We recommend that you create a separate node pool for GPU workloads. When you submit a GPU workload, the system automatically creates GPU nodes based on resource requests. When the nodes become idle and meet the scale-in conditions, they are automatically released. This ensures you are charged only for the resources you use.
On the ACK Clusters page, click the name of your cluster. In the left navigation pane, click .
Click Create Node Pool and configure the parameters as prompted.
The following table describes the key parameters. For more information about the parameters, see Create a node pool.
Parameter: Description
Configure Managed Node Pool: Select Auto Mode.
vSwitch: During scaling, nodes scale in or out in the availability zones of the selected vSwitches based on the scaling policy. For high availability, select vSwitches in two or more different availability zones.
Instance Settings: Set Instance Configuration Mode to Specify Instance Type.
    Architecture: GPU-accelerated.
    Instance Type: Select a suitable instance type based on your business requirements, such as ecs.gn7i-c8g1.2xlarge (NVIDIA A10). To increase the success rate of scale-out events, we recommend selecting multiple instance types.
Taints: To prevent non-GPU workloads from being scheduled onto expensive GPU nodes, we recommend that you add a taint to the node pool for logical isolation.
    Key: nvidia.com/gpu
    Value: true
    Effect: NoSchedule
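For reference, a taint with these values would typically appear in the YAML of each node in the pool (for example, in the output of kubectl get node <node-name> -o yaml) roughly as follows. This is an illustrative fragment based on the standard Kubernetes Node spec, not output captured from an ACK cluster, and all other node fields are omitted.

```yaml
# Illustrative fragment of a Node object; other fields omitted.
spec:
  taints:
  - key: nvidia.com/gpu   # Taint key configured on the node pool.
    value: "true"         # Taint value configured on the node pool.
    effect: NoSchedule    # Pods without a matching toleration are not scheduled here.
```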
Step 2: Configure requests and tolerations
To ensure that your application can be scheduled to the node pool and trigger the automatic creation of GPU nodes, you must declare the GPU resource request and a toleration for the node taint in the YAML configuration.
Configure the GPU resource request: In the container's resources field, declare the required GPU resources.

    # ...
    spec:
      containers:
      - name: gpu-automode
        resources:
          limits:
            nvidia.com/gpu: 1 # Request one GPU.
    # ...

Configure the toleration: Add the tolerations field to match the taint of the node pool. This allows the pod to be scheduled onto nodes with that taint. The following toleration configuration is for reference only.

    # ...
    spec:
      tolerations:
      - key: "nvidia.com/gpu" # Matches the taint key set on the node pool.
        operator: "Equal"
        value: "true" # Matches the taint value set on the node pool.
        effect: "NoSchedule" # Matches the taint effect set on the node pool.
    # ...
Step 3: Deploy a GPU workload and verify autoscaling
This example uses a Stable Diffusion Web UI application to demonstrate the end-to-end deployment process and verify autoscaling.
Create and deploy the workload.
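The manifest for this step is not reproduced here. As a minimal sketch, the Deployment and Service below are assembled from names that appear elsewhere in this step: the app=stable-diffusion label, the stable-diffusion Deployment and container, the stable-diffusion-svc LoadBalancer Service on port 7860, and the image shown in the pod events. The replica count, the container port, and the absence of startup arguments or load balancer annotations are assumptions, so adjust them to match your actual application.

```yaml
# Sketch only: reconstructed from names used in this step, not the original manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stable-diffusion
  labels:
    app: stable-diffusion
spec:
  replicas: 1                      # Assumption: a single replica is enough to trigger one scale-out.
  selector:
    matchLabels:
      app: stable-diffusion
  template:
    metadata:
      labels:
        app: stable-diffusion
    spec:
      tolerations:
      - key: "nvidia.com/gpu"      # Matches the taint set on the node pool in Step 1.
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - name: stable-diffusion
        image: yunqi-registry.cn-shanghai.cr.aliyuncs.com/lab/stable-diffusion:v1.0.0-gpu
        ports:
        - containerPort: 7860      # Assumption: the Web UI listens on the same port the Service exposes.
        resources:
          limits:
            nvidia.com/gpu: 1      # Request one GPU.
---
apiVersion: v1
kind: Service
metadata:
  name: stable-diffusion-svc
spec:
  type: LoadBalancer
  selector:
    app: stable-diffusion
  ports:
  - port: 7860
    targetPort: 7860               # Assumption: same as containerPort above.
```

Assuming the manifest is saved as stable-diffusion.yaml (a hypothetical file name), you can apply it with kubectl apply -f stable-diffusion.yaml.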
Verify automatic scale-out.
After deployment, the pod enters the Pending state because no GPU resources are available.

Check the pod status.

    kubectl get pod -l app=stable-diffusion

Check the pod events.

    kubectl describe pod -l app=stable-diffusion

In the Events section, you should see a FailedScheduling event, followed by a ProvisionNode event. This indicates that a scale-out has been triggered.

    ......
    Events:
      Type     Reason            Age    From               Message
      ----     ------            ----   ----               -------
      Warning  FailedScheduling  15m    default-scheduler  0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }, 2 Insufficient cpu, 2 Insufficient memory, 2 Insufficient nvidia.com/gpu. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling., ,
      Normal   ProvisionNode     16m    GOATScaler         Provision node asa-2ze2h0f4m5ctpd8kn4f1 in Zone: cn-beijing-k with InstanceType: ecs.gn7i-c8g1.2xlarge, Triggered time 2025-11-19 02:58:01.096
      Normal   AllocIPSucceed    12m    terway-daemon      Alloc IP 10.XX.XX.141/16 took 4.764400743s
      Normal   Pulling           12m    kubelet            Pulling image "yunqi-registry.cn-shanghai.cr.aliyuncs.com/lab/stable-diffusion:v1.0.0-gpu"
      Normal   Pulled            3m48s  kubelet            Successfully pulled image "yunqi-registry.cn-shanghai.cr.aliyuncs.com/lab/stable-diffusion:v1.0.0-gpu" in 8m47.675s (8m47.675s including waiting). Image size: 11421866941 bytes.
      Normal   Created           3m42s  kubelet            Created container: stable-diffusion
      Normal   Started           3m24s  kubelet            Started container stable-diffusion

Get the name of the node where the pod is running.

    # Store the name of the node where the pod is running in the NODE_NAME variable.
    NODE_NAME=$(kubectl get pod -l app=stable-diffusion -o jsonpath='{.items[0].spec.nodeName}')
    # Print the node name.
    echo "Stable Diffusion is running on node: $NODE_NAME"
    # View the details of the node to confirm that it is in the Ready state.
    kubectl get node $NODE_NAME
Access Stable Diffusion.
Wait a few minutes for the new node to join the cluster and the pod to start. Then, access the application over the internet.

Run the following command to obtain the public IP address (EXTERNAL-IP) of the Service.

    kubectl get svc stable-diffusion-svc

In the output, find the EXTERNAL-IP.

    NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
    stable-diffusion-svc   LoadBalancer   192.XXX.XX.196   8.XXX.XX.68   7860:31302/TCP   18m

In your browser, enter http://<EXTERNAL-IP>:7860.

If the Stable Diffusion Web UI loads successfully, the workload is running on the GPU node.
Verify automatic scale-in (manual trigger).
To verify the automatic scale-in capability, you can manually delete the Deployment to idle the node.

Delete the Deployment and Service that you created.

    # Delete the Deployment.
    kubectl delete deployment stable-diffusion
    # Delete the Service.
    kubectl delete service stable-diffusion-svc

Observe the node scale-in.

After the Scale-in Trigger Delay elapses (3 minutes by default in Auto Mode), the scaling component automatically removes the node from the cluster to reduce costs. Use the node name you obtained earlier to query the node again.

    kubectl get node $NODE_NAME

The expected output indicates that the node cannot be found. This confirms that the node was automatically removed as expected.

    Error from server (NotFound): nodes "<nodeName>" not found