Elastic Algorithm Service (EAS) provides GPU slicing for more cost-effective and efficient use of compute resources when you deploy model services. This feature partitions the compute power and GPU memory of a single physical GPU card to be shared among multiple service instances. This improves GPU utilization and reduces deployment costs.
Prerequisites
You can configure GPU slicing only if you meet the following prerequisites:
Resource Type: Use an EAS resource group or a Lingjun resource quota.
Instance Status: GPU instances in the resource group must be in the Running state, not in another state such as Starting or Stopped.
Note: When you purchase a GPU instance for the first time, initialization typically takes 8 to 10 minutes. Wait for the instance to initialize before proceeding.
Configuration
You can configure GPU slicing when you create or update a service.
Console
Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
Create a new service or update an existing one to open the service configuration page.
In the Resource Information section, configure the following key parameters. For details on other parameters, see custom deployment.
Resource Type: Select EAS Resource Group or Resource Quota.
GPU Slicing: Select this checkbox to enable GPU slicing.
Note: If this option is not available, see Why is the GPU slicing option missing?.
Deployment Resources: Set the following two fields.
Single-GPU Memory (GB): Required. The GPU memory required for each service instance. The value must be an integer. Setting it allows multiple instances to share a single GPU.
Important: For resource specifications that start with ml, the unit for single-GPU memory is GB. For those that start with ecs, the unit is GiB.
Computing Power per GPU (%): Optional. The percentage of a single GPU's compute power required for each service instance. The value must be an integer from 1 to 100.
Both limits apply at the same time. For example, if you set Single-GPU Memory to 48 GB and Computing Power per GPU to 10%, each instance can use at most 48 GB of GPU memory and at most 10% of the GPU's compute power.
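As a rough illustration of how the two limits interact, the following sketch estimates how many instances could share one GPU. It assumes a hypothetical card with 80 GB of memory and assumes that instances are packed purely by the two requested limits. Neither assumption comes from EAS itself, and the function name is made up for this example.

```python
from typing import Optional


def max_instances_per_gpu(
    total_memory_gb: int,
    memory_per_instance_gb: int,
    core_percentage: Optional[int] = None,
) -> int:
    """Estimate how many instances fit on one GPU (illustrative only)."""
    if memory_per_instance_gb <= 0:
        raise ValueError("memory_per_instance_gb must be a positive integer")
    # Limit imposed by GPU memory.
    by_memory = total_memory_gb // memory_per_instance_gb
    if core_percentage is None:
        return by_memory
    if not 1 <= core_percentage <= 100:
        raise ValueError("core_percentage must be an integer from 1 to 100")
    # Limit imposed by compute power; the tighter of the two limits wins.
    by_compute = 100 // core_percentage
    return min(by_memory, by_compute)


# Hypothetical 80 GB card, 48 GB and 10% per instance: memory is the bottleneck.
print(max_instances_per_gpu(80, 48, 10))  # prints 1
```

Under these assumptions, lowering the per-instance memory to 20 GB with 5% compute would let four instances share the card, because memory remains the tighter limit.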
After you configure the parameters, click Deploy or Update.
Local client
The following example shows the GPU slicing fields in a JSON configuration file:
{
  "metadata": {
    "gpu_core_percentage": 5,
    "gpu_memory": 20
  }
}
gpu_memory: Corresponds to the Single-GPU Memory (GB) parameter in the PAI console.
gpu_core_percentage: Corresponds to the Computing Power per GPU (%) parameter in the PAI console. If you specify this parameter, you must also specify the gpu_memory parameter. Otherwise, this parameter is ignored.
Important: If you use GPU memory-based scheduling, do not configure the gpu field or set it to 0. If the gpu field is set to 1, the instance exclusively uses the entire GPU card. In this case, the gpu_memory and gpu_core_percentage fields are ignored.
See Command reference and use the create or modify command to create or update the service.
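The field rules above can be checked locally before you submit a configuration file. The following is a minimal sketch, not part of any EAS tool: the function name check_gpu_slicing is hypothetical, and the requirement that gpu_memory be positive is an assumption (the rules above only say it must be an integer).

```python
import json


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_gpu_slicing(metadata: dict) -> list:
    """Return warnings for the GPU slicing fields in a metadata section."""
    warnings = []
    gpu = metadata.get("gpu", 0)
    memory = metadata.get("gpu_memory")
    core = metadata.get("gpu_core_percentage")

    # A non-zero gpu field conflicts with GPU memory-based scheduling.
    if gpu != 0 and (memory is not None or core is not None):
        warnings.append("gpu is non-zero: gpu_memory and gpu_core_percentage are ignored")
    # gpu_core_percentage only takes effect together with gpu_memory.
    if core is not None and memory is None:
        warnings.append("gpu_core_percentage is ignored without gpu_memory")
    # Assumption: gpu_memory must be a positive integer.
    if memory is not None and not (_is_int(memory) and memory > 0):
        warnings.append("gpu_memory must be a positive integer")
    if core is not None and not (_is_int(core) and 1 <= core <= 100):
        warnings.append("gpu_core_percentage must be an integer from 1 to 100")
    return warnings


config = json.loads('{"metadata": {"gpu_core_percentage": 5, "gpu_memory": 20}}')
print(check_gpu_slicing(config["metadata"]))  # prints []
```

An empty list means that none of the documented conflicts were found. It does not guarantee that the service will be scheduled.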
FAQ
Q: Why is the GPU slicing option missing?
Perform the following steps to troubleshoot the issue:
Ensure that the Resource Type is set to EAS resource group or Lingjun resource quota.
Verify that the selected resource group contains GPU resources (i.e., the value in the GPU column is not 0).
Verify that the GPU instance status is "Running" and not another state such as "Starting" or "Stopped". If the resource is still initializing, wait for the initialization to complete.
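The three checks above can be summarized as a short function. This is only a sketch of the checklist: the function name and its inputs are hypothetical, and you would read the values from the console rather than from an API.

```python
def why_slicing_option_missing(resource_type: str, gpu_count: int, instance_status: str) -> list:
    """Return the checklist items that fail for the given resource (illustrative only)."""
    reasons = []
    # Check 1: only these two resource types support GPU slicing.
    if resource_type not in ("EAS resource group", "Lingjun resource quota"):
        reasons.append("Resource Type is not an EAS resource group or Lingjun resource quota")
    # Check 2: the resource group must contain GPU resources.
    if gpu_count == 0:
        reasons.append("The resource group has no GPU resources")
    # Check 3: the GPU instance must have finished initializing and be running.
    if instance_status.lower() != "running":
        reasons.append("The GPU instance is not in the Running state")
    return reasons


# Example: a resource group whose GPU instance is still starting.
print(why_slicing_option_missing("EAS resource group", 1, "Starting"))
```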