Resource quotas allow you to dedicate pools of computing resources to different teams and set preemption rules.
Background
Example
Suppose you have purchased 128 GPUs to be allocated among three teams: A, B, and C.
Team A is responsible for an inference service and requires the highest level of resource guarantees.
Team B and Team C are responsible for model training.
Training jobs have a lower priority than the inference service. If Team A has insufficient resources for inference, the system automatically reclaims computing resources from Teams B and C to prioritize the inference service.
The computing resources for Teams B and C can be scaled dynamically.
Teams B and C manage their resources and jobs independently.
Solution
The following solution is based on the scenario shown in the figure above:
Create a resource quota named Quota1 with 128 GPUs and turn on the Child-level Preemption switch. Then, create two child quotas under Quota1: Quota1.1 (48 GPUs) and Quota1.2 (80 GPUs). As shown in the figure, Quota1, Quota1.1, and Quota1.2 form a parent-child relationship (QuotaTree), where Quota1 is the parent quota, and Quota1.1 and Quota1.2 are child quotas.
Deploy an Elastic Algorithm Service (EAS) inference service on Quota1.
Create a workspace named workspace-b for Team B and bind it to Quota1.1. Create a Deep Learning Containers (DLC) training job on Quota1.1.
Create a workspace named workspace-c for Team C and bind it to Quota1.2. Create a Data Science Workshop (DSW) instance on Quota1.2 for model development.
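The parent-child layout described in this solution can be sketched as a small in-memory model. This is only an illustration: the Quota class and its methods are hypothetical and are not part of any PAI SDK or API, and the rule that child quotas cannot exceed the parent's GPU count is an assumption inferred from the numbers in this example (48 + 80 = 128).

```python
from dataclasses import dataclass, field


@dataclass
class Quota:
    """Hypothetical model of a resource quota (not a PAI API object)."""
    name: str
    gpus: int
    child_preemption: bool = False
    children: list["Quota"] = field(default_factory=list)

    def add_child(self, name: str, gpus: int) -> "Quota":
        # Assumption: child quotas are carved out of the parent's capacity.
        allocated = sum(child.gpus for child in self.children)
        if allocated + gpus > self.gpus:
            raise ValueError(
                f"{name} needs {gpus} GPUs, but only "
                f"{self.gpus - allocated} remain in {self.name}"
            )
        child = Quota(name, gpus)
        self.children.append(child)
        return child

    def unallocated(self) -> int:
        """GPUs of this quota not assigned to any child quota."""
        return self.gpus - sum(child.gpus for child in self.children)


def build_quota_tree() -> Quota:
    """Build the QuotaTree from the example: Quota1 -> Quota1.1, Quota1.2."""
    root = Quota("Quota1", 128, child_preemption=True)
    root.add_child("Quota1.1", 48)  # Team B, training
    root.add_child("Quota1.2", 80)  # Team C, model development
    return root


tree = build_quota_tree()
print(tree.unallocated())  # 0
```

In this sketch, the two child quotas account for all 128 GPUs of Quota1, so the inference service on Quota1 relies on Child-level Preemption when the child quotas are fully in use.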
Procedure
Prepare AI computing resources (general-purpose computing resources or Lingjun resources). Note that if you use a general-purpose resource pool, you must create a Version 2.0 pool to use with EAS, DLC, and DSW. For more information, see Resource Pools.
Create a quota.
Create a resource quota named Quota1 and configure the following key parameters. For more information, see Create a resource quota or General-purpose computing resource quotas.
Select the Specifications/Resources (128 GPUs).
Turn on the Child-level Preemption switch. When this option is enabled, jobs in the parent quota can preempt resources from its child quotas.
In the Actions column for Quota1, click New Child-level Resource Quota to create the following two child quotas. For more information, see Create parent-child quotas.
Set Resource Quota Name to Quota1.1 and select the Specifications/Resources (48 GPUs).
Set Resource Quota Name to Quota1.2 and select the Specifications/Resources (80 GPUs).
Create the following three workspaces and bind them to the corresponding resource quotas. For more information, see Create and manage workspaces.
Team A: Set Workspace Name to workspace-a and select Quota1 for Associated Resources.
Team B: Set Workspace Name to workspace-b and select Quota1.1 for Associated Resources.
Team C: Set Workspace Name to workspace-c and select Quota1.2 for Associated Resources.
Grant workspace administrator permissions to Teams A, B, and C. For more information, see Configure a workspace. You can also refer to Appendix: Roles and permissions to grant other permissions.
Create an inference service and training jobs.
Team A creates an inference service in workspace-a. For more information, see Deploy a service.
Team B creates a Deep Learning Containers (DLC) job in workspace-b. For more information, see Create a training job.
Team C creates a Data Science Workshop (DSW) instance in workspace-c. For more information, see Create a DSW instance.
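The result of the procedure can be summarized as a lookup table from team to workspace, quota, and workload. The sketch below is a plain data structure for reference only. The Binding type and the quota_for_workspace helper are hypothetical names introduced for illustration, and the values are taken from the steps above.

```python
from typing import NamedTuple


class Binding(NamedTuple):
    """Hypothetical record of one team's workspace-to-quota binding."""
    team: str
    workspace: str
    quota: str
    workload: str


# Values taken from the procedure in this topic.
BINDINGS = [
    Binding("A", "workspace-a", "Quota1", "EAS inference service"),
    Binding("B", "workspace-b", "Quota1.1", "DLC training job"),
    Binding("C", "workspace-c", "Quota1.2", "DSW instance"),
]


def quota_for_workspace(workspace: str) -> str:
    """Return the quota that a workspace is bound to in this example."""
    for binding in BINDINGS:
        if binding.workspace == workspace:
            return binding.quota
    raise KeyError(f"unknown workspace: {workspace}")


for b in BINDINGS:
    print(f"Team {b.team}: {b.workspace} -> {b.quota} ({b.workload})")
```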
Use cases
Scenario 1: Inference service preempts resources from training jobs
An administrator goes to the Resource Quotas page, clicks the parent resource quota Quota1, and turns on the Child-level Preemption switch on the Overview tab.
After this switch is enabled, if Team A submits a new inference service on Quota1 but the quota lacks sufficient resources due to active training jobs from Teams B and C, the system automatically preempts resources from the training jobs to run the new inference service.
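The preemption behavior in Scenario 1 can be approximated with a small planning function. This is a simplified sketch, not the scheduler's actual algorithm: the Job type and plan_preemption function are hypothetical, and the order in which training jobs are preempted (last listed first) is an assumption, because this topic does not describe how the system chooses which jobs to reclaim.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical running job that holds GPUs from a quota."""
    name: str
    quota: str
    gpus: int


def plan_preemption(total_gpus, running, needed,
                    parent="Quota1", child_preemption=True):
    """Return the child-quota jobs to preempt so that a new job
    needing `needed` GPUs fits on the parent quota."""
    free = total_gpus - sum(job.gpus for job in running)
    if free >= needed:
        return []  # enough idle GPUs, nothing to preempt
    if not child_preemption:
        raise RuntimeError("insufficient GPUs and Child-level Preemption is off")

    # Only jobs running on child quotas are candidates.
    candidates = [job for job in running if job.quota != parent]
    victims = []
    # Assumption: reclaim from the last listed job first.
    for job in reversed(candidates):
        if free >= needed:
            break
        victims.append(job)
        free += job.gpus
    if free < needed:
        raise RuntimeError("insufficient GPUs even after preempting child jobs")
    return victims


running = [
    Job("train-b", "Quota1.1", 48),  # Team B, all of Quota1.1 in use
    Job("train-c", "Quota1.2", 80),  # Team C, all of Quota1.2 in use
]
victims = plan_preemption(128, running, needed=16)
print([job.name for job in victims])  # ['train-c']
```

With all 128 GPUs in use by training jobs, a 16-GPU inference service cannot start until at least one training job releases its GPUs. If the switch is off, the sketch raises an error instead of preempting.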
Scenario 2: Reallocate resources between teams
To reallocate resources between Teams B and C, an administrator can scale Quota1.1 and Quota1.2. On the Resource Quotas management page, find the target quota in the resource list and click Scale in the Actions column. For more information, see Scale quotas. For example:
Scale up Quota1.1 from 48 GPUs to 56 GPUs, an increase of 8 GPUs.
Scale down Quota1.2 from 80 GPUs to 72 GPUs, a decrease of 8 GPUs.
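The arithmetic behind this reallocation can be checked with a short sketch. The rebalance function below is hypothetical and does not call any PAI API. It moves GPUs from one child quota to another and verifies that the combined size stays within the 128 GPUs of Quota1, which is an assumed constraint based on the numbers in this example.

```python
def rebalance(quotas, src, dst, gpus, parent_total):
    """Move `gpus` GPUs from child quota `src` to child quota `dst`.

    Returns a new dict and leaves the input unchanged.
    """
    if gpus <= 0:
        raise ValueError("gpus must be positive")
    if quotas[src] < gpus:
        raise ValueError(f"{src} has only {quotas[src]} GPUs")
    updated = dict(quotas)
    updated[src] -= gpus
    updated[dst] += gpus
    # Assumption: child quotas together cannot exceed the parent quota.
    if sum(updated.values()) > parent_total:
        raise ValueError("child quotas exceed the parent quota")
    return updated


before = {"Quota1.1": 48, "Quota1.2": 80}
after = rebalance(before, src="Quota1.2", dst="Quota1.1",
                  gpus=8, parent_total=128)
print(after)  # {'Quota1.1': 56, 'Quota1.2': 72}
```

The total across both child quotas is 128 GPUs before and after the change, so the reallocation does not change the size of Quota1.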
Scenario 3: Isolate permissions between teams
Quota1.1 is allocated to workspace-b for Team B, and Quota1.2 is allocated to workspace-c for Team C. Teams B and C can independently manage resources and jobs within their respective workspaces. For more information, see Workspace Scheduling Center.
An administrator can go to the Workspace Settings page, select the Scheduling Configurations tab, and configure Resource User Roles in the Resource Usage section. In the table, select a Usable Role for a specified Resource Quota. Options include Basic Roles, Custom Roles, and Non-Workspace Members. You can also select the RAM authorized user checkbox. Click +Add to add a configuration, and then click Save.