Manage integrated training and inference resources


Resource quotas allow you to dedicate pools of computing resources to different teams and set preemption rules.

Background

Example

Suppose you have purchased 128 GPUs to be allocated among three teams: A, B, and C.

  • Team A is responsible for an inference service and requires the highest level of resource guarantees.

  • Team B and Team C are responsible for model training.

  • Training jobs have a lower priority than the inference service. If Team A has insufficient resources for inference, the system automatically reclaims computing resources from Teams B and C to prioritize the inference service.

  • The computing resources for Teams B and C can be scaled dynamically.

  • Teams B and C manage their resources and jobs independently.

Solution

[Figure: QuotaTree with parent quota Quota1 and child quotas Quota1.1 and Quota1.2]

The following solution is based on the scenario shown in the figure above:

  • Create a resource quota named Quota1 with 128 GPUs and turn on the Child-level Preemption switch. Then, create two child quotas under Quota1: Quota1.1 (48 GPUs) and Quota1.2 (80 GPUs). As shown in the figure, Quota1, Quota1.1, and Quota1.2 form a parent-child relationship (QuotaTree), where Quota1 is the parent quota, and Quota1.1 and Quota1.2 are child quotas.

  • Deploy an Elastic Algorithm Service (EAS) inference service on Quota1.

  • Create a workspace named workspace-b for Team B and bind it to Quota1.1. Create a Deep Learning Containers (DLC) training job on Quota1.1.

  • Create a workspace named workspace-c for Team C and bind it to Quota1.2. Create a Data Science Workshop (DSW) instance on Quota1.2 for model development.
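The quota layout in this solution can be sketched as a small tree. The following Python snippet is only an illustrative in-memory model, not a PAI API: the `Quota` class, its fields, and the rule that child quotas cannot exceed the parent's capacity are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Quota:
    """Hypothetical model of a resource quota (not a PAI SDK object)."""

    name: str
    gpus: int
    child_preemption: bool = False
    children: list[Quota] = field(default_factory=list)

    def add_child(self, name: str, gpus: int) -> Quota:
        # Assumption: child quotas are carved out of the parent's capacity,
        # so their combined size cannot exceed the parent's GPU count.
        allocated = sum(c.gpus for c in self.children)
        if allocated + gpus > self.gpus:
            raise ValueError(
                f"{name}: only {self.gpus - allocated} GPUs left in {self.name}"
            )
        child = Quota(name, gpus)
        self.children.append(child)
        return child

    def unallocated(self) -> int:
        # GPUs in the parent that are not assigned to any child quota.
        return self.gpus - sum(c.gpus for c in self.children)


# Layout from the solution: 48 + 80 = 128, so the parent is fully subdivided.
quota1 = Quota("Quota1", 128, child_preemption=True)
quota1_1 = quota1.add_child("Quota1.1", 48)
quota1_2 = quota1.add_child("Quota1.2", 80)
print(quota1.unallocated())
```

Because the two child quotas together cover all 128 GPUs, Quota1 has no unallocated capacity in this model, which is why the inference service on Quota1 relies on Child-level Preemption.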

Procedure

  1. Prepare AI computing resources (general-purpose computing resources or Lingjun resources). Note that if you use a general-purpose resource pool, you must create a Version 2.0 pool to use with EAS, DLC, and DSW. For more information, see Resource Pools.

  2. Create a quota.

    1. Create a resource quota named Quota1 and configure the following key parameters. For more information, see Create a resource quota or General-purpose computing resource quotas.

      • Select the Specifications/Resources (128 GPUs).

      • Turn on the Child-level Preemption switch. When this switch is on, jobs that run on the parent quota can preempt resources from its child quotas.

    2. In the Actions column for Quota1, click New Child-level Resource Quota to create the following two child quotas. For more information, see Create parent-child quotas.

      • Set Resource Quota Name to Quota1.1 and select the Specifications/Resources (48 GPUs).

      • Set Resource Quota Name to Quota1.2 and select the Specifications/Resources (80 GPUs).

  3. Create the following three workspaces and bind them to the corresponding resource quotas. For more information, see Create and manage workspaces.

    • Team A: Set Workspace Name to workspace-a and select Quota1 for Associated Resources.

    • Team B: Set Workspace Name to workspace-b and select Quota1.1 for Associated Resources.

    • Team C: Set Workspace Name to workspace-c and select Quota1.2 for Associated Resources.

  4. Grant workspace administrator permissions to Teams A, B, and C. For more information, see Configure a workspace. You can also refer to Appendix: Roles and permissions to grant other permissions.

  5. Create an inference service and training jobs.

    • Team A creates an inference service in workspace-a. For more information, see Deploy a service.

    • Team B creates a Deep Learning Containers (DLC) job in workspace-b. For more information, see Create a training job.

    • Team C creates a Data Science Workshop (DSW) instance in workspace-c. For more information, see Create a DSW instance.

Use cases

Scenario 1: Inference service preempts resources from training jobs

An administrator goes to the Resource Quotas page, clicks the parent resource quota Quota1, and turns on the Child-level Preemption switch on the Overview tab.

After this switch is enabled, if Team A submits a new inference service on Quota1 but the quota lacks sufficient resources due to active training jobs from Teams B and C, the system automatically preempts resources from the training jobs to run the new inference service.
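To make the preemption arithmetic concrete, here is a minimal sketch of how a shortfall could be covered by reclaiming GPUs from training jobs. The function name `plan_preemption` and the policy of reclaiming from the busiest child quota first are assumptions for illustration only. The source does not describe how the scheduler actually selects which jobs to preempt.

```python
from __future__ import annotations


def plan_preemption(
    requested: int, free: int, training_usage: dict[str, int]
) -> dict[str, int]:
    """Return how many GPUs to reclaim from each child quota.

    Hypothetical policy: reclaim from the child quota with the most GPUs
    in use first. This is not the documented scheduler behavior.
    """
    shortfall = max(0, requested - free)
    plan: dict[str, int] = {}
    # Visit child quotas from highest to lowest current usage.
    for name, used in sorted(
        training_usage.items(), key=lambda kv: kv[1], reverse=True
    ):
        if shortfall == 0:
            break
        take = min(used, shortfall)
        if take:
            plan[name] = take
            shortfall -= take
    if shortfall:
        raise RuntimeError(
            f"still short by {shortfall} GPUs after preempting all training jobs"
        )
    return plan


# Training jobs use 48 + 72 = 120 of 128 GPUs, so 8 are free.
# A 16-GPU inference service therefore needs 8 more GPUs.
usage = {"Quota1.1": 48, "Quota1.2": 72}
print(plan_preemption(requested=16, free=8, training_usage=usage))
# {'Quota1.2': 8}
```

If the request fits within the free GPUs, the plan is empty and no training job is affected.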

Scenario 2: Reallocate resources between teams

To reallocate resources between Teams B and C, an administrator can scale Quota1.1 and Quota1.2 (see Scale quotas). On the Resource Quotas page, find each quota in the resource list, click Scale in the Actions column, and make the following changes:

  • Scale up Quota1.1 from 48 GPUs to 56 GPUs, an increase of 8 GPUs.

  • Scale down Quota1.2 from 80 GPUs to 72 GPUs, a decrease of 8 GPUs.
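The two scaling operations offset each other, so the child quotas still add up to the 128 GPUs of Quota1. The sketch below checks that arithmetic. `rebalance` and `PARENT_GPUS` are hypothetical names used only for this example and do not correspond to a console action or API call.

```python
from __future__ import annotations

PARENT_GPUS = 128  # size of Quota1 in this scenario


def rebalance(
    children: dict[str, int], src: str, dst: str, gpus: int
) -> dict[str, int]:
    """Return a copy of `children` with `gpus` moved from `src` to `dst`."""
    if gpus <= 0:
        raise ValueError("gpus must be positive")
    if children[src] < gpus:
        raise ValueError(f"{src} has only {children[src]} GPUs")
    updated = dict(children)  # leave the caller's mapping untouched
    updated[src] -= gpus
    updated[dst] += gpus
    return updated


before = {"Quota1.1": 48, "Quota1.2": 80}
after = rebalance(before, src="Quota1.2", dst="Quota1.1", gpus=8)
print(after)                               # {'Quota1.1': 56, 'Quota1.2': 72}
print(sum(after.values()) == PARENT_GPUS)  # True
```

In this model the parent is fully subdivided, so it seems likely that Quota1.2 would need to be scaled down before Quota1.1 can be scaled up. That ordering is an assumption and is not stated in the source.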

Scenario 3: Isolate permissions between teams

Quota1.1 is allocated to workspace-b for Team B, and Quota1.2 is allocated to workspace-c for Team C. Teams B and C can independently manage resources and jobs within their respective workspaces. For more information, see Workspace Scheduling Center.

To configure which roles can use a resource quota, an administrator can do the following:

  1. Go to the Workspace Settings page and select the Scheduling Configurations tab.

  2. In the Resource Usage section, configure Resource User Roles. In the table, select a Usable Role for a specified Resource Quota. Options include Basic Roles, Custom Roles, or Non-Workspace Members. You can also select the RAM authorized user checkbox.

  3. Click +Add to add a configuration, and then click Save.

Related documents