alicloud_eflo_experiment_plan_template

Provides an Eflo Experiment Plan Template resource.

For information about Eflo Experiment Plan Template and how to use it, see What is Experiment Plan Template.

-> NOTE: Available since v1.248.0.

Example Usage

Basic Usage


variable "name" {
  default = "terraform-example"
}

provider "alicloud" {
  region = "cn-wulanchabu"
}

resource "alicloud_eflo_experiment_plan_template" "default" {
  template_pipeline {
    workload_id   = "2"
    workload_name = "MatMul"
    env_params {
      cpu_per_worker     = "90"
      gpu_per_worker     = "8"
      memory_per_worker  = "500"
      share_memory       = "500"
      worker_num         = "1"
      py_torch_version   = "1"
      gpu_driver_version = "1"
      cuda_version       = "1"
      nccl_version       = "1"
    }
    pipeline_order = "1"
    scene          = "baseline"
  }
  privacy_level        = "private"
  template_name        = var.name
  template_description = var.name
}

Argument Reference

The following arguments are supported:

  • privacy_level - (Required, ForceNew) Indicates the privacy level of the template content. Valid values:
    • private: The content is private and restricted to specific users or permission groups. Private content is not publicly displayed, and only authorized users can view or edit it.
    • public: The content is public and can be accessed by anyone. Public content is viewable by all users and is suitable for sharing information or resources.
  • template_description - (Optional, ForceNew) The description of the template, describing its purpose.
  • template_name - (Required, ForceNew) The name of the template, used to identify and select it.
  • template_pipeline - (Required, Set) The template pipeline. See template_pipeline below.

template_pipeline

The template_pipeline supports the following:

  • env_params - (Required, Set) Contains a series of parameters related to the environment. See env_params below.
  • pipeline_order - (Required, Int) Indicates the sequence number of the pipeline node.
  • scene - (Required) The scenario in which the template is used. Valid values:
    • baseline: benchmark evaluation
  • setting_params - (Optional, Map) Additional key/value parameters for the run (an illustrative sketch follows this list).
  • workload_id - (Required, Int) Uniquely identifies the specific workload to run.
  • workload_name - (Required) The name that identifies the specific workload.
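
The optional setting_params map passes additional key/value run parameters to the pipeline node. A minimal sketch follows; the keys under setting_params are illustrative assumptions, not a documented list, so replace them with parameters supported by your workload.

resource "alicloud_eflo_experiment_plan_template" "with_settings" {
  privacy_level = "private"
  template_name = "${var.name}-with-settings"

  template_pipeline {
    workload_id    = "2"
    workload_name  = "MatMul"
    pipeline_order = "1"
    scene          = "baseline"

    # Hypothetical run parameters; replace with keys your workload accepts.
    setting_params = {
      "seq_len"    = "2048"
      "batch_size" = "32"
    }

    env_params {
      cpu_per_worker    = "90"
      gpu_per_worker    = "8"
      memory_per_worker = "500"
      share_memory      = "500"
      worker_num        = "1"
    }
  }
}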

template_pipeline-env_params

The template_pipeline-env_params supports the following:

  • cpu_per_worker - (Required, Int) Number of central processing units (CPUs) allocated. This parameter affects the processing power of the computation, especially in tasks that require a large amount of parallel processing.
  • cuda_version - (Optional) The version of CUDA (Compute Unified Device Architecture) used. CUDA is a parallel computing platform and programming model provided by NVIDIA. A specific version may affect the available GPU functions and performance optimization.
  • gpu_driver_version - (Optional) The version of the GPU driver used. The driver version may affect GPU performance and compatibility, so it is important to ensure that the correct version is used.
  • gpu_per_worker - (Required, Int) Number of graphics processing units (GPUs). GPUs are a key component in deep learning and large-scale data processing, so this parameter is very important for tasks that require graphics-accelerated computing.
  • memory_per_worker - (Required, Int) The amount of memory available. Memory size has an important impact on the performance and stability of the program, especially when dealing with large data sets or high-dimensional data.
  • nccl_version - (Optional) The NVIDIA Collective Communications Library (NCCL) version used. NCCL is a library for multi-GPU and multi-node communication. This parameter is particularly important for optimizing data transmission in distributed computing.
  • py_torch_version - (Optional) The version of the PyTorch framework used. PyTorch is a widely used deep learning library, and differences between versions may affect the performance and functional support of model training and inference.
  • share_memory - (Required, Int) The amount of shared memory allocated, in GB.
  • worker_num - (Required, Int) The total number of nodes. This parameter directly affects the parallelism and computing speed of the task, and a higher number of working nodes usually accelerates the completion of the task.
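
For the optional version fields, the block below is a sketch only; the version strings are assumed examples of the expected format and are not validated against what is actually available in your region. The env_params block is nested inside template_pipeline, as in the examples above.

env_params {
  cpu_per_worker     = "90"
  gpu_per_worker     = "8"
  memory_per_worker  = "500"
  share_memory       = "500"
  worker_num         = "2"          # two worker nodes run the workload in parallel
  py_torch_version   = "2.1.0"      # assumed PyTorch release
  gpu_driver_version = "535.129.03" # assumed NVIDIA driver version
  cuda_version       = "12.2"       # assumed CUDA toolkit version
  nccl_version       = "2.18.3"     # assumed NCCL version
}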

Attributes Reference

The following attributes are exported:

  • id - The ID of the resource supplied above.
  • create_time - The creation time of the resource.
  • template_id - The ID of the template.
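
The exported attributes can be referenced elsewhere in the configuration, for example to surface the generated template ID. A minimal sketch, assuming the resource address from the example above:

output "experiment_plan_template_id" {
  description = "ID of the Experiment Plan Template"
  value       = alicloud_eflo_experiment_plan_template.default.template_id
}

output "experiment_plan_template_create_time" {
  description = "Creation time of the Experiment Plan Template"
  value       = alicloud_eflo_experiment_plan_template.default.create_time
}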

Timeouts

The timeouts block allows you to specify timeouts for certain actions:

  • create - (Defaults to 5 mins) Used when creating the Experiment Plan Template.
  • delete - (Defaults to 5 mins) Used when deleting the Experiment Plan Template.
  • update - (Defaults to 5 mins) Used when updating the Experiment Plan Template.
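
The timeouts are configured in a timeouts block inside the resource. A sketch that raises the create and update timeouts while keeping the default for delete:

resource "alicloud_eflo_experiment_plan_template" "default" {
  # ... arguments as in the example above ...

  timeouts {
    create = "10m"
    update = "10m"
    delete = "5m"
  }
}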

Import

Eflo Experiment Plan Template can be imported using the id, e.g.

$ terraform import alicloud_eflo_experiment_plan_template.example <id>
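
On Terraform v1.5.0 and later, the declarative import block can be used instead of the CLI command; a sketch, keeping the <id> placeholder:

import {
  to = alicloud_eflo_experiment_plan_template.example
  id = "<id>"
}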