Configure a global YARN resource queue


In DataWorks, you can specify a default YARN resource queue at the workspace level for modules to use when running E-MapReduce (EMR) tasks. You can also configure these global settings to override module-specific queue configurations. This topic describes how to configure a global YARN resource queue.

Background information

Yet Another Resource Negotiator (YARN) is a distributed resource management system and a core component of Hadoop. In a Hadoop cluster, it manages resources, schedules jobs, and monitors tasks. For more information about YARN in E-MapReduce, see YARN schedulers.

In DataWorks, you can configure the YARN resource queue for scheduled tasks in one of the following ways:

  • Method 1: Configure a global YARN resource queue

    Set a workspace-level YARN resource queue for a DataWorks module to use for EMR tasks, and specify whether this global configuration overrides module-specific settings. For more information, see the Configure a global YARN resource queue section in this topic.

  • Method 2: Configure a YARN queue within a product module

    • Data Studio: For Hive and Spark nodes, you can specify the YARN resource queue for a single node task. To do so, set the queue parameter under DataWorks parameters in the Scheduling Settings section on the right side of the node editing page.

    • Data Quality: In partition monitoring rules for an E-MapReduce table, you can specify the YARN resource queue by using the Queue setting. For more information, see Configure monitoring rules for a single table.

    • Other product modules: You cannot configure a separate YARN resource queue within other modules.

Limits

  • Only the following accounts and roles can configure YARN resource queues:

    • An Alibaba Cloud account.

    • A RAM user or RAM role with the AliyunDataWorksFullAccess policy.

    • A RAM user with the Workspace Administrator role.

  • Modify the maximum application priority for YARN.

    When you modify the YARN priority for E-MapReduce tasks in DataWorks, you must also add the yarn.cluster.max-application-priority configuration item to the yarn-site.xml file in your E-MapReduce cluster and set its value to a number greater than the default of 0. Otherwise, the priority you configure in DataWorks will not take effect.

    Note

    You must restart the YARN service for the changes to take effect.

  • You can configure global YARN resource queues only for Data Studio, Data Quality, DataAnalysis, and Operation Center.
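The yarn-site.xml change described in the limits above can be sketched as follows. This is a minimal illustration, not an official procedure: the helper name `set_max_application_priority`, the sample file content, and the value 10 are assumptions made for the example, and how you apply the change on your own cluster may differ. Only the property name yarn.cluster.max-application-priority and the requirement that its value be greater than 0 come from this topic.

```python
import xml.etree.ElementTree as ET

# Property named in this topic; DataWorks priorities need it to be > 0.
PRIORITY_PROPERTY = "yarn.cluster.max-application-priority"


def set_max_application_priority(yarn_site_xml: str, max_priority: int) -> str:
    """Return yarn-site.xml content with the priority property added or updated.

    Hypothetical helper for illustration only.
    """
    if max_priority <= 0:
        raise ValueError("max_priority must be greater than the default of 0")

    root = ET.fromstring(yarn_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == PRIORITY_PROPERTY:
            # The property already exists: update its value in place.
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = str(max_priority)
            break
    else:
        # The property is missing: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = PRIORITY_PROPERTY
        ET.SubElement(prop, "value").text = str(max_priority)
    return ET.tostring(root, encoding="unicode")


# Illustrative input; a real yarn-site.xml contains many more properties.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>"""

updated = set_max_application_priority(SAMPLE, 10)
print(updated)
```

After a change like this, the YARN service still has to be restarted, as the note above states.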

Prerequisites

You must associate an E-MapReduce cluster with your DataWorks workspace. For more information, see Bind an E-MapReduce compute engine.

Configure a global YARN resource queue

  1. Go to the global YARN resource queue configuration page.

    1. Log on to the DataWorks console. In the target region, click More > Management Center in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Management Center.

    2. In the left-side navigation pane, click Computing Resources.

    3. Find the target E-MapReduce cluster and click the YARN Resource Queue tab.

      The Production and development environments section on this page lists the Resource Queue and Global Settings Take Precedence settings for modules such as Data Studio, Data Quality, DataAnalysis, and Operation Center. To change the configuration, click Edit YARN Resource Queues in the upper-right corner.

  2. Configure the global YARN resource queue.

    On the YARN Resource Queue tab, click Edit YARN Resource Queues in the upper-right corner, and then configure the global YARN resource queue and priority for each module.

    Note

    These settings apply globally to the entire workspace. Before you proceed, make sure that you have selected the correct workspace.

    Parameter: Description

    Resource Queue: The YARN resource queue used to run E-MapReduce tasks for each module. You can find available resource queues in the EMR on ECS console.

    Global Settings Take Precedence: If you select this option, the global configurations override any module-specific settings. All tasks will then use the globally configured YARN resource queue.

    • Global configuration: The YARN resource queue configured on the YARN Resource Queue tab for the E-MapReduce cluster. You can find this tab by navigating to Management Center > Computing Resources.

      Note

      You can configure global YARN resource queues only for Data Studio, Data Quality, DataAnalysis, and Operation Center.

    • Module-specific configuration:

      • Data Studio: For Hive and Spark nodes, you can specify the YARN resource queue for a single node task. To do so, set the queue parameter under DataWorks parameters in the Scheduling Settings section on the right side of the node editing page.

      • Data Quality: In partition monitoring rules for an E-MapReduce table, you can specify the YARN resource queue by using the Queue setting. For more information, see Configure monitoring rules for a single table.

      • Other product modules: You cannot configure a separate YARN resource queue within other modules.
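The interaction between the global and module-specific configurations can be summarized in a short sketch. It reflects one reading of the descriptions above, not DataWorks' actual implementation: the function `effective_queue`, its parameter names, and the queue names are hypothetical. It assumes that a task uses its module-specific queue only when one is set and Global Settings Take Precedence is not selected, and otherwise uses the global queue.

```python
from typing import Optional


def effective_queue(
    global_queue: str,
    global_takes_precedence: bool,
    module_queue: Optional[str] = None,
) -> str:
    """Return the YARN queue a task would run in under the assumed rules.

    Hypothetical model for illustration only.
    """
    # The global queue wins when it is forced, or when the module
    # (for example, a single Hive node) sets no queue of its own.
    if global_takes_precedence or not module_queue:
        return global_queue
    return module_queue


# Illustrative queue names, not defaults of any real cluster.
print(effective_queue("default", True, "etl"))
print(effective_queue("default", False, "etl"))
print(effective_queue("default", False))
```

Modules that cannot set their own queue always fall into the last case and run in the global queue.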

Related documentation

Map baseline priorities to YARN queue priorities