Configure cluster identity mapping

This topic describes how to manually configure the mapping between the cloud accounts of DataWorks members and specified accounts in an EMR cluster. This allows DataWorks members to run tasks using the mapped cluster identity.

Precautions

  • An identity mapping for an EMR cluster applies to all workspaces where that cluster is registered. Modify the EMR cluster configuration with caution.

  • If you do not manually configure an account mapping, DataWorks uses a default policy to submit tasks to the EMR cluster. The policy depends on the type of cloud account that runs the tasks:

    • If you run tasks as a RAM user: By default, DataWorks runs the task by using an EMR cluster system account that has the same name as the RAM user. If the cluster has LDAP or Kerberos authentication enabled, you must configure a mapping as described in this topic. Otherwise, the DataWorks task will fail.

    • If you run tasks as an Alibaba Cloud account: You must manually configure an account mapping for the Alibaba Cloud account, regardless of whether the cluster has LDAP or Kerberos authentication enabled. Otherwise, all tasks run using the Alibaba Cloud account will fail.

      Note

      The account used to access an EMR cluster from DataWorks depends on the access identity that you specify when you register the EMR cluster.

      • To run tasks as a RAM user, select Cluster Account Mapped to Account of Task Owner or Cluster Account Mapped to RAM User as the default access identity when you bind an EMR computing resource in DataWorks.

      • To run tasks as an Alibaba Cloud account, select Cluster Account Mapped to Alibaba Cloud Account as the default access identity when you bind an EMR computing resource in DataWorks.
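The default policy described above can be summarized as a small decision function. The following Python sketch is illustrative only: `AccountType`, `resolve_cluster_account`, and the `mappings` dictionary are hypothetical names and do not correspond to any DataWorks API. It models only the fallback rules stated in this topic and does not take LDAP or Kerberos authentication into account.

```python
from enum import Enum
from typing import Dict, Optional


class AccountType(Enum):
    RAM_USER = "ram_user"
    ALIBABA_CLOUD_ACCOUNT = "alibaba_cloud_account"


def resolve_cluster_account(
    account_type: AccountType,
    account_name: str,
    mappings: Dict[str, str],
) -> Optional[str]:
    """Return the cluster account a task would run as, or None if none resolves.

    Hypothetical model of the default policy described in this topic.
    """
    # A manually configured mapping is used whenever one exists.
    if account_name in mappings:
        return mappings[account_name]
    # Without a mapping, a RAM user falls back to the same-name system account.
    if account_type is AccountType.RAM_USER:
        return account_name
    # An Alibaba Cloud account has no default, so a mapping is required.
    return None
```

A return value of `None` corresponds to the case in which tasks fail because no mapping is configured for the Alibaba Cloud account.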

Usage notes

  • Authentication method

    DataWorks does not support configuring both LDAP and Kerberos account mappings at the same time. Therefore, tasks run in DataWorks will fail if an EMR component has both LDAP and Kerberos authentication enabled.

  • Whitelist configuration

    If Ranger authentication is enabled on the cluster, you must add DataWorks to the whitelist to ensure that DataWorks can access the EMR cluster. For more information, see Appendix: Add DataWorks to the whitelist.

  • User management

    If you use a non-system account for cluster identity authentication (for example, Kerberos), you must first enable the corresponding authentication service on the cluster, and then add the account used for EMR task development in DataWorks to that service. For more information, see Configure Kerberos authentication.

  • Data permissions

    You can use permission management components on the EMR cluster to isolate data permissions for different DataWorks users. For example, you can use the Ranger component to manage the permissions of the cluster user mapped to an Alibaba Cloud account. For more information, see Ranger.

    If an EMR cluster uses Data Lake Formation (DLF) as its metadata service and has the DLF-Auth component enabled to manage permissions, you can apply for data permissions in the DataWorks Security Center. For more information, see DLF data access control.

  • Mapping configuration

    Note that DataWorks tasks may fail in the following scenarios:

    • Using system account mapping in DataWorks

      • A task is run by a RAM user, but no cluster account with the same name exists.

      • A task is run by a RAM user and a mapping is manually configured between a DataWorks workspace member and a cluster account, but the mapped cluster account or password is incorrect.

      • A task is run by an Alibaba Cloud account, but no mapping is configured for the account.

    • Using LDAP/Kerberos account mapping in DataWorks

      • LDAP or Kerberos authentication is enabled on the cluster, but a corresponding account mapping is missing or incorrect in DataWorks.

      • You select Kerberos mapping in DataWorks, but Kerberos is not enabled on the cluster.

      • You select LDAP mapping in DataWorks, but the authentication service is not enabled for the component on the cluster.

    Note

    After you configure LDAP mapping in DataWorks, SQL tasks such as Hive, Impala, Presto, and Trino use the mapped account for authentication by default. If LDAP authentication is not enabled on the cluster component, these tasks will fail.
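The failure scenarios in these usage notes can be expressed as a preflight check. The sketch below is a simplified model, not a DataWorks feature: `ClusterAuth`, `preflight_issues`, and the mapping type labels `"system"`, `"ldap"`, and `"kerberos"` are hypothetical. It also assumes that authentication is tracked per cluster, whereas LDAP is actually enabled per component.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterAuth:
    ldap_enabled: bool
    kerberos_enabled: bool


def preflight_issues(mapping_type: str, cluster: ClusterAuth) -> List[str]:
    """List reasons why tasks would likely fail for this combination.

    mapping_type is one of "system", "ldap", or "kerberos" (hypothetical labels).
    """
    issues: List[str] = []
    # DataWorks cannot hold LDAP and Kerberos mappings at the same time.
    if cluster.ldap_enabled and cluster.kerberos_enabled:
        issues.append("LDAP and Kerberos are both enabled")
    # The selected mapping type must match a service enabled on the cluster.
    if mapping_type == "kerberos" and not cluster.kerberos_enabled:
        issues.append("Kerberos mapping selected, but Kerberos is not enabled")
    if mapping_type == "ldap" and not cluster.ldap_enabled:
        issues.append("LDAP mapping selected, but LDAP is not enabled")
    # An authenticated cluster needs a matching LDAP or Kerberos mapping.
    if mapping_type == "system" and (cluster.ldap_enabled or cluster.kerberos_enabled):
        issues.append("cluster authentication is enabled, but no matching mapping exists")
    return issues
```

An empty list means that none of the modeled conflicts applies. It does not guarantee that the mapped account name and password are correct.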

Limitations

  • Only the following identities can configure identity mappings for workspace members:

    • An Alibaba Cloud account.

    • A RAM user or RAM role that has the AliyunDataWorksFullAccess and AliyunEMRFullAccess permissions.

    • A RAM user or RAM role that is assigned the Workspace Administrator role and has the AliyunEMRFullAccess permission.

  • All other users can configure an identity mapping only for themselves.
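As a rough illustration of these limitations, the following Python sketch checks whether an identity may configure mappings for other workspace members. The `Identity` class and its fields are hypothetical. Only the policy names AliyunDataWorksFullAccess and AliyunEMRFullAccess come from this topic.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Identity:
    is_alibaba_cloud_account: bool = False
    is_workspace_admin: bool = False
    policies: Set[str] = field(default_factory=set)


def can_configure_for_others(identity: Identity) -> bool:
    """Return True if the identity may configure mappings for other members."""
    # An Alibaba Cloud account is always allowed.
    if identity.is_alibaba_cloud_account:
        return True
    has_emr = "AliyunEMRFullAccess" in identity.policies
    has_dataworks = "AliyunDataWorksFullAccess" in identity.policies
    # A RAM user or role needs EMR full access plus either DataWorks
    # full access or the Workspace Administrator role.
    return has_emr and (has_dataworks or identity.is_workspace_admin)
```

An identity for which this function returns `False` can still configure a mapping for itself.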

Go to the account mappings page

  1. Log on to the DataWorks console. In the target region, click More > Management Center in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Management Center.

  2. In the left-side navigation pane, click Computing Resources to go to the Computing Resources page.

  3. In the list of computing resources, find the target EMR cluster. On the Account Mappings tab, click Edit Account Mappings in the upper-right corner to go to the Edit Account Mappings page.

    The Account Mappings tab contains the Production and development environment section, which displays the Mapping Type and Number of Mapped Accounts fields.

Configure cluster account mapping

On the Edit Account Mappings page, follow these steps to configure a cluster identity mapping.

  1. Upload a configuration file.

    1. If Kerberos authentication is enabled on the cluster, you must download the authentication credentials from the cluster.

    2. Click Upload Keystore File to upload the downloaded authentication credentials. This ensures that EMR Trino and EMR Presto tasks run as expected.

  2. Configure the mapping.

    • Configuration Mode: You can define a mapping for the current cluster, or select Reference Configurations of Another Cluster to reuse an existing mapping configuration.

    • Mapping Type: Specifies the account type for cluster authentication. Supported types include System Account Mapping, OPEN LDAP Account Mapping, and Kerberos Account Mapping.

    Note

    • If you select Kerberos Account Mapping as the Mapping Type, you must upload a keystore file.

    • Before you use Kerberos account mapping, make sure that the Kerberos service is enabled on the cluster. For more information, see Enable Kerberos.

    • Before you use LDAP account mapping, make sure that the LDAP service is enabled on the related components. After you configure LDAP mapping in DataWorks, SQL tasks such as Hive, Impala, Presto, and Trino use the mapped account for authentication by default. If LDAP authentication is not enabled on the cluster component, these tasks will fail.

  3. Click Confirm to save the cluster account mapping settings.
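The constraints in the preceding steps can be checked before the settings are saved. The following Python sketch is a hypothetical model of the Edit Account Mappings form: `MappingForm`, `validate_form`, and the type labels are assumptions made for illustration and are not part of DataWorks.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical short labels for the three supported mapping types.
MAPPING_TYPES = {"system", "ldap", "kerberos"}


@dataclass
class MappingForm:
    mapping_type: str
    keystore_uploaded: bool = False


def validate_form(form: MappingForm) -> List[str]:
    """Return the problems that should be resolved before clicking Confirm."""
    errors: List[str] = []
    if form.mapping_type not in MAPPING_TYPES:
        errors.append(f"unknown mapping type: {form.mapping_type}")
    # Kerberos account mapping requires an uploaded keystore file.
    if form.mapping_type == "kerberos" and not form.keystore_uploaded:
        errors.append("Kerberos account mapping requires a keystore file")
    return errors
```

For example, `validate_form(MappingForm("kerberos"))` reports the missing keystore file, whereas a system account mapping passes without an upload.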

Appendix: Add DataWorks to the whitelist

If Ranger is enabled on the EMR cluster, you must add DataWorks to the whitelist in EMR and restart Hive before you develop EMR jobs in DataWorks. Otherwise, jobs will fail at runtime with a Cannot modify spark.yarn.queue at runtime or Cannot modify SKYNET_BIZDATE at runtime error. For more information about Ranger, see Ranger.

  1. Configure the whitelist.

    You can configure the whitelist by adding a key and value for a custom parameter in EMR. The following example shows the configuration for the Hive component.

    hive.security.authorization.sqlstd.confwhitelist.append=tez.*|spark.*|mapred.*|mapreduce.*|ALISA.*|SKYNET.*

    Note

    The ALISA.* and SKYNET.* settings are specific to DataWorks.

  2. Restart the service.

    After you configure the whitelist, you must restart the service for the changes to take effect. For more information about how to restart a service, see Restart a service.
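To see why the whitelist value resolves the two runtime errors mentioned in this appendix, you can test parameter names against it. The sketch below assumes that Hive treats the value as a single regular expression that must match the whole parameter name, and it uses Python's `re` module as an approximation of the Java regex engine. The helper name `is_whitelisted` is hypothetical.

```python
import re

# Whitelist value from the Hive example in this appendix.
WHITELIST_APPEND = "tez.*|spark.*|mapred.*|mapreduce.*|ALISA.*|SKYNET.*"


def is_whitelisted(param: str, whitelist: str = WHITELIST_APPEND) -> bool:
    """Return True if the whole parameter name matches the whitelist regex."""
    return re.fullmatch(whitelist, param) is not None


# Both parameters named in the error messages match the whitelist.
print(is_whitelisted("spark.yarn.queue"))
print(is_whitelisted("SKYNET_BIZDATE"))
# A parameter outside the listed prefixes does not match.
print(is_whitelisted("hive.exec.parallel"))
```

The `spark.*` and `SKYNET.*` alternatives are the ones that cover `spark.yarn.queue` and `SKYNET_BIZDATE`.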