Execution roles

Jobs in an EMR Serverless Spark workspace use an execution role to authenticate with and access other Alibaba Cloud services, such as Object Storage Service (OSS) and Data Lake Formation (DLF). When you create a workspace, you can use either the default execution role or a custom role.

Use cases

During job execution, the execution role is used to call other resources and services. Depending on the service, it assumes a specific identity so that its actions can be authenticated and audited. The main use cases are as follows:

  • Access OSS files

    During job execution, the execution role accesses and operates on files stored in Object Storage Service (OSS).

  • Read and write DLF metadata

    If permission management is enabled for Data Lake Formation (DLF), the execution role assumes different identities based on the job type:

    • Data development: Assumes the identity of the Alibaba Cloud account or RAM user (sub-account) that submits the job.

    • Livy Gateway: Assumes the identity of the token creator.

  • Read and write MaxCompute data

    When a job reads or writes MaxCompute data, the execution role assumes the identity of the Alibaba Cloud account or RAM user (sub-account) that submitted the job, so that its actions can be authenticated and audited.

Usage notes

The execution role cannot be changed after the workspace is created.

Default execution role

When you create a workspace, if you do not modify the Execution Role setting, the system automatically uses the default execution role AliyunEMRSparkJobRunDefaultRole.

The default execution role has the following properties:

  • Role name: AliyunEMRSparkJobRunDefaultRole.

  • Associated policy: This role includes the system access policy AliyunEMRSparkJobRunDefaultRolePolicy, which grants access to Object Storage Service (OSS), Data Lake Formation (DLF), and MaxCompute.

  • Maintenance: Alibaba Cloud creates and maintains this policy, automatically updating it to reflect changes in service requirements.

Important

Do not edit or delete the default execution role AliyunEMRSparkJobRunDefaultRole. Doing so may cause workspace resource creation or job execution to fail.

Custom execution role

To customize permissions for the execution role, select a custom role in the Execution Role setting when creating a workspace. The following steps show how to configure a custom role for password-free access to other resources, such as OSS and DLF, within the same Alibaba Cloud account.

Note

The following access policy is an example of how to configure permissions for a custom role. When you use a custom execution role, note that its access policy is static and does not automatically update with changes in Alibaba Cloud service requirements. To ensure that jobs run correctly, we recommend that you periodically review and update the access policy. Refer to the access policy (AliyunEMRSparkJobRunDefaultRolePolicy) of the default execution role AliyunEMRSparkJobRunDefaultRole for the latest list of required permissions.

Procedure

  1. Create an access policy.

    1. Navigate to the policy creation page.

      1. Log on to the RAM console as a RAM administrator.

      2. In the left-side navigation pane, choose Permissions > Policies.

      3. On the Policies page, click Create Policy.

    2. On the Create Policy page, click the JSON tab.

    3. Enter the policy content, and then click OK.

      {
        "Version": "1",
        "Statement": [
          {
            "Action": [
              "oss:ListBuckets",
              "oss:PutObject",
              "oss:ListObjectsV2",
              "oss:ListObjects",
              "oss:GetObject",
              "oss:CopyObject",
              "oss:DeleteObject",
              "oss:DeleteObjects",
              "oss:RestoreObject",
              "oss:CompleteMultipartUpload",
              "oss:ListMultipartUploads",
              "oss:AbortMultipartUpload",
              "oss:UploadPartCopy",
              "oss:UploadPart",
              "oss:GetBucketInfo",
              "oss:PostDataLakeStorageFileOperation",
              "oss:PostDataLakeStorageAdminOperation",
              "oss:GetBucketVersions",
              "oss:ListObjectVersions",
              "oss:DeleteObjectVersion"
            ],
            "Resource": [
              "acs:oss:*:*:serverless-spark-test-resources/*",
              "acs:oss:*:*:serverless-spark-test-resources"
            ],
            "Effect": "Allow"
          },
          {
            "Action": [
              "dlf:AlterDatabase",
              "dlf:AlterTable",
              "dlf:ListCatalogs",
              "dlf:ListDatabases",
              "dlf:ListFunctions",
              "dlf:ListFunctionNames",
              "dlf:ListTables",
              "dlf:ListTableNames",
              "dlf:ListIcebergNamespaceDetails",
              "dlf:ListIcebergTableDetails",
              "dlf:ListIcebergSnapshots",
              "dlf:CreateDatabase",
              "dlf:Get*",
              "dlf:DeleteDatabase",
              "dlf:DropDatabase",
              "dlf:DropTable",
              "dlf:CreateTable",
              "dlf:CommitTable",
              "dlf:UpdateTable",
              "dlf:DeleteTable",
              "dlf:ListPartitions",
              "dlf:ListPartitionNames",
              "dlf:CreatePartition",
              "dlf:BatchCreatePartitions",
              "dlf:UpdateTableColumnStatistics",
              "dlf:DeleteTableColumnStatistics",
              "dlf:UpdatePartitionColumnStatistics",
              "dlf:DeletePartitionColumnStatistics",
              "dlf:UpdateDatabase",
              "dlf:BatchCreateTables",
              "dlf:BatchDeleteTables",
              "dlf:BatchUpdateTables",
              "dlf:BatchGetTables",
              "dlf:BatchUpdatePartitions",
              "dlf:BatchDeletePartitions",
              "dlf:BatchGetPartitions",
              "dlf:DeletePartition",
              "dlf:CreateFunction",
              "dlf:DeleteFunction",
              "dlf:UpdateFunction",
              "dlf:ListPartitionsByFilter",
              "dlf:DeltaGetPermissions",
              "dlf:UpdateCatalogSettings",
              "dlf:CreateLock",
              "dlf:UnLock",
              "dlf:AbortLock",
              "dlf:RefreshLock",
              "dlf:ListTableVersions",
              "dlf:CheckPermissions",
              "dlf:RenameTable",
              "dlf:RollbackTable"
            ],
            "Resource": "*",
            "Effect": "Allow"
          },
          {
            "Action": [
              "dlf-dss:CreateDatabase",
              "dlf-dss:CreateFunction",
              "dlf-dss:CreateTable",
              "dlf-dss:DropDatabase",
              "dlf-dss:DropFunction",
              "dlf-dss:DropTable",
              "dlf-dss:DescribeCatalog",
              "dlf-dss:DescribeDatabase",
              "dlf-dss:DescribeFunction",
              "dlf-dss:DescribeTable",
              "dlf-dss:AlterDatabase",
              "dlf-dss:AlterFunction",
              "dlf-dss:AlterTable",
              "dlf-dss:ListCatalogs",
              "dlf-dss:ListDatabases",
              "dlf-dss:ListTables",
              "dlf-dss:ListFunctions",
              "dlf-dss:CheckPermissions"
            ],
            "Resource": "*",
            "Effect": "Allow"
          },
          {
            "Effect": "Allow",
            "Action": "dlf-auth:ActOnBehalfOfAnotherUser",
            "Resource": "*"
          }
        ]
      }

    4. Enter a Name, such as test-serverless-spark, and Remarks for the policy, and then click OK.

      This policy contains the following elements:

      • Action: Specifies the operations that are allowed on the resources. This example grants permissions to list, read, write, and delete objects in OSS, and to query and modify metadata in DLF.

      • Resource: Specifies the authorized objects. This example grants access to all DLF objects and all content in the OSS bucket named serverless-spark-test-resources. You must replace serverless-spark-test-resources with the name of your OSS bucket.

      For more information about the basic elements of an access policy, see Basic elements of an access policy.

  2. Create a RAM role.

    1. In the left-side navigation pane, choose Identities > Role.

    2. On the Role page, click Create Role.

    3. Create the RAM role.

      1. In the Create Role panel, configure the following parameters and click OK.

        • Select Trusted Entity: Select Alibaba Cloud Service.

        • Trusted Service: spark.emr-serverless.aliyuncs.com

      2. Enter a Role Name, such as test-serverless-spark-jobrun, and then click OK.

  3. Grant permissions to the RAM role.

    1. On the Role page, click Add Authorization in the Actions column for the role you created.

    2. In the Add Authorization panel, select Custom policy and add the access policy that you created in the previous step.

    3. Click OK.

    4. Click Close.

  4. Create a workspace and access external resources.

    1. Log on to the E-MapReduce console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

    3. Click Create Workspace and configure the following parameters. For more information about other parameters, see Create a workspace.

      • Workspace Directory: Select an OSS path for which the RAM role you created has read and write permissions.

      • Execution Role: Select the RAM role that you created, such as test-serverless-spark-jobrun.

    4. After the workspace is created, run a batch job as described in Quick start for JAR development to verify the permissions.

      • If you upload a file to the authorized OSS bucket, the job runs as expected.

      • If you upload a file to an unauthorized OSS bucket, the job fails with an access denied error for the OSS path.
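
For reference, a role created in step 2 with Alibaba Cloud Service as the trusted entity and spark.emr-serverless.aliyuncs.com as the trusted service would be expected to carry a trust policy similar to the following sketch. This follows the general RAM trust policy format for a service principal and is not copied from the console, so treat it as an illustration and check the actual document in the role's trust policy in the RAM console.

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": [
          "spark.emr-serverless.aliyuncs.com"
        ]
      }
    }
  ]
}
```

If the Service entry names a different service, EMR Serverless Spark cannot assume the role, and jobs in the workspace would likely fail with permission errors regardless of the access policy attached to the role.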

Other policy examples

MaxCompute data access

Add the following access policy to the execution role.

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "odps:ActOnBehalfOfAnotherUser",
      "Resource": [
        "acs:odps:*:*:users/default/aliyun/*",
        "acs:odps:*:*:users/default/ramuser/*",
        "acs:odps:*:*:users/default/ramrole/*"
      ]
    }
  ]
}

KMS-encrypted OSS access

Add the following access policy to the execution role.

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kms:List*",
        "kms:DescribeKey",
        "kms:GenerateDataKey",
        "kms:Decrypt"
      ],
      "Resource": "*"
    }
  ]
}
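
The policy above allows the listed KMS operations on every key. If jobs only need one key, a narrower variant might look like the following sketch. It assumes the KMS key resource format acs:kms:<region-id>:<account-id>:key/<key-id>, and the three placeholders are hypothetical values that you must replace with your own. Verify the resource format against the KMS authorization documentation before you rely on it.

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "kms:List*",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:DescribeKey",
        "kms:GenerateDataKey",
        "kms:Decrypt"
      ],
      "Resource": "acs:kms:<region-id>:<account-id>:key/<key-id>"
    }
  ]
}
```

The list operations stay on "*" in this sketch because they are not tied to a single key. Only the operations that read key metadata or use the key to encrypt and decrypt data are scoped to the one key.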

Related documentation