Create a workspace

All job development is performed within a workspace, so you must create one first. This topic describes how to create a workspace in the E-MapReduce console.

Prerequisites

  • You have registered an Alibaba Cloud account and completed real-name verification.

  • Your account has the permissions required to create a workspace:

    • If you use your Alibaba Cloud account, see Assign roles to an Alibaba Cloud account for authorization details.

    • If you use a RAM user or a RAM role, ensure that the AliyunEMRServerlessSparkFullAccess, AliyunOSSFullAccess, and AliyunDLFFullAccess access policies are attached to the RAM user or RAM role. Then, on the Access Control page, add the RAM user or RAM role and grant it the administrator role. For more information, see Grant permissions to a RAM user and Manage users and roles.

  • Data Lake Formation (DLF) is activated. For more information, see Quick Start. For a list of regions that support DLF, see Regions and endpoints.

  • Object Storage Service (OSS) is activated and a bucket is created. For more information, see Activate OSS and Create a bucket.
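
For a RAM user or RAM role, the three access policies listed above can be checked mechanically before you open the console. The following is a minimal sketch, not an official tool: it assumes you have already copied the attached policy names from the RAM console into a list by hand, and the function name missing_policies is hypothetical.

```python
# Policy names are taken from the prerequisites above.
REQUIRED_POLICIES = frozenset({
    "AliyunEMRServerlessSparkFullAccess",
    "AliyunOSSFullAccess",
    "AliyunDLFFullAccess",
})


def missing_policies(attached):
    """Return the required policies that are absent from `attached`, sorted by name."""
    return sorted(REQUIRED_POLICIES - set(attached))


# Hypothetical input: policy names copied manually from the RAM console.
attached = ["AliyunOSSFullAccess", "AliyunDLFFullAccess"]
for name in missing_policies(attached):
    print(f"Not attached: {name}")
```

An empty result only means the policies are attached. The RAM user or RAM role must still be added on the Access Control page and granted the administrator role.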

Precautions

You are responsible for managing and configuring the runtime environment for your code.

Create a subscription workspace

  1. Go to the EMR Serverless Spark page.

    1. Log on to the EMR console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

    3. In the top navigation bar, select a region based on your requirements.

      Important

      You cannot change the region of a workspace after you create it.

  2. Click Create Workspace.

  3. On the Create Workspace page, configure the following parameters.

    Parameter

    Description

    Example

    Region

    Select the region where your data is located.

    China (Hangzhou)

    Billing Method

    Select Subscription.

    Subscription

    Workspace Name

    The name must be 1 to 64 characters long and can contain only Chinese characters, letters, digits, hyphens (-), and underscores (_).

    Note

    If you enter a name that is already in use, the system prompts you to provide a different one.

    emr-serverless-spark

    CU Quota

    The maximum number of concurrent compute units (CUs) available for processing jobs in the workspace.

    Note

    The maximum CU quota for a workspace is 1,000 CUs. If you need a higher quota, submit a ticket.

    1000

    Workspace Directory

    The directory used to store data files, such as job logs, runtime events, and resources.

    Select a bucket with OSS-HDFS enabled for native HDFS interface compatibility. If your use case does not require HDFS, you can select a standard OSS bucket.

    Note

    You can specify either a parent directory or a subdirectory for the OSS path based on your needs. Spark automatically creates a directory named after the workspace ID under the specified OSS path to store the following data:

    • <workspace-ID>/spark/logs/: Spark job logs

    • <workspace-ID>/spark/eventlogs/: Spark event logs

    • <workspace-ID>/spark/snapshot/: Spark snapshot data

    emr-oss-hdfs

    DLF for Metadata Storage

    Stores and manages your metadata. You can select a DLF or DLF-Legacy catalog.

    If DLF is activated, EMR defaults to a DLF catalog. If only DLF-Legacy is set up, EMR defaults to the DLF-Legacy catalog whose name is identical to your UID. To use different data catalogs for different clusters, create a new catalog:

    1. Click Create Catalog. In the dialog box that appears, enter a Catalog Name and click Create Catalog. For more information, see Create a catalog.

    2. From the drop-down list, select the data catalog that you created.

    emr-dlf

    Execution Role

    Specifies the role that Serverless Spark assumes to run jobs. The default role is AliyunEMRSparkJobRunDefaultRole.

    Serverless Spark uses this role to access your resources in other services, such as OSS and DLF. If you want to control the permissions of the execution role, you can use a custom execution role. For more information, see Execution role.

    AliyunEMRSparkJobRunDefaultRole

    (Optional) Advanced Settings

    Tags: Tags are identifiers for your cloud resources. You can use tags to classify, search for, and aggregate resources that share the same characteristics, which improves resource management efficiency. You can bind up to 20 tags to each workspace. Each tag consists of a custom tag key and tag value. Tags also enable cost allocation and fine-grained management of pay-as-you-go resources.

    You can bind tags when you create a workspace, or add or modify them later on the workspace list page.

    For more information about tags, see What is a tag?.

    Enter a custom tag key and tag value

  4. Click Create Workspace.

  5. Click Confirm Order and complete the payment.

    After the payment is complete, the workspace appears on the EMR Serverless > Spark page while it is being created. Creation usually takes 3 to 5 minutes.
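
The Workspace Name rule in the parameter table above (1 to 64 characters; Chinese characters, letters, digits, hyphens, and underscores only) can be checked locally before you fill in the form. The following is a minimal sketch under two assumptions that the console may not share: "Chinese characters" is taken to mean the CJK Unified Ideographs range U+4E00 to U+9FFF, and "letters" is taken to mean ASCII letters. The function name is hypothetical.

```python
import re

# Assumption: Chinese characters = U+4E00..U+9FFF, letters = ASCII A-Z and a-z.
# The trailing hyphen inside the character class is a literal hyphen.
NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")


def is_valid_workspace_name(name: str) -> bool:
    """Return True if `name` satisfies the documented length and character rules."""
    return NAME_PATTERN.fullmatch(name) is not None


# The example name from the parameter table passes the check.
print(is_valid_workspace_name("emr-serverless-spark"))
# A name containing a space does not.
print(is_valid_workspace_name("emr serverless spark"))
```

This check does not detect names that are already in use. The console reports that case when you submit the form.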

Create a pay-as-you-go workspace

  1. Go to the EMR Serverless Spark page.

    1. Log on to the EMR console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

    3. In the top navigation bar, select a region based on your requirements.

      Important

      You cannot change the region of a workspace after you create it.

  2. Click Create Workspace.

  3. On the Create Workspace page, configure the following parameters.

    Parameter

    Description

    Example

    Region

    Select the region where your data is located.

    China (Hangzhou)

    Billing Method

    Select Pay-as-you-go.

    Pay-as-you-go

    Workspace Name

    The name must be 1 to 64 characters long and can contain only Chinese characters, letters, digits, hyphens (-), and underscores (_).

    Note

    If you enter a name that is already in use, the system prompts you to provide a different one.

    emr-serverless-spark

    Maximum Quota

    The maximum number of compute units (CUs) available for concurrent job execution in the workspace.

    Note

    The maximum burst quota for a workspace is 5,000 CUs. If you need a higher quota, submit a ticket.

    1000

    Workspace Directory

    The directory used to store data files, such as job logs, runtime events, and resources.

    Select a bucket with OSS-HDFS enabled for native HDFS interface compatibility. If your use case does not require HDFS, you can select a standard OSS bucket.

    Note

    You can specify either a parent directory or a subdirectory for the OSS path based on your needs. Spark automatically creates a directory named after the workspace ID under the specified OSS path to store the following data:

    • <workspace-ID>/spark/logs/: Spark job logs

    • <workspace-ID>/spark/eventlogs/: Spark event logs

    • <workspace-ID>/spark/snapshot/: Spark snapshot data

    emr-oss-hdfs

    DLF for Metadata Storage

    Stores and manages your metadata. You can select a DLF or DLF-Legacy catalog.

    If DLF is activated, EMR defaults to a DLF catalog. If only DLF-Legacy is set up, EMR defaults to the DLF-Legacy catalog whose name is identical to your UID. To use different data catalogs for different clusters, create a new catalog:

    1. Click Create Catalog. In the dialog box that appears, enter a Catalog Name and click Create Catalog. For more information, see Create a catalog.

    2. From the drop-down list, select the data catalog that you created.

    emr-dlf

    Execution Role

    Specifies the role that Serverless Spark assumes to run jobs. The default role is AliyunEMRSparkJobRunDefaultRole.

    Serverless Spark uses this role to access your resources in other services, such as OSS and DLF. If you want to control the permissions of the execution role, you can use a custom execution role. For more information, see Execution role.

    AliyunEMRSparkJobRunDefaultRole

    (Optional) Advanced Settings

    Tags: Tags are identifiers for your cloud resources. You can use tags to classify, search for, and aggregate resources that share the same characteristics, which improves resource management efficiency. You can bind up to 20 tags to each workspace. Each tag consists of a custom tag key and tag value. Tags also enable cost allocation and fine-grained management of pay-as-you-go resources.

    You can bind tags when you create a workspace, or add or modify them later on the workspace list page.

    For more information about tags, see What is a tag?.

    Enter a custom tag key and tag value

  4. Click Create Workspace.
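
The Workspace Directory note in the parameter table describes the layout that is created under the OSS path you specify. To make that layout concrete, the following sketch derives the three documented paths from a base path and a workspace ID. It is illustrative only: the oss:// URI form, the bucket name example-bucket, and the workspace ID w-0123456789abcdef are assumptions, and the function name is hypothetical.

```python
def workspace_paths(base_path: str, workspace_id: str) -> dict:
    """Build the documented <workspace-ID>/spark/... paths under `base_path`."""
    # Strip a trailing slash so the joined path has no doubled separators.
    root = f"{base_path.rstrip('/')}/{workspace_id}/spark"
    return {
        "logs": f"{root}/logs/",            # Spark job logs
        "eventlogs": f"{root}/eventlogs/",  # Spark event logs
        "snapshot": f"{root}/snapshot/",    # Spark snapshot data
    }


# Hypothetical bucket, directory, and workspace ID.
paths = workspace_paths("oss://example-bucket/emr/", "w-0123456789abcdef")
for kind, path in paths.items():
    print(f"{kind}: {path}")
```

Knowing these paths in advance can help when you write OSS lifecycle rules or grant a custom execution role access to only the directories it needs.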

Related documents

After creating a workspace, you can start developing jobs, such as SparkSQL jobs. For more information, see SparkSQL development quick start.