All job development is performed within a workspace, so you must create one first. This topic describes how to create a workspace in the E-MapReduce console.
Prerequisites
You have registered an Alibaba Cloud account and completed real-name verification.
You must have an account with the permissions required to create a workspace:
If you use your Alibaba Cloud account, see Assign roles to an Alibaba Cloud account for authorization details.
If you use a RAM user or a RAM role, ensure that the AliyunEMRServerlessSparkFullAccess, AliyunOSSFullAccess, and AliyunDLFFullAccess access policies are attached to the RAM user or RAM role. Then, on the Access Control page, add the RAM user or RAM role and grant it the administrator role. For more information, see Grant permissions to a RAM user and Manage users and roles.
Data Lake Formation (DLF) is activated. For more information, see Quick Start. For a list of regions that support DLF, see Regions and endpoints.
Object Storage Service (OSS) is activated and a bucket is created. For more information, see Activate OSS and Create a bucket.
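As a quick sanity check for the RAM prerequisite above, the following minimal sketch compares a list of attached policy names against the three policies this topic requires. The function name missing_policies and the input list are hypothetical. How you obtain the list of attached policies (console, CLI, or SDK) is outside the scope of this sketch.

```python
# Policies that this topic requires for a RAM user or RAM role.
REQUIRED_POLICIES = {
    "AliyunEMRServerlessSparkFullAccess",
    "AliyunOSSFullAccess",
    "AliyunDLFFullAccess",
}


def missing_policies(attached):
    """Return the required policy names that are absent from `attached`, sorted."""
    return sorted(REQUIRED_POLICIES - set(attached))


if __name__ == "__main__":
    # Hypothetical input: the policy names currently attached to the RAM user.
    attached = ["AliyunOSSFullAccess"]
    for name in missing_policies(attached):
        print(f"Missing policy: {name}")
```

An empty result means all three policies are attached. You must still grant the administrator role on the Access Control page.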
Precautions
You are responsible for managing and configuring the runtime environment for your code.
Create a subscription workspace
Go to the EMR Serverless Spark page.
Log on to the EMR console.
In the left-side navigation pane, choose EMR Serverless > Spark.
In the top navigation bar, select a region based on your requirements.
Important: You cannot change the region of a workspace after you create it.
Click Create Workspace.
On the Create Workspace page, configure the following parameters.
Parameter
Description
Example
Region
Select the region where your data is located.
China (Hangzhou)
Billing Method
Select Subscription.
Subscription
Workspace Name
The name must be 1 to 64 characters long and can contain only Chinese characters, letters, digits, hyphens (-), and underscores (_).
Note: If you enter a name that is already in use, the system prompts you to provide a different one.
emr-serverless-spark
CU Quota
The maximum number of concurrent compute units (CUs) available for processing jobs in the workspace.
Note: The maximum CU quota for a workspace is 1,000 CUs. If you need a higher quota, submit a ticket.
1000
Workspace Directory
The directory used to store data files, such as job logs, runtime events, and resources.
Select a bucket with OSS-HDFS enabled for native HDFS interface compatibility. If your use case does not require HDFS, you can select a standard OSS bucket.
Note: You can specify either a parent directory or a subdirectory for the OSS path based on your needs. Spark automatically creates a directory named after the workspace ID under the specified OSS path to store the following data:
<workspace-ID>/spark/logs/: Spark job logs
<workspace-ID>/spark/eventlogs/: Spark event logs
<workspace-ID>/spark/snapshot/: Spark snapshot data
emr-oss-hdfs
DLF for Metadata Storage
Stores and manages your metadata. You can select a DLF or DLF-Legacy catalog.
If DLF is activated, EMR defaults to a DLF catalog. If only DLF-Legacy is set up, EMR defaults to the DLF-Legacy catalog whose name is identical to your UID. To use different data catalogs for different clusters, create a new catalog:
Click Create Catalog. In the dialog box that appears, enter a Catalog Name and click Create Catalog. For more information, see Create a catalog.
From the drop-down list, select the data catalog that you created.
emr-dlf
Execution Role
Specifies the role that Serverless Spark assumes to run jobs. The default role is AliyunEMRSparkJobRunDefaultRole.
Serverless Spark uses this role to access your resources in other services, such as OSS and DLF. If you want to control the permissions of the execution role, you can use a custom execution role. For more information, see Execution role.
AliyunEMRSparkJobRunDefaultRole
(Optional) Advanced Settings
Tags: Tags are identifiers for your cloud resources. You can use tags to classify, search for, and aggregate resources that share the same characteristics, which improves resource management efficiency. You can bind up to 20 tags to each workspace. Each tag consists of a custom tag key and tag value. Tags also enable cost allocation and fine-grained management of pay-as-you-go resources.
You can bind tags when you create a workspace, or add or modify tags at any time on the workspace list page.
For more information about tags, see What is a tag?.
Enter a custom tag key and tag value
Click Create Workspace.
Click Confirm Order and complete the payment.
After the payment is complete, the workspace appears on the EMR Serverless > Spark page while it is being created. Creation usually takes 3 to 5 minutes.
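The Workspace Name and CU Quota constraints from the parameter table can be checked before you fill in the form. The following is a minimal sketch, not the console's actual validation logic. The function names are hypothetical, and the Unicode range used for "Chinese characters" (the basic CJK Unified Ideographs block, U+4E00 to U+9FFF) is an assumption, because this topic does not define the exact character set.

```python
import re

# 1 to 64 characters: Chinese characters, letters, digits, hyphens, underscores.
# Assumption: "Chinese characters" is approximated by U+4E00..U+9FFF.
WORKSPACE_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")

# Default upper limit for a subscription workspace, as stated in this topic.
SUBSCRIPTION_MAX_CU = 1000


def is_valid_workspace_name(name):
    """Return True if `name` satisfies the documented naming rule."""
    return WORKSPACE_NAME_PATTERN.fullmatch(name) is not None


def is_valid_cu_quota(quota, limit=SUBSCRIPTION_MAX_CU):
    """Return True if `quota` is a positive integer no greater than `limit`."""
    return isinstance(quota, int) and 1 <= quota <= limit


if __name__ == "__main__":
    print(is_valid_workspace_name("emr-serverless-spark"))
    print(is_valid_cu_quota(1000))
```

The check does not detect a name that is already in use. The console reports that case when you submit the form.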
Create a pay-as-you-go workspace
Go to the EMR Serverless Spark page.
Log on to the EMR console.
In the left-side navigation pane, choose EMR Serverless > Spark.
In the top navigation bar, select a region based on your requirements.
Important: You cannot change the region of a workspace after you create it.
Click Create Workspace.
On the Create Workspace page, configure the following parameters.
Parameter
Description
Example
Region
Select the region where your data is located.
China (Hangzhou)
Billing Method
Select Pay-as-you-go.
Pay-as-you-go
Workspace Name
The name must be 1 to 64 characters long and can contain only Chinese characters, letters, digits, hyphens (-), and underscores (_).
Note: If you enter a name that is already in use, the system prompts you to provide a different one.
emr-serverless-spark
Maximum Quota
The maximum number of compute units (CUs) available for concurrent job execution in the workspace.
Note: The maximum burst quota for a workspace is 5,000 CUs. If you need a higher quota, submit a ticket.
1000
Workspace Directory
The directory used to store data files, such as job logs, runtime events, and resources.
Select a bucket with OSS-HDFS enabled for native HDFS interface compatibility. If your use case does not require HDFS, you can select a standard OSS bucket.
Note: You can specify either a parent directory or a subdirectory for the OSS path based on your needs. Spark automatically creates a directory named after the workspace ID under the specified OSS path to store the following data:
<workspace-ID>/spark/logs/: Spark job logs
<workspace-ID>/spark/eventlogs/: Spark event logs
<workspace-ID>/spark/snapshot/: Spark snapshot data
emr-oss-hdfs
DLF for Metadata Storage
Stores and manages your metadata. You can select a DLF or DLF-Legacy catalog.
If DLF is activated, EMR defaults to a DLF catalog. If only DLF-Legacy is set up, EMR defaults to the DLF-Legacy catalog whose name is identical to your UID. To use different data catalogs for different clusters, create a new catalog:
Click Create Catalog. In the dialog box that appears, enter a Catalog Name and click Create Catalog. For more information, see Create a catalog.
From the drop-down list, select the data catalog that you created.
emr-dlf
Execution Role
Specifies the role that Serverless Spark assumes to run jobs. The default role is AliyunEMRSparkJobRunDefaultRole.
Serverless Spark uses this role to access your resources in other services, such as OSS and DLF. If you want to control the permissions of the execution role, you can use a custom execution role. For more information, see Execution role.
AliyunEMRSparkJobRunDefaultRole
(Optional) Advanced Settings
Tags: Tags are identifiers for your cloud resources. You can use tags to classify, search for, and aggregate resources that share the same characteristics, which improves resource management efficiency. You can bind up to 20 tags to each workspace. Each tag consists of a custom tag key and tag value. Tags also enable cost allocation and fine-grained management of pay-as-you-go resources.
You can bind tags when you create a workspace, or add or modify tags at any time on the workspace list page.
For more information about tags, see What is a tag?.
Enter a custom tag key and tag value
Click Create Workspace.
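To locate job output after the workspace is created, it can help to derive the directories that Spark creates under the Workspace Directory. The following minimal sketch builds the three paths listed in the parameter table from an OSS path and a workspace ID. The function name, the oss:// path, and the workspace ID w-123 are hypothetical placeholders. Use the values shown for your own workspace.

```python
def workspace_paths(base_path, workspace_id):
    """Build the Spark data directories documented for a workspace.

    `base_path` is the OSS path selected as the Workspace Directory.
    `workspace_id` is the ID of the created workspace.
    """
    # Normalize the trailing slash so the result has exactly one separator.
    root = base_path.rstrip("/") + "/" + workspace_id
    return {
        "logs": root + "/spark/logs/",            # Spark job logs
        "eventlogs": root + "/spark/eventlogs/",  # Spark event logs
        "snapshot": root + "/spark/snapshot/",    # Spark snapshot data
    }


if __name__ == "__main__":
    # Hypothetical bucket path and workspace ID, for illustration only.
    for kind, path in workspace_paths("oss://emr-oss-hdfs/spark-data/", "w-123").items():
        print(kind, path)
```

The same layout applies whether you specify a parent directory or a subdirectory, because the workspace ID directory is always created under the path you select.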
Related documents
After creating a workspace, you can start developing jobs, such as SparkSQL jobs. For more information, see SparkSQL development quick start.