A workflow orchestrates tasks in a specific sequence with defined dependencies. To run tasks at a scheduled time, create a workflow and configure its scheduling properties.
Prerequisites
-
You have created a workspace. For more information, see Manage workspaces.
-
You have developed and published the required tasks.
Create a workflow
-
Go to the Workflows page.
-
Log on to the E-MapReduce console.
-
In the left-side navigation pane, choose EMR Serverless > Spark.
-
On the Spark page, click the name of the target workspace.
-
On the EMR Serverless Spark page, click Workflows in the left-side navigation pane.
-
On the Workflows page, click Create Workflow.
-
In the Create Workflow panel, configure the following parameters and click Next.
Parameter
Description
Name
The workflow name must be unique within the current workspace.
Resource Queue
Select the default resource queue in which the workflow runs.
Note: A node's resource queue setting overrides this default queue.
Other Settings
Scheduling Type
Specifies how the workflow runs in the production environment. The following scheduling types are supported:
-
Manual (Default): The workflow runs only when you trigger it manually.
-
Scheduler: The workflow runs automatically on a defined schedule (minute, hour, or day).
When you set Scheduling Type to Scheduler, you must also configure the Scheduling Time and Scheduling Started At parameters.
Scheduling Time
Defines how often the workflow automatically runs. This parameter is required only when you set Scheduling Type to Scheduler.
The following scheduling cycles are supported:
-
Daily: Runs once at a specified time each day.
-
Hourly: Runs at a defined interval, such as every N hours, within a specified time range each day.
-
Minute: Runs at a defined interval, such as every N minutes, within a specified time range each day.
Scheduling Started At
The date and time when the scheduled workflow starts. The default is the current time. This parameter is required only for the Scheduler type.
Important: For a workflow of the Scheduler type, you must enable the Scheduling Status toggle on the Workflows page to activate its schedule.
Retries After Failure
Specifies the number of times a node retries after a failure. By default, the node does not retry.
Note: The settings for an individual node can override this parameter.
Failure Notification
The email address that receives failure notifications.
Tags
The key-value pairs to use as tags for the workflow.
-
Edit the workflow nodes.
-
On the Edit Workflow page, click Add Node at the bottom.
-
In the Add Node panel, configure the node parameters.
Parameter
Description
Source File Path
The path to the task that corresponds to this node. The task must be published.
Node Type
The type of the workflow node. By default, the system infers the node type based on the task at the specified path.
Node Name
A custom name for the node. The name is automatically populated based on the task source.
Upstream Node
The preceding node in the workflow.
The first node does not require an upstream node.
Number of Retries
The number of times the node retries after a failure. By default, the node inherits the retry setting from the workflow, which performs no retries unless configured otherwise.
Timeout (Seconds)
The timeout for a single node run. By default, there is no limit.
Subscription
You can specify an email address to subscribe to notifications for specific node states.
Tags
The tag key-value pairs for the node. By default, each node automatically includes two built-in tags: workflow_name and task_name.
Resource Queue
The resource queue for this node. This setting overrides the workflow's default resource queue.
Important: A node-specific resource queue setting takes precedence over the workflow's default, even if the default is changed later.
Note: If the task source is SQL development, you must also configure task parameters. These parameters inherit values from the task template by default. You can modify the task template to change the default values. For more information about parameters, see Configuration management.
-
Click Save.
After configuring the initial node, you can click Add Node at the bottom of the page to add more nodes.
-
Publish the workflow.
-
In the upper-right corner, click Publish Workflow.
-
In the Publish dialog box, optionally enter a description for the release, and then click OK.
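The override rules in the parameter tables above (a node's resource queue and retry settings take precedence over the workflow defaults) can be summarized in a small sketch. The class, field, and queue names below are hypothetical and exist only for illustration. They are not part of any EMR Serverless Spark API, and the console may resolve settings differently in edge cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowDefaults:
    """Hypothetical model of the workflow-level settings."""
    resource_queue: str
    retries_after_failure: int = 0  # by default, nodes do not retry


@dataclass
class NodeSettings:
    """Hypothetical model of node-level settings; None means 'inherit'."""
    name: str
    resource_queue: Optional[str] = None
    number_of_retries: Optional[int] = None


def effective_settings(workflow: WorkflowDefaults, node: NodeSettings) -> dict:
    """Resolve the values a node would run with: node settings win when set."""
    queue = node.resource_queue
    if queue is None:
        queue = workflow.resource_queue
    retries = node.number_of_retries
    if retries is None:
        retries = workflow.retries_after_failure
    return {"node": node.name, "resource_queue": queue, "retries": retries}


workflow = WorkflowDefaults(resource_queue="queue_a", retries_after_failure=1)
inheriting = NodeSettings(name="load_orders")
overriding = NodeSettings(name="build_report", resource_queue="queue_b", number_of_retries=3)

print(effective_settings(workflow, inheriting))
print(effective_settings(workflow, overriding))

# Changing the workflow default later affects only the node that inherits it.
workflow.resource_queue = "queue_c"
print(effective_settings(workflow, inheriting)["resource_queue"])
print(effective_settings(workflow, overriding)["resource_queue"])
```

In this model, a node that leaves a setting empty follows the workflow default, including later changes to it, while a node with its own value keeps that value.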
Run a workflow
Each time you run a workflow, a workflow run is generated on the Workflow Runs tab of the workflow details page.
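Because each node names its Upstream Node, the nodes of a workflow form a dependency graph, and a run must process upstream nodes before the nodes that depend on them. The following is a minimal sketch of that ordering using Python's standard graphlib module. The node names are hypothetical, the sketch assumes a node may list zero or more upstream nodes, and it does not describe how the service itself schedules nodes.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def run_order(upstream: Dict[str, List[str]]) -> List[str]:
    """Return node names so that every node appears after its upstream nodes."""
    try:
        return list(TopologicalSorter(upstream).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"nodes depend on each other in a cycle: {exc.args[1]}") from exc


# Hypothetical workflow: each node maps to its upstream node(s).
# The first node has no upstream node.
nodes = {
    "extract": [],
    "clean": ["extract"],
    "aggregate": ["clean"],
    "report": ["aggregate"],
}

print(run_order(nodes))  # ['extract', 'clean', 'aggregate', 'report']
```

A linear chain such as this one has exactly one valid order. When several nodes share the same upstream node, more than one order is valid.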
Debug
You can debug the latest version of a workflow while editing it.
-
On the Edit Workflow page, click Debug.

-
In the Debug dialog box, select a resource queue from the development environment, and then click Run.
Scheduled run
If you set Scheduling Type to Scheduler when you created the workflow, enable the Scheduling Status toggle on the Workflows page to activate the schedule. The workflow then runs automatically at the scheduled times.
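To make the scheduling cycles described for the Scheduling Time parameter concrete, the following sketch computes the trigger times for one day. It is an illustrative model only: the function names are hypothetical, the end of the time range is assumed to be inclusive, and the Scheduling Started At value is assumed to suppress any run scheduled before it.

```python
from datetime import date, datetime, time, timedelta
from typing import List


def daily_run(day: date, at: time) -> List[datetime]:
    """Daily cycle: one run at the specified time."""
    return [datetime.combine(day, at)]


def interval_runs(day: date, every: timedelta, start: time, end: time) -> List[datetime]:
    """Hourly or Minute cycle: one run every `every` within [start, end] on `day`."""
    if every <= timedelta(0):
        raise ValueError("interval must be positive")
    runs = []
    current = datetime.combine(day, start)
    last = datetime.combine(day, end)  # assumption: the range end is inclusive
    while current <= last:
        runs.append(current)
        current += every
    return runs


def after_start(runs: List[datetime], started_at: datetime) -> List[datetime]:
    """Keep only the runs at or after the Scheduling Started At value."""
    return [run for run in runs if run >= started_at]


day = date(2024, 1, 1)
hourly = interval_runs(day, timedelta(hours=6), time(0, 0), time(23, 59))
minutely = interval_runs(day, timedelta(minutes=30), time(9, 0), time(10, 0))

print([run.strftime("%H:%M") for run in daily_run(day, time(2, 30))])
print([run.strftime("%H:%M") for run in hourly])
print([run.strftime("%H:%M") for run in minutely])
print([run.strftime("%H:%M") for run in after_start(hourly, datetime(2024, 1, 1, 7, 0))])
```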

Manual run
On the Workflows page, click the name of the target workflow. In the upper-right corner, click Run and select a method to run the workflow.
-
Manually Run (Default): Runs the workflow immediately, independent of its schedule.
-
Backfill Data: Reruns the workflow for a historical period, typically to correct data from failed or missed runs. Configure the following parameters:
Parameter
Description
Cycle
The system generates workflow runs for the time range you select.
-
When the current time passes the start of the selected cycle, the system automatically generates and executes the backfill runs.
-
A backfill workflow run is generated and run only if the scheduled run time of the workflow falls within the selected cycle.
-
If the workflow contains time variables (for example, ${ds}), the system automatically replaces them with the time from the selected cycle.
Resource Queue
Uses the workflow's default resource queue. You can also select another available queue in the production environment.
Remarks
A description for the backfill run to aid in future management and troubleshooting.
More Settings
Failure Notification: Specify an email address to receive alerts when the backfill workflow run fails.
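The backfill rules for the Cycle parameter can be illustrated with a short sketch: a run is generated only when its scheduled time falls within the selected cycle, and time variables are filled in from that scheduled time. The function names are hypothetical, the workflow is assumed to run once a day, the cycle boundaries are assumed to be inclusive, and the YYYY-MM-DD rendering of ${ds} is an assumption, because this topic does not specify the format.

```python
from datetime import datetime, time, timedelta
from typing import List


def backfill_runs(cycle_start: datetime, cycle_end: datetime, run_at: time) -> List[datetime]:
    """List the daily scheduled times that fall within the selected cycle."""
    runs = []
    day = cycle_start.date()
    while day <= cycle_end.date():
        scheduled = datetime.combine(day, run_at)
        # Only scheduled times inside the cycle produce a backfill run.
        if cycle_start <= scheduled <= cycle_end:
            runs.append(scheduled)
        day += timedelta(days=1)
    return runs


def render(template: str, scheduled: datetime) -> str:
    """Replace ${ds} with the run's date (assumed format: YYYY-MM-DD)."""
    return template.replace("${ds}", scheduled.strftime("%Y-%m-%d"))


runs = backfill_runs(
    cycle_start=datetime(2024, 3, 1, 0, 0),
    cycle_end=datetime(2024, 3, 3, 1, 0),
    run_at=time(2, 0),
)
query = "SELECT * FROM orders WHERE dt = '${ds}'"
for scheduled in runs:
    print(scheduled.isoformat(), "->", render(query, scheduled))
```

In this example the cycle ends at 01:00 on March 3, before that day's 02:00 scheduled time, so only the runs for March 1 and March 2 are generated.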
View run status
You can view the status of all workflow runs and node runs in the Workflow Runs Status and Workflow Node Runs Status columns for the target workflow.
-
Workflow run status
Status: Description
Blue: Running
Green: Succeeded
Red: Failed
Purple: Pending
-
Workflow node run status
Status: Description
Blue: Running
Green: Succeeded
Red: Failed
Yellow: Retrying
Purple: Pending
Related topics
-
For more information about the key concepts of Workflows, see Basic concepts.
-
For information about viewing workflow and node runs, see Manage workflow runs and node runs.