Manage workflows

A workflow orchestrates tasks in a specific sequence with defined dependencies. To run tasks at a scheduled time, create a workflow and configure its scheduling properties.
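
As a rough illustration of how defined dependencies determine the run sequence, the following Python sketch orders a few nodes with the standard library's `graphlib` module. The node names (`extract`, `transform`, `load`) are hypothetical, and the sketch does not represent how EMR Serverless Spark schedules nodes internally.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a node, each value is the set of its upstream nodes.
upstream = {
    "extract": set(),            # first node, no upstream node
    "transform": {"extract"},
    "load": {"transform"},
}

def execution_order(upstream_map):
    """Return one valid run order in which every node follows its upstream nodes."""
    return list(TopologicalSorter(upstream_map).static_order())

order = execution_order(upstream)
print(order)
```

A dependency cycle in the map would raise `graphlib.CycleError`, which is one reason a workflow's dependencies need to form a directed acyclic graph.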

Prerequisites

  • You have created a workspace. For more information, see Manage workspaces.

  • You have developed and published the required tasks.

Create a workflow

  1. Go to the Workflows page.

    1. Log on to the E-MapReduce console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

    3. On the Spark page, click the name of the target workspace.

    4. On the EMR Serverless Spark page, click Workflows in the left-side navigation pane.

  2. On the Workflows page, click Create Workflow.

  3. In the Create Workflow panel, configure the following parameters and click Next.

    Name: The workflow name must be unique within the current workspace.

    Resource Queue: Select the default resource queue in which the workflow runs.

      Note: A node's resource queue setting overrides this default queue.

    Other Settings

    Scheduling Type: Specifies how the workflow runs in the production environment. The following scheduling types are supported:

      • Manual (Default): The workflow runs only when you trigger it manually.

      • Scheduler: The workflow runs automatically on a defined schedule (minute, hour, or day).

        When you set Scheduling Type to Scheduler, you must also configure the Scheduling Time and Scheduling Started At parameters.

    Scheduling Time: Defines how often the workflow automatically runs. This parameter is required only when you set Scheduling Type to Scheduler. The following scheduling cycles are supported:

      • Daily: Runs once at a specified time each day.

      • Hourly: Runs at a defined interval, such as every N hours, within a specified time range each day.

      • Minute: Runs at a defined interval, such as every N minutes, within a specified time range each day.

    Scheduling Started At: The date and time when the scheduled workflow starts. The default is the current time. This parameter is required only for the Scheduler type.

      Important: For a workflow of the Scheduler type, you must enable the Scheduling Status toggle on the Workflows page to activate its schedule.

    Retries After Failure: Specifies the number of times a node retries after a failure. By default, the node does not retry.

      Note: The settings for an individual node can override this parameter.

    Failure Notification: The email address that receives failure notifications.

    Tags: The key-value pairs to use as tags for the workflow.

  4. Edit the workflow nodes.

    1. On the Edit Workflow page, click Add Node at the bottom.

    2. In the Add Node panel, configure the node parameters.

      Source File Path: The path to the task that corresponds to this node. The task must be published.

      Node Type: The type of the workflow node. By default, the system infers the node type based on the task at the specified path.

      Node Name: A custom name for the node. The name is automatically populated based on the task source.

      Upstream Node: The preceding node in the workflow. The first node does not require an upstream node.

      Number of Retries: The number of times this node retries after a failure. By default, the node inherits the retry setting from the workflow, and no retries are performed unless that setting is changed.

      Timeout (Seconds): The timeout for a single node run. By default, there is no limit.

      Subscription: An email address that subscribes to notifications for specific node states.

      Tags: The tag key-value pairs for the node. By default, each node automatically includes two built-in tags: workflow_name and task_name.

      Resource Queue: The resource queue for this node. This setting overrides the workflow's default resource queue.

        Important: A node-specific resource queue setting takes precedence over the workflow's default, even if the default is changed later.

      Note

      If the task source is SQL development, you must also configure task parameters. These parameters inherit values from the task template by default. You can modify the task template to change the default values. For more information about parameters, see Configuration management.

    3. Click Save.

      After configuring the initial node, you can click Add Node at the bottom of the page to add more nodes.

  5. Publish the workflow.

    1. In the upper-right corner, click Publish Workflow.

    2. In the Publish dialog box, optionally enter a description for the release, and then click OK.
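
The precedence between workflow-level defaults and node-level settings described in the parameter tables can be sketched as follows. This is a minimal illustration only: the class names, field names, and queue names are hypothetical and do not correspond to any EMR Serverless Spark API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class WorkflowDefaults:
    resource_queue: str
    retries_after_failure: int = 0  # by default, nodes do not retry

@dataclass
class NodeSettings:
    name: str
    resource_queue: Optional[str] = None      # None means "not set on the node"
    number_of_retries: Optional[int] = None   # None means "inherit from the workflow"

def effective_settings(workflow: WorkflowDefaults, node: NodeSettings) -> Tuple[str, int]:
    """Resolve the queue and retry count a node would use: node settings win when set."""
    queue = node.resource_queue if node.resource_queue is not None else workflow.resource_queue
    retries = (
        node.number_of_retries
        if node.number_of_retries is not None
        else workflow.retries_after_failure
    )
    return queue, retries

wf = WorkflowDefaults(resource_queue="prod_queue", retries_after_failure=1)
inherited = effective_settings(wf, NodeSettings("node_a"))
overridden = effective_settings(
    wf, NodeSettings("node_b", resource_queue="etl_queue", number_of_retries=3)
)
print(inherited, overridden)
```

Because the node-level value is checked first, changing the workflow default later affects only nodes that have not set their own value.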

Run a workflow

Each time a workflow runs, a workflow run is created and listed on the Workflow Runs tab of the workflow details page.

Debug

You can debug the latest version of a workflow while editing it.

  1. On the Edit Workflow page, click Debug.

  2. In the Debug dialog box, select a resource queue from the development environment, and then click Run.

Scheduled run

If you set Scheduling Type to Scheduler when you created the workflow, enable the Scheduling Status toggle on the Workflows page to activate the schedule. The workflow then runs at the scheduled times.
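
To make the Daily, Hourly, and Minute scheduling cycles more concrete, the following Python sketch lists the run times such a schedule might produce for one day. It assumes that interval runs start at the beginning of the time range and include the end of the range when the interval lands on it; the actual boundary behavior of the scheduler is not stated here, and the function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta

def daily_run(day, at):
    """Daily cycle: one run at the specified time."""
    return [datetime.combine(day, at)]

def interval_runs(day, start, end, step):
    """Hourly or Minute cycle: a run every `step` from `start` through `end` on `day`.

    Assumption: the end of the range is inclusive.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    current = datetime.combine(day, start)
    last = datetime.combine(day, end)
    runs = []
    while current <= last:
        runs.append(current)
        current += step
    return runs

day = date(2024, 1, 1)
hourly = interval_runs(day, time(8, 0), time(20, 0), timedelta(hours=6))
minutely = interval_runs(day, time(9, 0), time(10, 0), timedelta(minutes=30))
print([t.strftime("%H:%M") for t in hourly])
print([t.strftime("%H:%M") for t in minutely])
```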

Manual run

On the Workflows page, click the name of the target workflow. In the upper-right corner, click Run and select a method to run the workflow.

  • Manually Run (Default): Runs the workflow immediately, independent of its schedule.

  • Backfill Data: Reruns the workflow for a historical period, typically to correct data from failed or missed runs. Configure the following parameters:

    Cycle: The system generates workflow runs for the time range you select.

      • When the current time passes the start of the selected cycle, the system automatically generates and executes the backfill runs.

      • A backfill workflow run is generated and run only if the scheduled run time of the workflow falls within the selected cycle.

      • If the workflow contains time variables (for example, ${ds}), the system automatically replaces them with the time from the selected cycle.

    Resource Queue: Uses the workflow's default resource queue. You can also select another available queue in the production environment.

    Remarks: A description for the backfill run to aid in future management and troubleshooting.

    More Settings

      • Failure Notification: Specify an email address to receive alerts when the backfill workflow run fails.
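
The backfill rules above can be sketched in Python as follows. The sketch assumes a workflow with a Daily schedule and assumes that `${ds}` renders as the run date in `YYYY-MM-DD` form; neither the variable's exact format nor the function names come from the product, and the SQL statement is a made-up example.

```python
from datetime import datetime, time, timedelta
from string import Template

def backfill_runs(cycle_start, cycle_end, daily_at):
    """Return the daily scheduled times that fall within [cycle_start, cycle_end]."""
    runs = []
    day = cycle_start.date()
    while day <= cycle_end.date():
        scheduled = datetime.combine(day, daily_at)
        # A backfill run is generated only if its scheduled time is inside the cycle.
        if cycle_start <= scheduled <= cycle_end:
            runs.append(scheduled)
        day += timedelta(days=1)
    return runs

def render(task_text, scheduled):
    """Replace ${ds} with the run date (assumed format: YYYY-MM-DD)."""
    return Template(task_text).safe_substitute(ds=scheduled.strftime("%Y-%m-%d"))

runs = backfill_runs(
    datetime(2024, 1, 1, 12, 0),   # selected cycle start
    datetime(2024, 1, 3, 23, 59),  # selected cycle end
    time(2, 0),                    # workflow is scheduled daily at 02:00
)
statements = [render("SELECT * FROM events WHERE dt = '${ds}'", r) for r in runs]
for s in statements:
    print(s)
```

In this example the 02:00 run on the first day is skipped because it falls before the start of the selected cycle, so only two backfill runs are produced.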

View run status

You can view the status of all workflow runs and node runs in the Workflow Runs Status and Workflow Node Runs Status columns for the target workflow.

  • Workflow run status

    • Blue: Running

    • Green: Succeeded

    • Red: Failed

    • Purple: Pending

  • Workflow node run status

    • Blue: Running

    • Green: Succeeded

    • Red: Failed

    • Yellow: Retrying

    • Purple: Pending
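
If you record these indicators in your own notes or tooling, the color-to-status mapping can be kept as a small lookup table. The following Python sketch is only a convenience illustration; the dictionary and function names are hypothetical and are not part of the console or any API.

```python
# Colors shown for workflow runs.
RUN_STATUS_BY_COLOR = {
    "Blue": "Running",
    "Green": "Succeeded",
    "Red": "Failed",
    "Purple": "Pending",
}

# Node runs use the same colors plus Yellow for a node that is retrying.
NODE_RUN_STATUS_BY_COLOR = {**RUN_STATUS_BY_COLOR, "Yellow": "Retrying"}

def describe(color, scope="workflow"):
    """Translate an indicator color into its status for a workflow run or a node run."""
    table = NODE_RUN_STATUS_BY_COLOR if scope == "node" else RUN_STATUS_BY_COLOR
    return table.get(color, "Unknown")

print(describe("Yellow", scope="node"))
print(describe("Yellow"))
```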

Related topics