Workflow

A workflow automates data processing: you drag and drop task nodes onto a visual DAG, define their dependencies and schedule, and run them together as a reliable data pipeline.

What is a workflow?

A workflow is a core orchestration unit in DataWorks. It organizes task nodes (SQL, Shell, Python, data synchronization, Check) into a directed acyclic graph (DAG) with clear dependencies, enabling unified scheduling and execution. Workflows can also be combined to support complex business scenarios.
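To make the DAG idea concrete, here is a minimal Python sketch of how dependencies between nodes determine a valid execution order. It does not use any DataWorks API; the node names and the `execution_order` helper are hypothetical and exist only for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical node names. Each key maps to the set of upstream nodes
# that must finish before it can run.
dependencies = {
    "sync_orders": set(),                       # data synchronization node
    "clean_orders": {"sync_orders"},            # SQL node
    "daily_active_users": {"clean_orders"},     # SQL node
    "check_row_count": {"daily_active_users"},  # Check node
}


def execution_order(deps):
    """Return one valid run order; raises CycleError if deps contain a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)

# A graph with a cycle is not a DAG and cannot be ordered.
try:
    execution_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle detected: not a valid DAG")
```

Because every node lists its upstream nodes explicitly, the order follows from the dependencies rather than from the position of nodes on the canvas.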

By integrating separate tasks into a structured process, workflows shift focus from individual task management to managing an entire data pipeline. Core benefits:

  • Visual development flows
    Encapsulate dependent nodes (SQL, Shell) into business-oriented workflows like "Daily Active User Analysis" as a clear DAG. This clarifies the technical path and helps non-technical staff understand data flow, aligning business and technology.

  • Atomic development and O&M
    A workflow is the smallest unit of change: it is submitted, deployed, and operated (tested, rerun, backfilled) as a whole. This prevents production issues caused by partial modifications and ensures end-to-end consistency.

  • Team collaboration boundaries
    Workflows clarify ownership in multi-team environments—for example, the transaction team owns transaction data. This enables permission isolation, issue tracking, and decoupled upstream/downstream collaboration.

Workflow type comparison

DataWorks recommends two workflow types:

  • Scheduled workflow: Runs automatically on a fixed schedule (hourly, daily, weekly). Triggered by scheduling rules, with node execution controlled by the scheduled time. Suitable for recurring data processing.

  • Event-triggered workflow: Triggered on demand by external signals—manual operations, OpenAPI calls, or event messages. Does not rely on a fixed schedule. Suitable for real-time processing or responding to external events.

| Feature | Scheduled workflow | Event-triggered workflow | Manual workflow (not recommended) |
| --- | --- | --- | --- |
| Scheduling method | Triggered by scheduled time and dependencies | Manual/Event/API trigger | Manual run |
| Scenarios | Daily/Hourly/Weekly/Monthly batch | Real-time processing/On-demand execution/External integration | Temporary tasks (for backward compatibility) |
| Parameter priority | Node > Workflow > Workspace | Node > Workflow > Workspace | Workflow > Node |
| Typical use case | T+1 reports at midnight daily | Automatic processing upon OSS file arrival | One-time data fix |

Important
  • An event-triggered workflow with no trigger bound to it can be used as a manual workflow. This lets event-triggered workflows gradually replace manual workflows.

  • Manual workflows exist mainly for compatibility with older data development patterns. Do not use them for new projects.
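The parameter priority row in the comparison table can be read as a lookup order: when the same parameter is defined at several levels, the level listed first wins. The following Python sketch illustrates that reading only. The `resolve_params` function, the scope dictionaries, and the parameter names are hypothetical, not part of DataWorks.

```python
from collections import ChainMap

# Lookup order per workflow type, highest priority first,
# taken from the "Parameter priority" row of the comparison table.
PRIORITY = {
    "scheduled": ("node", "workflow", "workspace"),
    "event_triggered": ("node", "workflow", "workspace"),
    "manual": ("workflow", "node"),  # the table lists no workspace level here
}


def resolve_params(workflow_type, scopes):
    """Merge parameter scopes so that higher-priority scopes win."""
    order = PRIORITY[workflow_type]
    # ChainMap searches its maps left to right, so the first map has priority.
    return dict(ChainMap(*(scopes.get(level, {}) for level in order)))


# Hypothetical parameter values defined at each level.
scopes = {
    "workspace": {"env": "prod", "bizdate": "20240101"},
    "workflow": {"bizdate": "20240102", "retries": "3"},
    "node": {"bizdate": "20240103"},
}

scheduled_params = resolve_params("scheduled", scopes)
manual_params = resolve_params("manual", scopes)
print(scheduled_params["bizdate"])
print(manual_params["bizdate"])
```

In this sketch the node-level value wins for a scheduled workflow, while the workflow-level value wins for a manual workflow, which mirrors the two orderings in the table.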

Quick selection guide

Answer three questions to determine the right workflow type:

[Figure: workflow type selection guide]

References

Choose a topic based on your scenario: