A workflow automates data processing by organizing task nodes into a visual DAG with drag-and-drop, establishing dependencies and scheduling to build reliable data pipelines.
What is a workflow?
A workflow is a core orchestration unit in DataWorks. It organizes task nodes (SQL, Shell, Python, data synchronization, Check) into a directed acyclic graph (DAG) with clear dependencies, enabling unified scheduling and execution. Workflows can also be combined to support complex business scenarios.
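To illustrate what organizing nodes into a DAG with clear dependencies means, the following minimal sketch derives a valid run order from declared upstream dependencies. It uses Python's standard `graphlib` module and is not DataWorks code; the node names and the `execution_order` helper are hypothetical and chosen only for this example.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical nodes: each key maps to the set of upstream nodes it depends on.
workflow = {
    "sync_orders": set(),               # data synchronization node
    "clean_orders": {"sync_orders"},    # SQL node
    "check_orders": {"clean_orders"},   # Check node
    "daily_report": {"check_orders"},   # SQL node
}


def execution_order(dag):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(dag).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"not a DAG, cycle found: {exc.args[1]}") from exc


order = execution_order(workflow)
print(order)
```

Because every node runs only after all of its upstream nodes, a cyclic dependency has no valid order; the sketch reports that case as an error, which is why the graph must stay acyclic.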
By integrating separate tasks into a structured process, workflows shift focus from individual task management to managing an entire data pipeline. Core benefits:
- Visual development flows: Encapsulate dependent nodes (SQL, Shell) into business-oriented workflows like "Daily Active User Analysis" as a clear DAG. This clarifies the technical path and helps non-technical staff understand data flow, aligning business and technology.
- Atomic development and O&M: As the smallest change unit, a workflow supports holistic submission, deployment, and O&M (testing, rerunning, backfill). This prevents production issues from partial modifications and ensures end-to-end consistency.
- Team collaboration boundaries: Workflows clarify ownership in multi-team environments (for example, the transaction team owns transaction data). This enables permission isolation, issue tracking, and decoupled upstream/downstream collaboration.
Workflow type comparison
DataWorks recommends two workflow types:
- Scheduled workflow: Runs automatically on a fixed schedule (hourly, daily, weekly). Triggered by scheduling rules, with node execution controlled by the scheduled time. Suitable for recurring data processing.
- Event-triggered workflow: Triggered on demand by external signals such as manual operations, OpenAPI calls, or event messages. Does not rely on a fixed schedule. Suitable for real-time processing or responding to external events.
| Feature | Scheduled workflow | Event-triggered workflow | Manual workflow (not recommended) |
| --- | --- | --- | --- |
| Scheduling method | Triggered by scheduled time and dependencies | Manual/Event/API trigger | Manual run |
| Scenarios | Daily/Hourly/Weekly/Monthly batch | Real-time processing/On-demand execution/External integration | Temporary tasks (for backward compatibility) |
| Parameter priority | Node > Workflow > Workspace | Node > Workflow > Workspace | Workflow > Node |
| Typical use case | T+1 reports at midnight daily | Automatic processing upon OSS file arrival | One-time data fix |
- An event-triggered workflow with no trigger bound can be run manually, so it can serve as a manual workflow and gradually replace manual workflows.
- Manual workflows exist mainly for compatibility with older data development patterns. Do not use them for new projects.
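The parameter priority row in the comparison can be read as a lookup order: when the same parameter is defined at several levels, the level listed first wins. The sketch below models that reading with `collections.ChainMap`. It is an illustration under stated assumptions, not the DataWorks implementation; the `resolve_params` helper and the parameter names are hypothetical.

```python
from collections import ChainMap


def resolve_params(node, workflow, workspace, manual=False):
    """Merge parameter dicts so that the higher-priority level wins.

    Scheduled and event-triggered workflows: Node > Workflow > Workspace.
    Manual workflows: Workflow > Node (the comparison lists no workspace
    level for them, so this sketch leaves it out).
    """
    if manual:
        return dict(ChainMap(workflow, node))
    # ChainMap searches its mappings left to right; the first hit wins.
    return dict(ChainMap(node, workflow, workspace))


# Hypothetical parameter values defined at each level.
node_params = {"bizdate": "node-value"}
workflow_params = {"bizdate": "workflow-value", "region": "cn"}
workspace_params = {"region": "global", "env": "prod"}

scheduled = resolve_params(node_params, workflow_params, workspace_params)
manual = resolve_params(node_params, workflow_params, workspace_params, manual=True)
print(scheduled["bizdate"], manual["bizdate"])
```

Under this reading, the same `bizdate` definition resolves to the node-level value in a scheduled or event-triggered workflow and to the workflow-level value in a manual workflow, which is worth checking when migrating manual workflows.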
Quick selection guide
Answer three questions to determine the right workflow type:
References
Choose a topic based on your scenario:
- For scheduled scenarios, see Scheduled workflow orchestration.
- For event-driven scenarios, see Event-triggered workflows.
- For O&M and monitoring, see Overview.