DataWorks Serverless Resource Groups use a multi-zone deployment architecture by default, spanning at least two Availability Zones within the same city. When a zone fails, the system automatically reschedules tasks to other active zones, so your data development workloads keep running without manual intervention.
How it works
Multi-zone deployment is enabled by default for Serverless Resource Groups under both pay-as-you-go and subscription billing. The computing resources of a resource group are distributed across at least two Availability Zones in the same city. When a zone becomes unavailable, the failover process automatically reschedules the tasks assigned to that zone onto computing resources in the remaining active zones.
The following diagram illustrates the multi-zone architecture and how tasks are redistributed during a zone failure.
Core concepts
| Concept | Description |
| --- | --- |
| High availability | Serverless Resource Groups are deployed across multiple Availability Zones. If one zone fails, tasks are automatically rescheduled to run in other zones, ensuring business continuity. |
| Failover | The process by which the system automatically reschedules tasks from a failed zone to other active zones. Failover is triggered when computing resources or services in the original zone become unavailable. |
| Resource availability ratio | The percentage of computing units (CUs) in a resource group that are available for tasks at a given time. A single-zone failure reduces the overall resource pool, which lowers the resource availability ratio. |
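As a rough illustration of the resource availability ratio, the following sketch computes the percentage of CUs that remain usable after one or more zones fail. The zone names, the CU counts, and the even split across zones are assumptions made for this example, and `availability_ratio` is a hypothetical helper, not a DataWorks API.

```python
def availability_ratio(cus_per_zone: dict[str, int], failed_zones: set[str]) -> float:
    """Return the percentage of CUs still available after the given zones fail."""
    total = sum(cus_per_zone.values())
    if total == 0:
        raise ValueError("resource group has no CUs")
    # Only CUs in zones that have not failed can run tasks.
    available = sum(cu for zone, cu in cus_per_zone.items() if zone not in failed_zones)
    return 100.0 * available / total


# Hypothetical resource group: 60 CUs split evenly across two zones (assumption).
group = {"zone-a": 30, "zone-b": 30}
print(availability_ratio(group, set()))       # 100.0
print(availability_ratio(group, {"zone-a"}))  # 50.0
```

Under these assumptions, losing one of two equally sized zones halves the CUs available to tasks, which is why tasks may queue during a failover.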
Limitations
High availability ensures that task scheduling continues during a zone failure. It does not guarantee unlimited resources or a 100% task success rate.
During a single-zone failure, the following situations may occur:
- Reduced resource availability. When a zone fails, the overall CU pool of the resource group shrinks. Tasks may queue while waiting for available resources in the remaining zones.
- Task failure and retry. Tasks running in the failed zone will fail. The system attempts to reschedule them in other zones through failover. For this to succeed, tasks must be rerunnable and must have an automatic retry policy configured.
- External dependency requirements. If a task depends on an external system, such as a database or Message Queue (MQ) service, that system must also support high availability. A DataWorks failover cannot recover a task whose external dependency is unavailable.
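One common reading of "rerunnable" is that running a task twice leaves the same result as running it once. The sketch below contrasts an overwrite-style load, which is safe to rerun, with an append-style load, which duplicates rows on a retry. The in-memory dictionary stands in for a partitioned table, and the function names and the partition key are hypothetical; none of this is DataWorks code.

```python
def load_partition_overwrite(table: dict[str, list[int]], partition: str, rows: list[int]) -> None:
    """Replace the partition contents, so a rerun produces the same final state."""
    table[partition] = list(rows)


def load_partition_append(table: dict[str, list[int]], partition: str, rows: list[int]) -> None:
    """Append to the partition, so a rerun duplicates the rows."""
    table.setdefault(partition, []).extend(rows)


rows = [1, 2, 3]
overwrite_table: dict[str, list[int]] = {}
append_table: dict[str, list[int]] = {}

# Simulate the first attempt plus one rerun after a failover.
for _ in range(2):
    load_partition_overwrite(overwrite_table, "ds=20240101", rows)
    load_partition_append(append_table, "ds=20240101", rows)

print(overwrite_table)  # {'ds=20240101': [1, 2, 3]}
print(append_table)     # {'ds=20240101': [1, 2, 3, 1, 2, 3]}
```

A task written in the overwrite style can be retried in another zone without corrupting its output, whereas the append style would need deduplication or cleanup before a rerun.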
Unsupported scenarios
The following DataWorks resource group use cases do not support high availability by default:
| Scenario | High Availability Support |
| --- | --- |
| Personal developer environments | Not supported by default |
| | Not supported by default |
| | Not supported by default |
Configure your environment for effective failover
High availability is a shared responsibility: DataWorks handles zone-level redundancy automatically, but your workloads must be configured to take full advantage of it. Complete the following steps to ensure your tasks recover reliably from a zone failure.
- Configure an automatic retry policy for tasks. A task instance that was running in a failed zone can be restarted in another zone only if it is retried. Without a retry policy, that task instance fails permanently. Configure automatic retry in your task settings.
- Ensure external dependencies support high availability. Configure any external systems that your tasks depend on, such as databases or Message Queue (MQ) services, for high availability as well. A DataWorks failover cannot recover a task whose external dependency is unavailable.
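To show why the retry policy matters, here is a minimal sketch of what an automatic retry does conceptually: the first attempt fails as if its zone had gone down, and a later attempt succeeds. In DataWorks itself, retry is configured in the task settings rather than written in code; `run_with_retry`, its parameters, and `flaky_task` are hypothetical names used only for this illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(task: Callable[[], T], max_retries: int = 3, interval_seconds: float = 0.0) -> T:
    """Run the task, retrying up to max_retries times after a failure."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            # With no retries left, the failure becomes permanent.
            if attempt >= max_retries:
                raise
            attempt += 1
            time.sleep(interval_seconds)


attempts = {"count": 0}


def flaky_task() -> str:
    """Fail on the first attempt to mimic a task caught in a failing zone."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("zone unavailable")
    return "done"


result = run_with_retry(flaky_task, max_retries=3)
print(result, attempts["count"])  # done 2
```

With `max_retries=0`, the same task would raise on its first failure and never run again, which mirrors a task instance that fails permanently because no retry policy is configured.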