To run an offline task on a recurring schedule, define its scheduling properties, including the scheduling cycle, dependencies, and parameters.
Important Notes
- Scheduling configuration is supported only for offline compute tasks whose scheduling type is set to Auto Triggered Task.
- A dependency defines the execution order between two nodes. The status of an upstream node affects the execution status of its downstream nodes.
- When dependencies are configured, the system schedules a downstream node as follows: it first waits until the upstream node completes successfully, and then checks whether the downstream node's scheduled time has been reached.
- If you submit the scheduling configuration before the scheduled time, it takes effect after that time. If you configure dependencies after the scheduled time, the system starts generating instances the next day.
- Scheduling configuration defines only the properties used when the task runs on schedule. For the configuration to take effect, publish the task to the production environment.
- The scheduled time is the expected execution time. The actual execution time also depends on the status of upstream nodes. For details about task execution conditions, see Instance Run Diagnostics.
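The dependency and timing rules in the notes above can be modeled in a few lines. The following Python sketch is illustrative only: the `Node` class, its fields, the status strings, and the node names are hypothetical and are not a Dataphin API. It also assumes that a node with several upstream nodes waits for all of them. `can_run` encodes the rule that a downstream node starts only after its upstream nodes succeed and its own scheduled time is reached; `earliest_start` shows why the actual execution time can be later than the scheduled time.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical model of a scheduled node; not a Dataphin API."""
    name: str
    scheduled_time: datetime
    upstream: List["Node"] = field(default_factory=list)
    status: str = "waiting"  # assumed values: "waiting", "running", "success", "failed"
    finished_at: Optional[datetime] = None


def can_run(node: Node, now: datetime) -> bool:
    """True only if every upstream node has succeeded and the
    node's own scheduled time has been reached."""
    upstream_ok = all(u.status == "success" for u in node.upstream)
    return upstream_ok and now >= node.scheduled_time


def earliest_start(node: Node) -> Optional[datetime]:
    """Earliest possible start: the later of the scheduled time and the last
    upstream finish time. None while any upstream node has not succeeded."""
    if any(u.status != "success" for u in node.upstream):
        return None
    finishes = [u.finished_at for u in node.upstream if u.finished_at is not None]
    return max([node.scheduled_time, *finishes])


if __name__ == "__main__":
    up = Node("ods_orders", datetime(2024, 1, 2, 0, 30), status="running")
    down = Node("dwd_orders", datetime(2024, 1, 2, 1, 0), upstream=[up])
    print(can_run(down, datetime(2024, 1, 2, 1, 5)))   # False: upstream still running
    up.status, up.finished_at = "success", datetime(2024, 1, 2, 1, 15)
    print(can_run(down, datetime(2024, 1, 2, 1, 20)))  # True
    print(earliest_start(down))                        # 2024-01-02 01:15:00
```

In this model, a node with no upstream nodes depends only on its scheduled time, because `all()` over an empty list returns `True`.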
Access Offline Task Properties
- On the Dataphin homepage, in the top menu bar, click Develop > Data Development.
- On the Develop page, in the top menu bar, click Project.
- In the navigation pane on the left, select Data Processing > Compute Job. In the Compute Job list, click the name of the target job.
- On the task tab, click Property on the right to open the Property panel.
Configure Offline Task Properties
The following table describes the basic information and scheduling properties available for offline tasks.
| Configuration Item | Description |
| --- | --- |
| Basic information | Includes the task name, ID, node type, development owner, O&M owner, and description. |
| Resource configuration | CPU and memory resources assigned to run the task. Note: This setting applies only to Python, Shell, Spark on MaxCompute, Spark on Yarn, MapReduce on MaxCompute, and MapReduce on Yarn tasks. |
| Python Third-Party Packages | Select the Python third-party packages to import. |
| Parameter configuration | Define the parameters used during node scheduling. Dataphin provides built-in and custom parameters whose values are assigned dynamically at runtime. Note: If your node code defines variables, assign values to them here. If it defines none, skip this step. |
| Scheduling properties | Define how the task runs on a recurring schedule in the production environment. |
| Scheduling dependencies | Define the upstream and downstream dependencies of the task. Dependencies ensure that downstream nodes start only after upstream nodes succeed, which helps deliver valid business data on time. Use automatic parsing to set dependencies quickly, or add them manually. |
| Run configuration | Define the timeout period and retry policy for failed task runs. This keeps long-running tasks from wasting resources and improves reliability. |
| Resource group | Select the resource group for the compute task. The system uses this group's resources when scheduling the task. |
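Two of the properties in the table, runtime parameter assignment and the timeout and retry policy, can be illustrated with a short Python sketch. This is not Dataphin's implementation. The `${name}` placeholder syntax, the `bizdate` built-in parameter and its "day before the run date" value, the function names, and the choice to treat a timed-out attempt as a failure that is retried are all assumptions made for illustration.

```python
import subprocess
import sys
import time
from datetime import date, timedelta
from string import Template
from typing import Dict, List


def render_task_code(code: str, custom_params: Dict[str, str], run_date: date) -> str:
    """Replace ${name} placeholders in task code with parameter values.

    'bizdate' is a hypothetical built-in parameter, assumed here to be the day
    before the run date; custom values take precedence over built-ins.
    """
    builtins = {"bizdate": (run_date - timedelta(days=1)).strftime("%Y%m%d")}
    values = {**builtins, **custom_params}
    # substitute() raises KeyError if the code uses a variable that has no value.
    return Template(code).substitute(values)


def run_with_retry(cmd: List[str], timeout_s: float, max_retries: int,
                   retry_interval_s: float) -> bool:
    """Run a command, killing any attempt that exceeds timeout_s.

    Failed or timed-out attempts are retried up to max_retries times,
    waiting retry_interval_s between attempts. Returns True on success.
    """
    for attempt in range(1 + max_retries):
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
            if result.returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            pass  # subprocess.run has already killed the timed-out child process
        if attempt < max_retries:
            time.sleep(retry_interval_s)
    return False


if __name__ == "__main__":
    sql = "SELECT * FROM orders WHERE ds = '${bizdate}' AND region = '${region}'"
    print(render_task_code(sql, {"region": "cn"}, date(2024, 1, 2)))
    # SELECT * FROM orders WHERE ds = '20240101' AND region = 'cn'

    ok = run_with_retry([sys.executable, "-c", "import sys; sys.exit(1)"],
                        timeout_s=5, max_retries=2, retry_interval_s=0.1)
    print(ok)  # False after 3 failed attempts
```

Passing `timeout` to `subprocess.run` makes the standard library kill the child process once the limit is exceeded, which mirrors the goal of stopping long-running tasks from holding resources.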
What to do next
After configuring task properties, submit and publish the task to the production environment. Then perform O&M operations as needed. For details, see Operation Center.