Baseline management

To ensure that your critical tasks finish on time, you can add them to a baseline and set a committed finish time. The system calculates an estimated finish time for baseline tasks based on their run history. If the system predicts that a task may not complete before the committed finish time, it sends an alert. This topic describes how to create and manage baselines.

Background

Smart Baseline detects exceptions that prevent tasks from completing on time and sends alerts in advance. This ensures that important data is generated within the expected timeframe, especially in scenarios with complex dependencies. For more information, see Smart Baseline overview.

After a baseline is created and enabled, it takes effect the next day. You can then go to the Auto Triggered Instances page to view the baseline's execution status.

Limitations

  • Edition requirements:

    The baseline management feature is available only in DataWorks Standard Edition and later. If you use a lower edition, upgrade your DataWorks edition before you use this feature. For more information, see Features of DataWorks editions.

  • Permission control:

    • Only an Alibaba Cloud account or a RAM user with the workspace administrator or tenant administrator role can create baselines.

    • Only the tenant administrator and the baseline owner can enable, disable, delete, or modify a baseline.

    To grant a user these permissions, assign the required roles to the user. For more information, see Add workspace members and assign roles.

  • Alert notification methods:

    DataWorks supports multiple alert notification methods, including Email, SMS, phone calls, DingTalk Chatbot, and Webhook. The following table describes the limitations for each method.

    | Notification method | Available regions | Available editions | Description |
    | --- | --- | --- | --- |
    | SMS | All regions | Standard Edition and later | To receive text message alerts in other regions, click the application link to join the "Alibaba Cloud Big Data AI Platform" chat group. Then, scan the QR code below to join the DataWorks product DingTalk group for support. There, you can ask the chatbot for help or contact an on-duty engineer during business hours. |
    | Webhook | All regions | Basic Edition | Supports sending alerts to DingTalk, Lark, and WeCom groups by using group webhooks. |
    | Webhook | All regions | Enterprise Edition | In addition to the features available in Basic Edition, Enterprise Edition supports configuring custom webhooks to receive alerts. |

    Note

    If you need to use a custom webhook, refer to Custom webhooks for smart monitoring for configuration details. After you complete the configuration, contact us for further assistance.

    Note

    To enable a RAM user to receive alerts by text message or phone call, add them as an alert contact on the Alert Contacts page. When a task fails, DataWorks sends the alert to the associated contacts. For more information, see View and manage alert contacts.

Create a baseline

  1. Go to the Operation Center page.

  2. Log on to the DataWorks console. After you select a region, click Data Development and O&M > Operation Center in the left-side navigation pane. In the drop-down list, select the desired workspace and click Operation Center.

  3. In the left-side navigation pane, click Task Monitoring > Intelligent Baselines.

  4. Create the baseline.

    1. On the Baselines tab, click Create Baseline.

    2. Configure the basic properties of the baseline.

      The following table describes the parameters.

      Parameter

      Description

      Baseline Name

      A custom name for the baseline.

      Belongs work space

      Select the workspace to which the nodes you want to monitor belong.

      Note

      When you configure the Nodes parameter, you can select only nodes and workflows that belong to the workspace selected here.

      Owner

      The owner of the baseline.

      Baseline Type

      Defines the monitoring cycle for the baseline, which can be daily or hourly.

      • Day-level Baseline: Monitors tasks on a daily basis. This is suitable for daily scheduled tasks.

      • Hour-level Baseline: Monitors tasks on an hourly basis. This is suitable for hourly scheduled tasks.

      Nodes

      Select the tasks you want to monitor to ensure they complete on time.

      A node can belong to only one baseline. Adding a node that already belongs to another baseline moves it from that baseline to this one.

      • Node: Enter the name or ID of a node and click the Add button. You can add multiple nodes to the baseline.

      • Workflow: Enter the name or ID of a workflow and click the Add button. By default, all nodes in the workflow are added to the baseline.

        Note

        After you select a workflow, we recommend adding only its most downstream nodes. Once added, the baseline automatically includes all upstream nodes that affect their data output in its monitoring scope. Adding all nodes in a workflow to a baseline is not recommended.

      Priority

      Sets the baseline priority, where a larger value indicates higher priority. When scheduling resources are limited, tasks on higher-priority baselines are scheduled first. This priority setting applies to auto triggered instances generated the following day.

      Note

      • MaxCompute nodes:

        The baseline priority is mapped to the priority of MaxCompute compute jobs under the following conditions:

        • The priority feature is enabled for the MaxCompute project.

        • The MaxCompute project uses subscription compute resources.

        MaxCompute job priority = 9 - DataWorks baseline priority.

      • EMR nodes:

        You can map baseline priorities to YARN queue priorities to adjust the final YARN queue priority of a node. This determines whether the node can be preferentially scheduled and executed. For more information, see Configure mappings between baseline priorities and YARN queue priorities.

      Once a node is added to a baseline, its priority is set by that baseline. This priority then propagates to all of the node's upstream dependencies.

      • If a node affects the data output of multiple baselines with different priorities, the node's priority is determined by the highest priority among those baselines.

      • Baseline priority does not affect upstream nodes with cross-cycle dependencies.

      Estimated Completion Time

      DataWorks calculates the baseline's estimated finish time based on the average completion time of its tasks over a historical period (typically the last 10 days). If the estimated finish time is later than the baseline alert time, DataWorks triggers a baseline alert. For information about the alerting mechanism, see Appendix: Baseline alerting mechanism.

      Note

      If there is insufficient historical data, the system displays a message: The completion time cannot be estimated due to the lack of historical data.

      Committed Completion Time

      The latest time by which tasks on the baseline must be completed. This is also the deadline for data production. The baseline uses this time to calculate the alert time. You must configure the committed finish time based on the estimated finish time. The alert time (calculated as committed finish time - alert margin threshold) must be later than the tasks' estimated finish time.

      Note

      • The formula for the alert time is alert time = committed finish time - alert margin threshold. An alert is triggered if the system predicts that a task will not be completed by the alert time. For example, if the committed finish time is set to 3:30 and the alert margin threshold is 10 minutes, the alert time is 3:20. A baseline alert is sent if the system predicts the task will not finish by this time.

      • For an hour-level baseline, you must specify which hourly instance (i.e., which specific run cycle) requires monitoring and set its latest completion time.

      • Because tasks on a baseline can run longer than 24 hours, you can set the committed finish time within a two-day window, from 00:00 to 47:59. For example, if a task runs for more than a day, you can set the time to 36:00.

      Alert Margin Threshold

      This threshold determines how early an alert is triggered before the committed finish time. The resulting alert time should be later than the estimated finish time to prevent frequent, unnecessary alerts. We recommend that you configure the alert margin threshold based on the runtime of the tasks on the baseline. For more information, see Configure an appropriate committed finish time and alert margin threshold.
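The priority rules described for the Priority parameter can be sketched in a few lines of Python. This is only an illustration of the stated formula and the "highest priority wins" rule, not DataWorks code: the function names, the boolean flags, and the assumed 0 to 9 value range are hypothetical.

```python
from typing import Optional, Sequence


def maxcompute_job_priority(baseline_priority: int,
                            priority_enabled: bool,
                            uses_subscription: bool) -> Optional[int]:
    """Map a baseline priority to a MaxCompute job priority.

    Applies the formula from the text (9 - baseline priority) only when
    both documented conditions hold. Returns None otherwise.
    The accepted range 0..9 is an assumption made for this sketch.
    """
    if not 0 <= baseline_priority <= 9:
        raise ValueError("baseline priority outside the assumed range 0..9")
    if not (priority_enabled and uses_subscription):
        return None
    return 9 - baseline_priority


def effective_node_priority(baseline_priorities: Sequence[int]) -> int:
    """Priority of a node that affects several baselines.

    Per the text, the highest (numerically largest) baseline priority wins.
    """
    if not baseline_priorities:
        raise ValueError("the node does not affect any baseline")
    return max(baseline_priorities)


if __name__ == "__main__":
    node_priority = effective_node_priority([3, 8, 5])
    print(node_priority)                                       # 8
    print(maxcompute_job_priority(node_priority, True, True))  # 1
```

Because the formula subtracts the baseline priority from 9, a larger baseline priority yields a smaller MaxCompute job priority value.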

  5. Configure alerting for the baseline.

    Alert policies include baseline alerts, which are triggered when data is predicted to be late, and event alerts, which are triggered by errors or slowdowns affecting the baseline's tasks or their dependencies. Before you configure these settings, we recommend that you understand the alerting mechanism. For more information, see Appendix: Baseline alerting mechanism.

    1. Enable alerting.

      After you enable alerting, DataWorks checks for conditions that meet the alert rules and sends notifications accordingly.

      • If the system predicts that tasks on the baseline will not complete within the committed time, it sends a baseline alert based on the configured notification method. For more information, see Key logic: baseline alerts.

      • If a baseline task or its upstream dependencies encounter an error, or if a task on the critical path slows down, the system sends an event alert based on the configured notification method. You can view the list of existing events on the Events page in DataWorks. For more information, see Manage events.

    2. Note

      If you disable alerting, the baseline does not generate any alerts. However, if the baseline is enabled, baseline instances are still generated and the priority setting remains in effect.

    3. Select notification methods.

      After you enable alerting, you can select notification methods as needed. We recommend that you configure both baseline alerts and event alerts for critical tasks.

      Baseline alert

      Parameter

      Description

      Enable Alerting

      Enables or disables alerting for this baseline.

      Note

      If you disable alerting, the baseline does not generate any alerts. However, if the baseline is enabled, baseline instances are still generated and the priority setting remains in effect.

      Alert Notification Method

      • Supports sending alerts by Email, SMS, or Phone call to the baseline owner, the on-duty engineer in the shift schedule, or specified recipients. To configure a shift schedule, see Shift schedule.

      • Supports sending alerts to other applications such as DingTalk, WeCom, and Lark by using a DingTalk Chatbot or a Webhook. To configure a DingTalk chatbot, see Scenario: Send alert notifications to a DingTalk group.

      Note

      • You can use Check Contact Information or Send Test Message to verify that alerts can be sent correctly.

      • Phone call alerts are available only in DataWorks Professional Edition and later.

      • If you select phone call alerts, DataWorks rate-limits calls to avoid a burst of calls in a short period. A user receives at most one alert call every 20 minutes. Additional alerts are downgraded to text messages.

      Maximum Alerts

      The maximum number of alerts that can be sent. After this limit is reached, no more alerts are generated.

      Minimum Alert Interval

      The minimum time interval between two consecutive alerts.

      Alerting Do-Not-Disturb Period

      If you set a do-not-disturb period, the system does not send alerts during this time.

      For example, if the do-not-disturb period for a task is set from 00:00 to 08:00, baseline and event alerts are not triggered during this period. If the event is still in an abnormal state at 08:00, an alert is sent.

      Event alert

      Parameter

      Description

      Event Type

      Defines the event types that trigger an alert. They include:

      • Error: A task within the baseline monitoring scope fails to run.

      • Slow: The current runtime of a task within the baseline monitoring scope is significantly longer than its average runtime over a past period.

      Alert Notification Method

      • Supports sending alerts by Email, SMS, or Phone call to the task owner, the on-duty engineer in the shift schedule, or specified recipients. To configure a shift schedule, see Shift schedule.

      • Supports sending alerts to other applications such as DingTalk, WeCom, and Lark by using a DingTalk Chatbot or a Webhook. To configure a DingTalk chatbot, see Scenario: Send alert notifications to a DingTalk group.

      Note

      • You can use Check Contact Information or Send Test Message to verify that alerts can be sent correctly.

      • Phone call alerts are available only in DataWorks Professional Edition and later.

      • If you select phone call alerts, DataWorks rate-limits calls to avoid a burst of calls in a short period. A user receives at most one alert call every 20 minutes. Additional alerts are downgraded to text messages.

      Maximum Alerts

      The maximum number of alerts that can be sent. After this limit is reached, no more alerts are generated.

      Minimum Alert Interval

      The minimum time interval between two consecutive alerts.

      Alerting Do-Not-Disturb Period

      If you set a do-not-disturb period, the system does not send alerts during this time.

      For example, if the do-not-disturb period for a task is set from 00:00 to 08:00, baseline and event alerts are not triggered during this period. If the event is still in an abnormal state at 08:00, an alert is sent.

    4. Click OK to create the baseline.
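To make the interaction between Maximum Alerts, Minimum Alert Interval, and Alerting Do-Not-Disturb Period more concrete, the following Python sketch models one plausible reading of those settings. It is not the DataWorks implementation: the class `AlertThrottle`, its method names, and the choice to treat the end of the do-not-disturb period as exclusive are assumptions made for illustration.

```python
from datetime import datetime, time, timedelta
from typing import Optional


class AlertThrottle:
    """Hypothetical helper that decides whether an alert may be sent now."""

    def __init__(self, max_alerts: int, min_interval: timedelta,
                 dnd_start: Optional[time] = None,
                 dnd_end: Optional[time] = None) -> None:
        self.max_alerts = max_alerts
        self.min_interval = min_interval
        self.dnd_start = dnd_start
        self.dnd_end = dnd_end
        self.sent = 0
        self.last_sent: Optional[datetime] = None

    def in_dnd(self, now: datetime) -> bool:
        """Return True if `now` falls inside the do-not-disturb period."""
        if self.dnd_start is None or self.dnd_end is None:
            return False
        current = now.time()
        if self.dnd_start <= self.dnd_end:
            return self.dnd_start <= current < self.dnd_end
        # The period wraps past midnight, for example 22:00 to 06:00.
        return current >= self.dnd_start or current < self.dnd_end

    def should_send(self, now: datetime, still_abnormal: bool) -> bool:
        """Apply the do-not-disturb, maximum, and interval checks in turn."""
        if not still_abnormal or self.in_dnd(now):
            return False
        if self.sent >= self.max_alerts:
            return False
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return False
        self.sent += 1
        self.last_sent = now
        return True


if __name__ == "__main__":
    throttle = AlertThrottle(max_alerts=2, min_interval=timedelta(minutes=30),
                             dnd_start=time(0, 0), dnd_end=time(8, 0))
    day = datetime(2024, 1, 1)
    print(throttle.should_send(day.replace(hour=3), True))              # False
    print(throttle.should_send(day.replace(hour=8), True))              # True
    print(throttle.should_send(day.replace(hour=8, minute=10), True))   # False
    print(throttle.should_send(day.replace(hour=8, minute=40), True))   # True
    print(throttle.should_send(day.replace(hour=9, minute=30), True))   # False
```

In this run, the 03:00 alert is suppressed by the do-not-disturb period, the 08:00 alert goes out because the problem persists when the period ends, the 08:10 alert is blocked by the 30-minute interval, and the 09:30 alert is blocked because the maximum of two alerts has been reached.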

Add nodes to a baseline

A node can belong to only one baseline at a time. Adding a node that is already in Baseline A to Baseline B will move it to Baseline B.

Note

If an enabled baseline has no nodes, it becomes an empty baseline and generates empty baseline instances. For more information about empty baselines, see Why is the status of my baseline displayed as Empty Baseline on the Baseline Instances page?
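The "one baseline per node" rule and the empty-baseline case can be pictured with a small in-memory model. The sketch below is purely illustrative: `BaselineRegistry` and its methods are hypothetical names, and it does not call any DataWorks API.

```python
from typing import Dict, Optional, Set


class BaselineRegistry:
    """Hypothetical in-memory model of node-to-baseline membership."""

    def __init__(self) -> None:
        self.nodes_by_baseline: Dict[str, Set[str]] = {}
        self.baseline_of_node: Dict[str, str] = {}

    def create_baseline(self, baseline: str) -> None:
        """Register a baseline if it does not exist yet."""
        self.nodes_by_baseline.setdefault(baseline, set())

    def add_node(self, baseline: str, node: str) -> Optional[str]:
        """Add a node to a baseline.

        Returns the name of the baseline the node was moved from,
        or None if the node was not in a different baseline.
        """
        self.create_baseline(baseline)
        previous = self.baseline_of_node.get(node)
        if previous == baseline:
            return None
        if previous is not None:
            # A node belongs to one baseline only, so leave the old one.
            self.nodes_by_baseline[previous].discard(node)
        self.nodes_by_baseline[baseline].add(node)
        self.baseline_of_node[node] = baseline
        return previous

    def is_empty(self, baseline: str) -> bool:
        """Return True if the baseline currently monitors no nodes."""
        return len(self.nodes_by_baseline[baseline]) == 0


if __name__ == "__main__":
    registry = BaselineRegistry()
    registry.add_node("Baseline A", "node_1")
    moved_from = registry.add_node("Baseline B", "node_1")
    print(moved_from)                       # Baseline A
    print(registry.is_empty("Baseline A"))  # True
```

Moving the only node out of Baseline A leaves it empty, which corresponds to the empty-baseline situation described in the note above.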

You can add nodes to a baseline in one of the following two ways:

  • Go to the Baselines page and click Create Baseline to add nodes.

  • Go to the Auto Triggered Node page and choose More > Add Baseline for a specific task.

    Note

    This method only allows you to create a new baseline for the selected tasks. You cannot use it to add tasks to an existing baseline.

    • Add a single node to a baseline

      In the Actions column of the target auto triggered task, click More > Add Baseline.

    • Add multiple nodes to a baseline

      Select multiple auto triggered tasks and, in the menu bar at the bottom, click Actions > Add Baseline.

Manage baselines

On the Baselines page, you can filter baselines by criteria such as Owner, Workspace, Baseline Name, and Priority, and perform the following operations:

  • View Details: View the basic information of the baseline tasks.

  • Modify Baseline: Modify the baseline information as needed.

  • View Change Records: View the historical changes of the baseline.

  • Enable or Disable Baseline: Controls whether the baseline is active. An enabled baseline generates a new baseline instance daily. You can view the daily baseline details on the Baseline Instances panel.

  • Delete Baseline: Delete the baseline as needed.

Appendix: baseline alerting mechanism

Baseline alerting is a notification service for baselines that are enabled and have alerting turned on. You can configure the Alert Margin Threshold and Committed Completion Time for a baseline based on its Estimated Completion Time. DataWorks calculates a baseline's estimated finish time based on the historical average runtime (typically the last 10 days) of the tasks it monitors. The system then monitors the tasks based on their actual running status. If the system predicts that a task on the baseline cannot be completed by the alert time (committed finish time - alert margin threshold), it sends a baseline alert to the configured recipients.

Note

Improperly configured alert margin thresholds and committed finish times can lead to unexpected alerts. For more information, see Configure an appropriate committed finish time and alert margin threshold.
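The alert time formula (committed finish time - alert margin threshold) can be worked through with a short Python sketch. It assumes times are written as HH:MM within the 00:00 to 47:59 window described for Committed Completion Time. The helper names are hypothetical and are not part of any DataWorks API.

```python
def parse_hhmm(value: str) -> int:
    """Convert 'HH:MM' (00:00 to 47:59) to minutes after 00:00."""
    hours_text, minutes_text = value.split(":")
    hours, minutes = int(hours_text), int(minutes_text)
    if not (0 <= hours <= 47 and 0 <= minutes <= 59):
        raise ValueError(f"time outside the 00:00 to 47:59 window: {value}")
    return hours * 60 + minutes


def format_hhmm(total_minutes: int) -> str:
    """Convert minutes after 00:00 back to 'HH:MM'."""
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


def alert_time(committed: str, margin_minutes: int) -> str:
    """alert time = committed finish time - alert margin threshold."""
    minutes = parse_hhmm(committed) - margin_minutes
    if minutes < 0:
        raise ValueError("the margin is larger than the committed finish time")
    return format_hhmm(minutes)


def margin_is_reasonable(committed: str, margin_minutes: int,
                         estimated: str) -> bool:
    """Check that the alert time is later than the estimated finish time."""
    return parse_hhmm(alert_time(committed, margin_minutes)) > parse_hhmm(estimated)


if __name__ == "__main__":
    print(alert_time("3:30", 10))                     # 03:20
    print(alert_time("36:00", 30))                    # 35:30
    print(margin_is_reasonable("3:30", 10, "03:00"))  # True
    print(margin_is_reasonable("3:30", 10, "03:25"))  # False
```

The last call returns False because an estimated finish time of 03:25 is already later than the 03:20 alert time, which is the kind of configuration that leads to unexpected alerts.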

  • Baseline alert policy before a task runs:

    Note

    Before a daily task runs, the baseline system calculates the average completion time of tasks within its monitoring scope over the past 10 days. If it predicts that a task will not complete by the alert time, it immediately sends a baseline alert to the configured recipients. In scenarios where task dependencies are complex and frequently change, baselines can help you detect issues and receive early warnings.

    • If the estimated finish time of a baseline task, calculated from its average completion time over the past 10 days, is later than the baseline alert time, the platform triggers a baseline warning. You can view the calculated estimated finish time on the Baselines page. For more information, see Create a baseline.

    • If the estimated finish time of an upstream task, calculated from its average completion time over the past 10 days, is later than the baseline alert time, the platform triggers a baseline warning.

  • Baseline alert policy while a task is running:

    A baseline warning is triggered if the actual completion time of a task on the baseline is later than the baseline alert time.
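The two policies above can be approximated with a simplified Python sketch. It assumes the estimate is a plain arithmetic mean of recent completion times expressed as minutes after 00:00. The actual DataWorks prediction logic is not described in this topic and may differ, and all function names here are hypothetical.

```python
from statistics import mean
from typing import Sequence


def estimated_finish_minutes(history: Sequence[int], window: int = 10) -> float:
    """Average completion time over the most recent runs.

    `history` holds completion times in minutes after 00:00, oldest first.
    The default window of 10 runs mirrors the "last 10 days" in the text.
    """
    recent = history[-window:]
    if not recent:
        raise ValueError("completion time cannot be estimated without history")
    return float(mean(recent))


def pre_run_alert(history: Sequence[int], committed_minutes: int,
                  margin_minutes: int) -> bool:
    """Before the run: alert if the estimate is later than the alert time."""
    alert_at = committed_minutes - margin_minutes
    return estimated_finish_minutes(history) > alert_at


def running_alert(finish_minutes: float, committed_minutes: int,
                  margin_minutes: int) -> bool:
    """During the run: alert if the finish time is later than the alert time."""
    return finish_minutes > committed_minutes - margin_minutes


if __name__ == "__main__":
    # Committed finish time 03:30 (210 minutes), margin 10 minutes.
    history = [200, 210, 205, 215, 220]
    print(estimated_finish_minutes(history))  # 210.0
    print(pre_run_alert(history, 210, 10))    # True
    print(running_alert(195, 210, 10))        # False
```

Here the average completion time is 03:30, which is later than the 03:20 alert time, so the pre-run check reports an alert even before the task starts.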

Next steps

After you create a baseline, you can perform the following operations:

  • View baseline instances: An enabled baseline generates an instance every day. You can view baseline run details on the Baseline Instances page.

  • EMR node: Configure mappings between baseline priorities and YARN queue priorities to adjust the final YARN queue priority of an EMR node. This determines whether the node is prioritized for scheduling and execution.

  • View baseline operation records: View a history of all operations performed on your baselines in Operation Center.