Alert management terms


This glossary defines the concepts used in Enterprise Distributed Application Service (EDAS) alert management. Use it as a reference when you configure alerts, notification policies, and escalation workflows.

Core concepts

Alert

An event that requires resolution by a contact. When a notification policy determines that an event must be resolved, the system generates an alert and sends it to alert contacts. Once the alert is resolved, the system automatically sets its status to Resolved.

Anomaly

Monitoring data that is abnormal according to business rules. When a monitoring service detects abnormal data, the system generates a corresponding event.

Alert management

The EDAS feature for managing the full lifecycle of alerts, from detection and notification through escalation and resolution.

Events

Event grouping

Groups multiple events into a single alert to reduce notification noise. Instead of receiving a separate notification for each event, contacts receive one consolidated alert that summarizes the situation.
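The following minimal sketch illustrates the idea of grouping. It assumes that events are grouped by a shared field; the event fields (`app`, `message`) and the `group_events` helper are hypothetical and do not represent the EDAS data model or API.

```python
from collections import defaultdict

def group_events(events, key="app"):
    """Group events that share the same value for `key` into one alert each."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event[key]].append(event)
    # Build one consolidated alert per group instead of one notification per event.
    return [
        {
            "group": value,
            "event_count": len(items),
            "summary": "; ".join(e["message"] for e in items),
        }
        for value, items in grouped.items()
    ]

# Hypothetical events: two from one application, one from another.
events = [
    {"app": "order-service", "message": "high latency"},
    {"app": "order-service", "message": "error rate above threshold"},
    {"app": "cart-service", "message": "high CPU"},
]
alerts = group_events(events)
```

Here three events produce two alerts, so contacts receive two notifications instead of three.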

Event silencing

Suppresses notifications for events that do not require immediate attention. No notifications are sent for silenced events.

Automatic event recovery

Marks events as resolved automatically after a specified recovery period. The default recovery period is 5 minutes. Events that are still unresolved when the period ends are counted as resolved.
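As a rough sketch of the recovery check, assuming the period is measured from the time the event was last triggered and that the boundary is inclusive (neither detail is stated here, and `is_auto_recovered` is a hypothetical helper):

```python
from datetime import datetime, timedelta

DEFAULT_RECOVERY_PERIOD = timedelta(minutes=5)  # default stated in this glossary

def is_auto_recovered(last_triggered, now, period=DEFAULT_RECOVERY_PERIOD):
    """Return True once the recovery period has elapsed since the last trigger.

    Assumption: the period starts at `last_triggered` and the boundary is inclusive.
    """
    return now - last_triggered >= period

t0 = datetime(2024, 1, 1, 12, 0)
still_open = is_auto_recovered(t0, t0 + timedelta(minutes=4))
recovered = is_auto_recovered(t0, t0 + timedelta(minutes=5))
```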

Notifications

Alert notification

A message sent to alert contacts when an alert is triggered. Each notification describes the object that triggered the alert and includes a title.

Supported channels: text messages, emails, WeChat messages, alert cards in DingTalk groups, and webhooks.

Notification policy

Defines how events are grouped into alerts and how alert notifications are delivered to contacts. A notification policy controls:

  • Which events are grouped into a single alert

  • Which contacts receive notifications

  • Which notification channels are used (phone calls, text messages, emails, or alert cards in DingTalk groups)

Notifications are sent within 1 minute after a notification policy is triggered.

Alert card

An interactive notification format in DingTalk groups. The DingTalk group chatbot sends alert notifications as alert cards. Contacts can view, handle, and resolve alerts directly within DingTalk.

Configure the chatbot on the DingTalk Group tab of the Contact page in the EDAS console.

Escalation

Escalation policy

Defines a sequence of contacts who receive notifications when an alert remains unresolved. Each rule in the policy specifies a different batch of contacts. When an alert remains unresolved past the timeout period, the system sends notifications to the next batch of contacts in the sequence.

Add an escalation policy to a notification policy to enable multi-tier alert routing.

Escalation timeout

The waiting period before the system escalates an unresolved or unconfirmed alert. When the timeout is reached, the system sends notifications for the alert to the next batch of contacts specified in the escalation policy.

Parameter | Value
Default timeout | 10 minutes
Maximum timeout | 90 minutes

Re-triggering escalation rules

If an alert remains unresolved after all escalation rules in a policy have been triggered, the system restarts the escalation sequence. By default, escalation rules are re-triggered once. You can configure re-triggering up to nine times.
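The interaction between the timeout, the batches, and re-triggering can be sketched as a schedule. This is an illustration under stated assumptions, not the EDAS implementation: it assumes the same timeout applies before every step, including the restart of the sequence, and that a timeout must be greater than zero. The `escalation_schedule` helper and the batch names are hypothetical.

```python
DEFAULT_TIMEOUT_MIN = 10  # default escalation timeout
MAX_TIMEOUT_MIN = 90      # maximum escalation timeout
MAX_RETRIGGERS = 9        # re-triggering can be configured up to nine times

def escalation_schedule(batches, timeout_min=DEFAULT_TIMEOUT_MIN, retriggers=1):
    """Return (minutes_after_alert, batch) pairs for an alert that is never resolved."""
    # Assumption: the lower bounds (timeout > 0, retriggers >= 0) are not documented.
    if not 0 < timeout_min <= MAX_TIMEOUT_MIN:
        raise ValueError("timeout must be greater than 0 and at most 90 minutes")
    if not 0 <= retriggers <= MAX_RETRIGGERS:
        raise ValueError("retriggers must be between 0 and 9")
    schedule = []
    elapsed = 0
    # One initial pass through the rules plus the configured number of re-triggers.
    for _ in range(1 + retriggers):
        for batch in batches:
            elapsed += timeout_min  # wait one timeout before each escalation step
            schedule.append((elapsed, batch))
    return schedule

schedule = escalation_schedule(["on-call", "team-lead"])
```

With the defaults and two batches, the sketch notifies `on-call` at 10 minutes, `team-lead` at 20 minutes, and then repeats the sequence once at 30 and 40 minutes.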

Alert lifecycle

Alert status

When an alert is resolved, the system automatically changes its status to Resolved.

Claiming alerts

A contact claims an unresolved alert to take ownership of it.

Contacts and users

Contact

An O&M engineer responsible for handling alerts. Contacts can view, handle, and resolve alerts in DingTalk.

User

An Alibaba Cloud account or RAM user. In the EDAS console, users can:

  • Create and modify notification policies

  • Create and modify escalation policies

  • View, handle, and resolve alerts

Monitoring comparisons

EDAS alert rules support three types of period-over-period comparisons. In each type, beta is the metric value (average, sum, maximum, or minimum) over the most recent N minutes, and alpha is the same metric value over the reference period.

Comparison type | Reference period (alpha) | Description
Minute-on-minute | The preceding N minutes (data from 2N to N minutes ago) | Compares the current metric value with the value from the immediately preceding period. Returns the percentage increase or decrease.
Minute-on-minute hourly | The same N-minute window one hour ago | Compares the current metric value with the value from the same time window one hour earlier. Returns the percentage increase or decrease.
Minute-on-minute daily | The same N-minute window at the same time yesterday | Compares the current metric value with the value from the same time window one day earlier. Returns the percentage increase or decrease.
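The three comparison types differ only in how far back the reference window lies. The sketch below assumes one sample per minute, uses the average as the aggregate, and assumes the percentage is computed as (beta - alpha) / alpha; the exact formula is not stated here, and the `compare` helper is hypothetical.

```python
def percent_change(beta, alpha):
    # Assumed formula: change of beta relative to alpha, in percent.
    return (beta - alpha) * 100.0 / alpha

def compare(samples, n, offset):
    """Compare the average of the last n samples with a window `offset` minutes earlier.

    samples: one value per minute, oldest first.
    offset:  n for minute-on-minute, 60 for hourly, 1440 for daily.
    """
    end = len(samples)
    if end < n + offset:
        raise ValueError("not enough samples for the reference window")
    beta = sum(samples[end - n:end]) / n                    # most recent n minutes
    alpha = sum(samples[end - n - offset:end - offset]) / n  # reference window
    return percent_change(beta, alpha)

# Minute-on-minute: the previous 5 minutes averaged 100, the latest 5 average 120.
samples = [100] * 5 + [120] * 5
minute_on_minute = compare(samples, n=5, offset=5)

# Hourly: the same 5-minute window one hour ago averaged 50, the latest averages 100.
hourly_samples = [50] * 5 + [0] * 55 + [100] * 5
hourly = compare(hourly_samples, n=5, offset=60)
```

A positive result is an increase and a negative result is a decrease. When alpha is zero or missing, the result is undefined; the Alarm Data Revision parameter controls that case.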

Alarm Data Revision

The Alarm Data Revision parameter controls how the system handles missing or undefined metric values when evaluating alert rules. Set this parameter when you create an alert.

Valid values

Value | Behavior
Set 0 | Replaces the missing value with 0
Set 1 | Replaces the missing value with 1
Set Null (Won't Trigger) | Does not trigger an alert (default)
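The effect of the three values can be sketched as a substitution step that runs before the rule condition is checked. This is an illustration only: the `evaluate` helper, the use of `None` for a missing value, and the dictionary keys are assumptions, not the EDAS implementation.

```python
# Replacement used for each setting; None means "do not trigger".
REVISION_VALUES = {"Set 0": 0, "Set 1": 1, "Set Null": None}

def evaluate(value, condition, revision="Set Null"):
    """Return True if the rule should trigger an alert.

    value: the metric value, or None when it is missing or undefined.
    condition: a function that returns True when the value violates the rule.
    """
    if value is None:
        value = REVISION_VALUES[revision]  # substitute the missing value
    if value is None:
        return False  # Set Null (Won't Trigger)
    return condition(value)

def no_greater_than_10(v):
    return v <= 10

default_result = evaluate(None, no_greater_than_10)            # missing, default setting
set0_result = evaluate(None, no_greater_than_10, "Set 0")      # missing, treated as 0
reported_result = evaluate(25, no_greater_than_10, "Set 0")    # data present, no revision
```

The substitution applies only when the value is missing; a reported value is always evaluated as is.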

When to use each value

No data reported

Problem: A metric has no data (for example, a page receives no visits), so the alert rule cannot be evaluated.

Example: You create a Browser Monitoring Alarm to monitor page views. The alert rule checks whether the sum of page views in the last 5 minutes is no greater than 10. If the page receives no visits, no data is reported and the alert rule is never evaluated.

Solution: Set Alarm Data Revision to Set 0. The system treats the missing data as zero, which satisfies the alert condition (0 is not greater than 10), and sends an alert.

Undefined compound indicators

Problem: A compound indicator (a formula involving division) produces an undefined result because the divisor is zero.

Example: You create a Custom Monitoring Alarm to monitor the real-time unit price of a commodity. Variable a is the current total price, and variable b is the current total number of items. The alert rule checks whether the minimum value of a/b over the last 3 minutes is less than or equal to 10. If no items are sold (b = 0), the division result is undefined and no alert is sent.

Solution: Set Alarm Data Revision to Set 0. The system treats the undefined result as 0, which satisfies the alert condition (0 <= 10), and sends an alert.
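The unit-price scenario can be worked through as follows. The sample figures are invented for illustration, and the sketch assumes the revision is applied to each undefined per-minute result before the minimum is taken; the helper names are hypothetical.

```python
def unit_price(total_price, item_count):
    """Return a/b, or None when the divisor is zero (undefined result)."""
    if item_count == 0:
        return None
    return total_price / item_count

def min_with_revision(values, replacement):
    """Replace undefined values, then take the minimum."""
    revised = [replacement if v is None else v for v in values]
    return min(revised)

# Hypothetical (a, b) samples for the last 3 minutes; nothing is sold in the last one.
samples = [(120.0, 10), (90.0, 6), (0.0, 0)]
prices = [unit_price(a, b) for a, b in samples]

# With Set 0, the undefined result becomes 0, so the minimum is 0 and the rule triggers.
triggered = min_with_revision(prices, 0) <= 10
```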

Undefined period-over-period comparisons

Problem: A period-over-period comparison cannot be calculated because the historical reference value (alpha) is missing.

Example: You create an Application Monitoring Alarm to monitor a node's CPU utilization. The alert rule checks whether the average CPU utilization over the last 3 minutes decreased by 100% compared with the previous monitoring period. If the CPU failed during the comparison window, the system cannot retrieve alpha, so the comparison result is undefined and no alert is sent.

Solution: Set Alarm Data Revision to Set 1. The system treats the undefined result as a 100% decrease, which satisfies the alert condition, and sends an alert.
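A minimal sketch of this case, assuming the comparison result is expressed as a ratio in which 1 corresponds to a 100% decrease, and that a zero reference value is also treated as undefined. The `decrease_ratio` and `rule_triggers` helpers are hypothetical.

```python
def decrease_ratio(beta, alpha, revision=None):
    """Fractional decrease of beta relative to alpha (1.0 means a 100% decrease).

    Returns `revision` when alpha is missing. Assumption: alpha == 0 is also
    treated as undefined because the division cannot be performed.
    """
    if alpha is None or alpha == 0:
        return revision
    return (alpha - beta) / alpha

def rule_triggers(ratio):
    # The rule fires on a 100% decrease; an undefined result never triggers.
    return ratio is not None and ratio >= 1.0

without_revision = rule_triggers(decrease_ratio(0.0, None))           # Set Null
with_set_1 = rule_triggers(decrease_ratio(0.0, None, revision=1))     # Set 1
normal_drop = rule_triggers(decrease_ratio(30.0, 60.0, revision=1))   # 50% decrease
```

Only the missing-alpha case is affected by the setting: a 50% decrease computed from real data still does not satisfy the rule.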