This glossary defines the concepts used in Enterprise Distributed Application Service (EDAS) alert management. Use it as a reference when you configure alerts, notification policies, and escalation workflows.
Core concepts
Alert
An event that requires resolution by a contact. When a notification policy determines that an event must be resolved, the system generates an alert and sends it to alert contacts. Once the alert is resolved, the system automatically sets its status to Resolved.
Anomaly
Monitoring data that is abnormal according to business rules. When monitoring services detect abnormal data, the system generates the corresponding events.
Alert management
The EDAS feature for managing the full lifecycle of alerts, from detection and notification through escalation and resolution.
Events
Event grouping
Groups multiple events into a single alert to reduce notification noise. Instead of receiving a separate notification for each event, contacts receive one consolidated alert that summarizes the situation.
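The following Python sketch illustrates the idea of event grouping. It assumes, hypothetically, that events sharing the same name and application belong to one alert; the `Event` class, its fields, and `group_events` are illustrative names, not part of any EDAS API. The actual grouping criteria come from your notification policy.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event fields for illustration; not the EDAS event schema.
    name: str
    app: str
    message: str


def group_events(events, key_fields=("name", "app")):
    """Group events that share the same key_fields values into one alert summary."""
    groups = defaultdict(list)
    for event in events:
        key = tuple(getattr(event, f) for f in key_fields)
        groups[key].append(event)
    # One consolidated summary per group instead of one notification per event.
    return {
        key: f"{len(evts)} event(s): " + "; ".join(e.message for e in evts)
        for key, evts in groups.items()
    }


events = [
    Event("HighCPU", "order-svc", "node-1 at 95%"),
    Event("HighCPU", "order-svc", "node-2 at 97%"),
    Event("HighRT", "cart-svc", "p99 latency 2 s"),
]
for key, summary in group_events(events).items():
    print(key, summary)
```

Three events produce two alerts here: the two CPU events for `order-svc` collapse into a single summary.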
Event silencing
Suppresses notifications for events that do not require immediate attention. No notifications are sent for silenced events.
Automatic event recovery
An event that remains unresolved for the recovery period is automatically marked as resolved. The default recovery period is 5 minutes.
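A minimal sketch of the recovery check, assuming the period is measured from the time the event was last reported; this glossary does not state which timestamp starts the clock. `is_auto_recovered` is a hypothetical helper.

```python
from datetime import datetime, timedelta

DEFAULT_RECOVERY_PERIOD = timedelta(minutes=5)  # default stated in this glossary


def is_auto_recovered(last_reported: datetime, now: datetime,
                      period: timedelta = DEFAULT_RECOVERY_PERIOD) -> bool:
    """Return True once an unresolved event should be counted as resolved.

    Assumption: the recovery period starts when the event was last reported.
    """
    return now - last_reported >= period


t0 = datetime(2024, 1, 1, 12, 0)
print(is_auto_recovered(t0, t0 + timedelta(minutes=4)))  # False
print(is_auto_recovered(t0, t0 + timedelta(minutes=5)))  # True
```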
Notifications
Alert notification
A message sent to alert contacts when an alert is triggered. Each notification describes the object that triggered the alert and includes a title.
Supported channels: text messages, emails, WeChat messages, alert cards in DingTalk groups, and webhooks.
Notification policy
Defines how events are grouped into alerts and how alert notifications are delivered to contacts. A notification policy controls:
Which events are grouped into a single alert
Which contacts receive notifications
Which notification channels are used (phone calls, text messages, emails, or alert cards in DingTalk groups)
Notifications are sent within 1 minute after a notification policy is triggered.
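To make the three controls concrete, here is a hypothetical model of a notification policy in Python. `NotificationPolicy`, its fields, and matching events by application name are assumptions for illustration only; EDAS policies are configured in the console, not through this class. The only documented value used is the 1-minute delivery window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

NOTIFY_WITHIN = timedelta(minutes=1)  # delivery window stated in this glossary


@dataclass
class NotificationPolicy:
    # Hypothetical model of what a policy controls; not an EDAS API.
    match_app: str        # which events this policy groups into an alert
    contacts: list[str]   # which contacts receive notifications
    channels: list[str]   # which channels are used

    def plan(self, event_app: str, triggered_at: datetime):
        """Return (contact, channel, deadline) tuples; [] if the event does not match."""
        if event_app != self.match_app:
            return []
        deadline = triggered_at + NOTIFY_WITHIN
        return [(c, ch, deadline) for c in self.contacts for ch in self.channels]


policy = NotificationPolicy("order-svc", ["alice", "bob"], ["email", "DingTalk alert card"])
for contact, channel, deadline in policy.plan("order-svc", datetime(2024, 1, 1, 12, 0)):
    print(contact, channel, deadline)
```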
Alert card
An interactive notification format in DingTalk groups. The DingTalk group chatbot sends alert notifications as alert cards. Contacts can view, handle, and resolve alerts directly within DingTalk.
Configure the chatbot on the DingTalk Group tab of the Contact page in the EDAS console.
Escalation
Escalation policy
Defines a sequence of contacts who receive notifications when an alert remains unresolved. Each rule in the policy specifies a different batch of contacts. When an alert remains unresolved past the timeout period, the system sends notifications to the next batch of contacts in the sequence.
Add an escalation policy to a notification policy to enable multi-tier alert routing.
Escalation timeout
The waiting period before the system escalates an unresolved or unconfirmed alert to the next batch of contacts in the escalation policy. When the timeout is reached, the system sends notifications to that batch of contacts.
| Parameter | Value |
|---|---|
| Default timeout | 10 minutes |
| Maximum timeout | 90 minutes |
Re-triggering escalation rules
If an alert remains unresolved after all escalation rules in a policy have been triggered, the system restarts the escalation sequence. By default, escalation rules are re-triggered once. You can configure re-triggering up to nine times.
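The interaction between the escalation timeout and re-triggering can be sketched as a notification schedule. This hypothetical `escalation_schedule` function assumes every rule uses the same timeout and that the first batch is notified one timeout after the alert is raised. It enforces the documented 90-minute maximum timeout and the limit of nine re-triggers; treating zero re-triggers as valid is also an assumption.

```python
MAX_TIMEOUT_MIN = 90   # documented maximum escalation timeout
MAX_RETRIGGERS = 9     # documented maximum number of re-triggers


def escalation_schedule(batches, timeout_min=10, retriggers=1):
    """Return (minutes_after_alert, batch) pairs for an alert that is never resolved.

    Hypothetical model: every rule uses the same timeout, and the first batch is
    notified one timeout after the alert is raised. Allowing 0 re-triggers is an
    assumption.
    """
    if not 0 < timeout_min <= MAX_TIMEOUT_MIN:
        raise ValueError("timeout must be positive and at most 90 minutes")
    if not 0 <= retriggers <= MAX_RETRIGGERS:
        raise ValueError("re-triggers must be between 0 and 9")
    schedule, elapsed = [], 0
    for _ in range(1 + retriggers):  # the initial pass plus each re-trigger
        for batch in batches:
            elapsed += timeout_min
            schedule.append((elapsed, batch))
    return schedule


print(escalation_schedule(["on-call", "team lead"]))
# [(10, 'on-call'), (20, 'team lead'), (30, 'on-call'), (40, 'team lead')]
```

With the defaults (10-minute timeout, one re-trigger), a two-rule policy runs twice, so each batch is notified twice before escalation stops.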
Alert lifecycle
Alert status
When an alert is resolved, the system automatically changes its status to Resolved.
Claiming alerts
A contact claims an unresolved alert to take ownership of it.
Contacts and users
Contact
An O&M engineer responsible for handling alerts. Contacts can view, handle, and resolve alerts in DingTalk.
User
An Alibaba Cloud account or RAM user. In the EDAS console, users can:
Create and modify notification policies
Create and modify escalation policies
View, handle, and resolve alerts
Monitoring comparisons
EDAS alert rules support three types of period-over-period comparisons. In each type, beta is the metric value (average, sum, maximum, or minimum) over the most recent N minutes, and alpha is the same metric over the reference period listed in the following table.
| Comparison type | Reference period (alpha) | Description |
|---|---|---|
| Minute-on-minute | The preceding N minutes (data from 2N to N minutes ago) | Compares the current metric value with the value from the immediately preceding period. Returns the percentage increase or decrease. |
| Minute-on-minute hourly | The same N-minute window one hour ago | Compares the current metric value with the value from the same time window one hour earlier. Returns the percentage increase or decrease. |
| Minute-on-minute daily | The same N-minute window at the same time yesterday | Compares the current metric value with the value from the same time window one day earlier. Returns the percentage increase or decrease. |
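The calculation behind the table can be sketched in Python. This is a minimal illustration, assuming the percentage change is computed as (beta - alpha) / alpha * 100 and that metric data is a list of per-minute values with the most recent minute last; `percent_change` and its parameters are hypothetical. It returns `None` when alpha cannot be computed, which corresponds to the undefined comparisons described under Alarm Data Revision.

```python
from statistics import mean

AGGREGATES = {"avg": mean, "sum": sum, "max": max, "min": min}


def percent_change(series, n, kind="minute", agg="avg"):
    """Percentage change of beta (last n minutes) versus alpha (reference window).

    series: per-minute values, oldest first; the last element is the latest minute.
    kind: "minute" (preceding n minutes), "hourly" (1 hour earlier),
          or "daily" (1 day earlier).
    Returns None when alpha cannot be computed or is zero (undefined result).
    """
    f = AGGREGATES[agg]
    offset = {"minute": n, "hourly": 60, "daily": 24 * 60}[kind]
    end = len(series) - offset  # alpha window ends `offset` minutes before beta's
    start = end - n
    if start < 0:
        return None             # not enough history to compute alpha
    alpha = f(series[start:end])
    if alpha == 0:
        return None
    beta = f(series[-n:])
    return (beta - alpha) / alpha * 100


# Minute-on-minute, N = 3: the average rose from 10 to 20.
print(percent_change([10, 10, 10, 20, 20, 20], n=3))  # 100.0
```

For minute-on-minute with N = 3, alpha covers the values from 6 to 3 minutes ago, matching the "2N to N minutes ago" window in the table.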
Alarm Data Revision
The Alarm Data Revision parameter controls how the system handles missing or undefined metric values when evaluating alert rules. Set this parameter when you create an alert.
Valid values
| Value | Behavior |
|---|---|
| Set 0 | Replaces the missing value with 0 |
| Set 1 | Replaces the missing value with 1 |
| Set Null (Won't Trigger) | Does not trigger an alert (default) |
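A minimal sketch of how the three values could affect rule evaluation. The `revise` and `evaluate` helpers and the string keys are hypothetical; the only behavior taken from the table is that a missing value becomes 0, becomes 1, or leaves the rule unevaluated.

```python
from typing import Callable, Optional

SET_NULL = "Set Null (Won't Trigger)"
# Hypothetical mapping of each Alarm Data Revision value to a substitute.
REVISIONS = {"Set 0": 0.0, "Set 1": 1.0, SET_NULL: None}


def revise(value: Optional[float], revision: str = SET_NULL) -> Optional[float]:
    """Substitute only missing or undefined values (None); real data is unchanged."""
    return value if value is not None else REVISIONS[revision]


def evaluate(value: Optional[float], condition: Callable[[float], bool],
             revision: str = SET_NULL) -> bool:
    """Return True if the alert fires; None after revision means no alert."""
    revised = revise(value, revision)
    return revised is not None and condition(revised)


print(evaluate(None, lambda v: v <= 10))           # False (default: Set Null)
print(evaluate(None, lambda v: v <= 10, "Set 0"))  # True
```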
When to use each value
No data reported
Problem: A metric has no data (for example, a page receives no visits), so the alert rule cannot be evaluated.
Example: You create a Browser Monitoring Alarm to monitor page views. The alert rule checks whether the sum of page views in the last 5 minutes is no greater than 10. If the page receives no visits, no data is reported and the alert rule is never evaluated.
Solution: Set Alarm Data Revision to Set 0. The system treats the missing data as zero, which satisfies the alert condition (0 is not greater than 10), and sends an alert.
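The page-view scenario can be expressed as a short Python sketch. `page_view_alert` is a hypothetical helper: an empty list stands for "no data reported", and the `revision` argument stands for the Alarm Data Revision setting (0 for Set 0, `None` for Set Null).

```python
from typing import Optional


def page_view_alert(views: list, revision: Optional[float] = None) -> bool:
    """Alert if the sum of page views over the last 5 minutes is no greater than 10.

    views: per-minute page-view counts; an empty list means no data was reported.
    revision: 0 models "Set 0"; None models "Set Null (Won't Trigger)".
    """
    if views:
        total = sum(views[-5:])
    else:
        total = revision  # no data: fall back to the Alarm Data Revision value
    if total is None:
        return False      # rule is not evaluated, so no alert is sent
    return total <= 10


print(page_view_alert([]))              # False: no data, Set Null
print(page_view_alert([], revision=0))  # True: 0 <= 10
```

Python's `sum([])` returns 0, so the sketch checks for missing data explicitly; otherwise it would silently behave as if Set 0 were always selected.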
Undefined compound indicators
Problem: A compound indicator (a formula involving division) produces an undefined result because the divisor is zero.
Example: You create a Custom Monitoring Alarm to monitor the real-time unit price of a commodity. Variable a is the current total price, and variable b is the current total number of items. The alert rule checks whether the minimum value of a/b over the last 3 minutes is less than or equal to 10. If no items are sold (b = 0), the division result is undefined and no alert is sent.
Solution: Set Alarm Data Revision to Set 0. The system treats the undefined result as 0, which satisfies the alert condition (0 <= 10), and sends an alert.
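A sketch of the unit-price rule, assuming one (a, b) sample per minute. `unit_price_alert` is hypothetical, and skipping undefined minutes under Set Null (rather than discarding the whole window) is an assumption made for illustration.

```python
from typing import Optional


def unit_price_alert(samples, revision: Optional[float] = None) -> bool:
    """Alert if the minimum of a/b over the last 3 minutes is less than or equal to 10.

    samples: (a, b) pairs, one per minute, oldest first
             (a = total price, b = number of items).
    revision: 0 models "Set 0"; None models "Set Null (Won't Trigger)".
    Assumption: under Set Null, minutes with an undefined a/b are skipped.
    """
    values = []
    for a, b in samples[-3:]:
        ratio = a / b if b != 0 else revision  # b == 0 makes a/b undefined
        if ratio is not None:
            values.append(ratio)
    if not values:
        return False  # nothing left to evaluate: no alert
    return min(values) <= 10


print(unit_price_alert([(0, 0)] * 3))              # False: undefined, Set Null
print(unit_price_alert([(0, 0)] * 3, revision=0))  # True: 0 <= 10
```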
Undefined period-over-period comparisons
Problem: A period-over-period comparison cannot be calculated because the historical reference value (alpha) is missing.
Example: You create an Application Monitoring Alarm to monitor a node's CPU utilization. The alert rule checks whether the average CPU utilization over the last 3 minutes decreased by 100% compared with the previous monitoring period. If the CPU failed during the comparison window, the system cannot retrieve alpha, so the comparison result is undefined and no alert is sent.
Solution: Set Alarm Data Revision to Set 1. The system treats the undefined result as a 100% decrease, which satisfies the alert condition, and sends an alert.
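A sketch of the CPU scenario, assuming the comparison is expressed as a fraction where 1.0 means a 100% decrease, so the substituted value 1 reads as a 100% decrease. `cpu_drop_alert` and its list-of-samples inputs are hypothetical.

```python
from statistics import mean
from typing import Optional


def cpu_drop_alert(current: list, previous: list,
                   revision: Optional[float] = None) -> bool:
    """Alert if average CPU utilization over the last 3 minutes fell by 100%
    compared with the previous 3-minute period.

    The decrease is a fraction: 1.0 means a 100% decrease.
    revision: 1 models "Set 1"; None models "Set Null (Won't Trigger)".
    """
    beta = mean(current[-3:]) if current else None
    alpha = mean(previous[-3:]) if previous else None
    if beta is None or alpha is None or alpha == 0:
        decrease = revision  # comparison is undefined
    else:
        decrease = (alpha - beta) / alpha
    if decrease is None:
        return False         # Set Null: no alert
    return decrease >= 1.0


print(cpu_drop_alert([0, 0, 0], previous=[]))              # False
print(cpu_drop_alert([0, 0, 0], previous=[], revision=1))  # True
```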