Simple Log Service (SLS) Alerting is an AIOps platform for alert monitoring, noise reduction, incident management, and notification dispatching.
Architecture
Alerting consists of three subsystems: alert monitoring, alert management, and action management.

Key features
| Category | Subcategory | Feature | Description |
| --- | --- | --- | --- |
| Alert monitoring | Basic capabilities | Query and analyze logs | Run queries and analyses using the Query syntax and SQL-92. |
| | | Query and analyze time-series data | Run analyses using SQL-92 and PromQL. For more information, see Syntax for querying and analyzing time-series data. |
| | | Machine learning | Use AIOps algorithms for prediction, anomaly detection, and root cause analysis. For more information, see Machine learning syntax. |
| | Correlated monitoring | Correlated monitoring across multiple Logstores or MetricStores | Use SQL JOIN statements or set operations for correlated monitoring across multiple Logstores or MetricStores. |
| | | Correlated monitoring between Logstores and MetricStores | Use SQL JOIN statements or set operations to implement correlated monitoring between Logstores and MetricStores. |
| | | Cross-Project correlated monitoring | Use set operations to implement cross-Project correlated monitoring. |
| | | Cross-region correlated monitoring | Use set operations to implement cross-region correlated monitoring. |
| | | Cross-account correlated monitoring | Use set operations to implement cross-account correlated monitoring. |
| | | Allowlist/denylist monitoring | Use Resource Data for allowlist/denylist monitoring. |
| | Monitoring rule orchestration | Configure no-data alerts | Configure alerts that fire when a query and analysis returns no data. |
| | | Set alert severity | Set static and dynamic alert severity levels. |
| | | Set labels and annotations | Add custom labels and annotations. Annotation values support variables. |
| | | Grouped evaluation | Group query and analysis results. |
| | | Alert recovery | Enable resolved alert notifications. |
| | | Set consecutive trigger threshold | Trigger an alert only after the condition is met a specified number of consecutive times, which suppresses transient alerts. |
| | | Disable monitoring tasks | Temporarily or permanently disable monitoring tasks. Paused tasks can resume automatically at a scheduled time. |
| Alert management | Noise reduction | Deduplicate identical alerts | Within a time window, you can deduplicate identical alerts or delay their notifications. For more information, see Deduplicate alerts based on fingerprints. |
| | | Alert grouping | Grouping policies combine alerts with the same attributes into a single notification. For more information, see Multiple alert grouping methods. |
| | | Alert silence | Create silence policies to prevent matching alerts from triggering notifications during a specified period. |
| Action management | Action policy | Dynamic dispatching of notification channels | Dynamically dispatch alert notifications to users, user groups, or on-duty groups through specific channels. For more information, see Action policy. |
| | Recipients | Users | Individual users. For more information, see Create users and user groups. |
| | | User groups | A group that contains multiple users. For more information, see Create users and user groups. |
| | | On-duty groups | Create on-duty groups that include users and user groups. Arrange rotating on-call shifts based on periods and working hours. For more information, see Create an on-duty group. |
| | Channel calendars | Holiday awareness | Automatically adjusts notification methods during holidays. |
| | On-call schedule | Rotation | Automate on-call rotations for users and user groups on a specified cycle. |
| | | Override | Configure temporary shift overrides for a specific period. |
| | | Holiday awareness | Automatically adjust rotation or override schedules based on holidays. |
| | | Independent calendars | Configure separate, resettable calendars for on-duty groups. |
| | Notification channels | SMS notifications | Sends alert content through SMS messages. |
| | | Voice notifications | Sends alert content through voice calls. |
| | | Email notifications | Sends alert content through email. |
| | | DingTalk notifications | Sends alert content through a DingTalk chatbot. |
| | | Webhook notifications | Sends alert notifications to a custom webhook address by using HTTP or HTTPS calls. Use webhooks to extend notification channels to platforms such as WeCom, Lark, and Slack. |
| | | Message Center | Sends alert content through the Alibaba Cloud Message Center. |
| Alert analysis | Global Alert Center | Execution history report for alert monitoring rules | Provides execution history reports for alert monitoring rules to help with troubleshooting. |
| | | Alert Rule Center | Provides a dashboard to view the overall execution status and triggered alert status of alert rules. |
| | | Alert Trace Center | Provides a dashboard that shows the entire alert lifecycle, from generation and management to final notification. |
| | | Alert Troubleshooting Center | Provides a troubleshooting center that displays errors from various stages, including alert monitoring, management, and notification, to simplify debugging. |
| | | Centralized storage | Centralized alert storage allows you to easily view received, processed, and sent alerts and their related logs. After you initialize the alerting feature, SLS automatically creates a Project named sls-alert-<ACCOUNT_ID>-<REGION> and a Logstore named internal-alert-center-log in the selected region to store alerts. |
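
The naming pattern for centralized alert storage can be illustrated with a short Python sketch. The function name `alert_storage_location` and the sample account ID and region are hypothetical placeholders; only the `sls-alert-<ACCOUNT_ID>-<REGION>` pattern and the Logstore name come from the description above.

```python
from typing import Tuple


def alert_storage_location(account_id: str, region: str) -> Tuple[str, str]:
    """Return the (Project, Logstore) names used for centralized alert storage."""
    # Pattern taken from the feature description: sls-alert-<ACCOUNT_ID>-<REGION>
    project = f"sls-alert-{account_id}-{region}"
    logstore = "internal-alert-center-log"
    return project, logstore


# Placeholder account ID and region, for illustration only.
project, logstore = alert_storage_location("1234567890123456", "cn-hangzhou")
print(project)   # sls-alert-1234567890123456-cn-hangzhou
print(logstore)  # internal-alert-center-log
```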
Benefits
- Easy to start and scale
  SLS provides end-to-end log and time-series data processing: ingest, store, query, analyze, visualize, and alert. After importing data, create monitoring tasks, notification channels, and alert policies within minutes.
  Scale your alerting configuration from small teams to enterprise-wide scenarios.
- High availability and reliability
  Built on SLS infrastructure, Alerting provides 99.9% service availability and over 99.99999999% data reliability for alert-related data.
- Low cost and maintenance-free
  Alert monitoring and incident management are currently free. Only SMS and voice call notifications incur a small fee.
  As a fully managed SaaS service, Alerting eliminates the operational overhead of running your own alerting system.
- Fast response to issues
  Intelligent monitoring and incident management accelerate alert response and issue resolution, reducing business disruption losses.
Use cases
DevOps
Monitor all stages of the development lifecycle. Track Kubernetes logs, application logs, and metrics across development, staging, and production environments. When errors or anomalies such as latency spikes are detected, the responsible developers are notified immediately.
Built-in rule templates in SLS applications such as Log Audit Service and SLB Log Center simplify monitoring setup.
ITOps
Monitor stability metrics such as response time, load, and error rates in real time. Alerting supports noise reduction, grouping, and dynamic dispatching based on custom dimensions. Alerts are automatically assigned to the on-call engineer based on schedules and calendars, with automated workflows for resolution notifications, status updates, and escalation.
AIOps
Combine SLS machine learning with Alerting to monitor log and time-series data. SLS provides over a dozen ML algorithms, including smoothing, prediction, decomposition, clustering, and pattern mining, that can be used directly in alert monitoring rules. For more information, see Machine learning functions. The ML service uses streaming statistics or graph algorithms to detect anomalies and route them to the alerting system.
SecOps
Continuously monitor audit and security data to identify compliance anomalies and threat events. Alerting automatically dispatches notifications based on event severity and source, and supports workflow automation such as security posture dashboards.
SLS Log Audit Service automates cross-account collection of compliance and security logs from major Alibaba Cloud products, with built-in threat intelligence integration and nearly 100 monitoring rule templates.
BizOps
Track business metrics such as user activity, ad click-through rates, and cloud product bills to detect anomalies like unusual charges. Log on to the Billing Management console to view details.
Key concepts
| Term | Description |
| --- | --- |
| Logstore | Logstores store log data and provide query and analysis capabilities (SQL-92). Alert monitoring depends on this feature. |
| MetricStore | MetricStores store time-series data and provide query and analysis capabilities (SQL-92 and PromQL). Alert monitoring depends on this feature. |
| alert | When used alone, "alert" refers to an alert event. For example, after an alert monitoring rule triggers one or more alerts, they are passed to the action management system through the alert management system. When combined with other words, "alert" refers to a subsystem, feature, entity, or module of the alerting feature, such as the alert monitoring system or an alert monitoring rule. |
| Alert monitoring | A subsystem responsible for generating alerts. The alert monitoring system consists of alert monitoring rules and Resource Data. It periodically evaluates data based on alert monitoring rules, assesses query and analysis results according to rule orchestration logic, and triggers alerts or resolved alerts, which are then sent to the alert management system. |
| Alert management | A subsystem responsible for noise reduction and managing alert statuses. The alert management system consists of alert policies, incident management, and alert overview dashboards. It processes received alerts by routing, deduplicating, silencing, and grouping them based on alert policies before sending them to the action management system. It also supports setting incident stages and assignees. |
| Action management | A subsystem responsible for managing alert notification channels and recipients. The action management system consists of action policies, content templates, calendars, users, user groups, on-duty groups, and channel quotas. It dynamically dispatches alerts to specific notification channels based on action policies, and those channels then notify the target users, user groups, or on-duty groups. It also supports customizing alert notification content. |
Alert monitoring
The alert monitoring system generates alerts and consists of alert monitoring rules and Resource Data.

| Term | Description |
| --- | --- |
| Alert monitoring rule | An alert monitoring rule contains query and analysis statements, target objects (Logstores, MetricStores, and Resource Data), and monitoring orchestration settings. For more information, see Create an alert monitoring rule. |
| Resource Data | An independent, editable, table-like storage structure for resource configurations and custom data used by the alerting system. Primarily used for correlated queries, such as allowlist/denylist scenarios. |
| Alert severity | A non-identifying attribute that indicates how serious an alert is. Levels: Critical, High, Medium, Low, and Report. For more information, see Set alert severity. |
| Grouped evaluation | A parameter in an alert monitoring rule. The system groups query and analysis results by specified fields and evaluates each group independently against the trigger condition. This lets a single rule monitor multiple targets, with each group managed as a separate alert and incident. For more information, see Configure grouped evaluation. |
| Evaluation expression | A computational expression that uses a specific evaluation syntax to configure alert trigger conditions or dynamically assess alert severity. Evaluation expressions support logical comparisons and calculations using fields from query and analysis results. A true result indicates a match. For more information, see Configure an evaluation expression. |
| Alert label | An identifying attribute of an alert in key-value format. Define custom labels in an alert monitoring rule. When an alert is triggered, the corresponding label is attached. Labels can be referenced in content templates and used as alert attributes for management and notification dispatching in alert management and action management. For more information, see Labels. |
| Alert annotation | A non-identifying attribute of an alert in key-value format. Define custom annotations in an alert monitoring rule. When an alert is triggered, the corresponding annotation is attached. Annotations can be referenced in content templates and used as alert attributes for management and notification dispatching. For more information, see Annotations. |
| Resolved alert | A special type of alert notification indicating that the alert condition is resolved. A normal alert has a "triggered" status, while a resolved alert has a "resolved" status. When you enable this feature, a resolved alert is sent if the previous check by the alert monitoring system triggered an alert but the current check's results do not meet the trigger condition. In high-frequency monitoring scenarios, enabling resolved alerts ensures you are promptly notified when an issue is resolved. For more information, see Configure resolved alerts. |
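
The interaction between grouped evaluation, the consecutive trigger threshold, and resolved alerts can be sketched in Python. This is an illustrative model only, not SLS's implementation: `RuleState`, `evaluate`, and the sample fields `host` and `error_rate` are hypothetical, and the sketch assumes each evaluation round returns one result row per group.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RuleState:
    """Per-group state kept between evaluation rounds (illustrative only)."""
    consecutive_hits: dict = field(default_factory=lambda: defaultdict(int))
    firing: set = field(default_factory=set)


def evaluate(rows, group_by, condition, threshold, state):
    """Evaluate one round and return a list of (group_key, status) events."""
    events = []
    for row in rows:
        # Grouped evaluation: each distinct key is checked independently.
        key = tuple(row[name] for name in group_by)
        if condition(row):
            state.consecutive_hits[key] += 1
            # Consecutive trigger threshold: fire only after enough hits in a row.
            if state.consecutive_hits[key] >= threshold:
                state.firing.add(key)
                events.append((key, "triggered"))
        else:
            state.consecutive_hits[key] = 0
            # Resolved alert: the previous check fired, the current one does not.
            if key in state.firing:
                state.firing.discard(key)
                events.append((key, "resolved"))
    return events


state = RuleState()
too_many_errors = lambda row: row["error_rate"] > 0.05
rounds = [
    [{"host": "web-1", "error_rate": 0.10}, {"host": "web-2", "error_rate": 0.01}],
    [{"host": "web-1", "error_rate": 0.12}, {"host": "web-2", "error_rate": 0.02}],
    [{"host": "web-1", "error_rate": 0.01}, {"host": "web-2", "error_rate": 0.02}],
]
for rows in rounds:
    print(evaluate(rows, ["host"], too_many_errors, threshold=2, state=state))
# []
# [(('web-1',), 'triggered')]
# [(('web-1',), 'resolved')]
```

With a threshold of 2, the first breach on `web-1` is suppressed, the second triggers an alert, and the following healthy round produces a resolved alert for that group only.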
Alert management
The alert management system handles noise reduction and status management. It consists of alert policies, incident management, and alert overview dashboards.

| Term | Description |
| --- | --- |
| Alert policy | A configuration entity in the alert management system and a parameter in an alert monitoring rule. When the alert management system receives an alert (including a resolved alert), it automatically performs noise reduction and grouping based on the alert policy. The resulting grouped alerts are then sent to the action management system for notification. |
| Alert fingerprint | The alert management system calculates a fingerprint for each alert it processes. Alerts with the same fingerprint are considered identical. The fingerprint is calculated based on the alert's identifying attributes, including the Alibaba Cloud account ID, the Project where the alert resides, the alert rule ID, and the alert labels. For more information, see Deduplicate alerts based on fingerprints. |
| Alert silence | A configuration item in an alert policy. Based on the silence policy, the system ignores matching alerts during the specified period, suppressing notifications. For more information, see Alert silence mechanism. |
| Alert grouping | A configuration item in an alert policy. After receiving alerts, the system groups matching alerts into a set according to the grouping policy. After delay and deduplication, the set is sent to the action management system. For more information, see Multiple alert grouping methods. |
| Grouped set | A collection that stores grouped alerts, containing one or more alerts with different fingerprints. After processes like delay and deduplication, the grouped set is sent to the action management system for notification. |
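
Fingerprint-based deduplication can be sketched as follows. The documentation above states which attributes feed the fingerprint but not how it is computed, so the SHA-256 hash over a JSON encoding is an assumption made for illustration, as are the function names `fingerprint` and `deduplicate` and the alert dictionary layout.

```python
import hashlib
import json


def fingerprint(account_id, project, rule_id, labels):
    """Hash an alert's identifying attributes into a stable fingerprint.

    Annotations and severity are non-identifying, so they are left out.
    The hashing scheme itself is an assumption, not SLS's actual algorithm.
    """
    identity = {
        "account_id": account_id,
        "project": project,
        "rule_id": rule_id,
        "labels": labels,
    }
    # sort_keys makes the result independent of label ordering.
    payload = json.dumps(identity, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def deduplicate(alerts, window_seconds):
    """Keep the first alert per fingerprint within each time window."""
    last_kept = {}
    kept = []
    for alert in sorted(alerts, key=lambda a: a["time"]):
        fp = fingerprint(
            alert["account_id"], alert["project"], alert["rule_id"], alert["labels"]
        )
        previous = last_kept.get(fp)
        if previous is None or alert["time"] - previous >= window_seconds:
            last_kept[fp] = alert["time"]
            kept.append(alert)
    return kept


base = {"account_id": "1", "project": "demo", "rule_id": "rule-1",
        "labels": {"host": "web-1"}}
alerts = [dict(base, time=t) for t in (0, 30, 120)]
print([a["time"] for a in deduplicate(alerts, window_seconds=60)])  # [0, 120]
```

The alert at 30 seconds shares a fingerprint with the one at 0 seconds and falls inside the 60-second window, so only the first and last alerts are passed on.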
Action management
The action management system manages notification channels and recipients. It consists of action policies, content templates, calendars, users, user groups, on-duty groups, and channel quotas.

| Term | Description |
| --- | --- |
| Action policy | An action policy is a configuration entity in the action management system. When the action management system receives a grouped set of alerts (including resolved alerts) from the alert management system, it uses the action policy to dynamically dispatch notifications to specific channels. These channels then notify the target users, user groups, or on-duty groups. |
| Webhook integration | Manage webhook notification channels. Use webhooks directly in action policies. SLS supports DingTalk, WeCom, Lark, Slack, and custom generic webhooks. For more information, see Create a webhook. |
| Content template | SLS sends alert content based on the defined content template. Each primary channel has a corresponding text template that supports referencing alert attributes with variables. For webhook channels, you can also configure the message entity format to adapt to specific protocols, such as the format required by WeCom. For more information, see Create a content template. |
| Calendar | An independent configuration in the action management system, including a global default calendar and custom calendars. |
| User | A configuration entity representing a specific recipient. It includes information such as user ID, username, phone number, and email. Use action policies to send alerts to target users. Set target users as assignees in incident management. |
| User group | A configuration entity representing a virtual collection of users. It includes a user group identifier, group name, and a list of users. A user group can contain one or more users. Use action policies to send alerts to target user groups. |
| On-duty group | A configuration entity representing a collection of on-duty users and user groups. It includes an on-duty group identifier, group name, rotation and override configurations, and an associated calendar. An on-duty group can contain one or more users or user groups. Use action policies to send alerts to target on-duty groups. For information about how to create an on-duty group, see Create an on-duty group. |
| Rotation | A configuration item in an on-duty group used to set up a rotation schedule for users or user groups. Add multiple rotation schedules to an on-duty group. Rotations support non-continuous time periods and dynamic handovers based on the calendar. |
| Override | A configuration item in an on-duty group used to set up an override schedule for users or user groups. Add multiple override schedules to an on-duty group. |
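
How a rotation and an override combine to pick the current on-call recipient can be sketched in Python. This is a simplified model under stated assumptions: the function `on_call`, the member names, and the fixed-length shifts are hypothetical, and calendar-based handovers, working hours, and holidays are not modeled.

```python
from datetime import datetime, timedelta


def on_call(at, members, rotation_start, shift_length, overrides=()):
    """Return who is on call at `at`; overrides take precedence over rotation."""
    # Override: a temporary substitute for a specific period.
    for start, end, substitute in overrides:
        if start <= at < end:
            return substitute
    # Rotation: members take turns in fixed-length shifts.
    elapsed_shifts = (at - rotation_start) // shift_length
    return members[elapsed_shifts % len(members)]


members = ["alice", "bob", "carol"]
start = datetime(2024, 1, 1)
weekly = timedelta(days=7)
overrides = [(datetime(2024, 1, 10), datetime(2024, 1, 12), "dave")]

print(on_call(datetime(2024, 1, 3), members, start, weekly, overrides))   # alice
print(on_call(datetime(2024, 1, 9), members, start, weekly, overrides))   # bob
print(on_call(datetime(2024, 1, 11), members, start, weekly, overrides))  # dave
print(on_call(datetime(2024, 1, 16), members, start, weekly, overrides))  # carol
```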
Limitations
| Category | Item | Description |
| --- | --- | --- |
| Alert monitoring | Maximum number of alert monitoring rules | Create up to 100 alert monitoring rules per Project. |
| | General limits on query and analysis | |
| | Concurrency limits on query and analysis | If a large number of query and analysis operations are performed simultaneously in a Project (for example, through an SDK) and many alert monitoring rules are created, the number of concurrent queries may exceed the Project's limit, causing monitoring to fail. We recommend setting Dedicated SQL to Auto when creating alert monitoring rules to support higher concurrency. If you use Dedicated SQL, ensure the target Project has sufficient Dedicated SQL CUs. To create an alert monitoring rule, see Create an alert monitoring rule. To enable Dedicated SQL, see High-performance and fully accurate queries and analysis (Dedicated SQL). |
| | Query and analysis syntax limits | Only query statements, SQL analysis statements, and SQL+PromQL statements are supported. Phrase query statements and Scan mode query and analysis statements are not supported. Note: Alert monitoring rules that use phrase queries or Scan mode query and analysis statements (SPL syntax) can be created, but they might fail or produce unexpected results during runtime. |
| | Limits on a single query and analysis result | |
| | Number of combined queries | 1 to 3. |
| | Field value length | If a field value exceeds 1,024 characters, only the first 1,024 characters are used for analysis. |
| | Query and analysis time range | The time span for each query and analysis statement cannot exceed 24 hours. |
| | Resource Data update latency | Updates to Resource Data are not immediate. The changes take effect within 15 minutes. |
| Alert management | Alert policy evaluation interval | The minimum evaluation interval is 15 seconds. Even if you set a smaller value, checks are performed at 15-second intervals. |
| | Policy matching conditions | In configurations like alert policies and action policies, we recommend using conditions based on Project name, alert rule ID, alert name, severity, or short labels or annotations. |
| | Number of incidents | Up to 1,000 incidents are retained for 30 days. Older incident data is automatically overwritten. |
| | Incident comments | Each incident can have up to 10 comments. Older comments are automatically overwritten. |
| | Policy configuration update latency | Changes to alert-related policies, such as alert policies, action policies, content templates, users, user groups, and on-duty groups, typically take effect within one minute. |
| Action management | Notification channels | The following limits apply to notification channels. Exceeding these limits may prevent you from receiving alert notifications. If you do not receive a notification, you can check for related errors in the Global Alert Troubleshooting Center. For more information, see Global Alert Troubleshooting Center. |
| | Notification content | Each notification channel has a content length limit. To ensure successful delivery, the system may truncate oversized content. Truncation does not guarantee content integrity or 100% delivery success, especially if the truncated content results in invalid Markdown or HTML. For plain text formats like SMS and voice calls, truncation generally does not cause delivery failure. Configure content templates according to channel limits to avoid delivery failures caused by oversized content. The limits for each channel are as follows (Chinese characters, English letters, numbers, and punctuation all count as one character): Note: If a field value exceeds 1,024 characters, only the first 1,024 characters are used. |
| | Content template configuration | An incorrect content template configuration may cause rendering to fail and return an error. If you receive an alert notification containing an error message like |
| | Content template variables | The content length is limited to 2 KB. Content exceeding this limit will be truncated. |
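
Several of the numeric limits above can be expressed as a small pre-flight check in Python. The helper names (`effective_field_value`, `effective_policy_interval`, `check_rule`) are hypothetical and the sketch is not part of any SLS SDK; it only encodes the limits listed in the table.

```python
MAX_FIELD_CHARS = 1024            # longer field values are cut to this length
MIN_POLICY_INTERVAL_SECONDS = 15  # smaller intervals are treated as 15 seconds
MAX_QUERY_SPAN_HOURS = 24         # per query and analysis statement
MAX_COMBINED_QUERIES = 3          # a rule combines 1 to 3 queries


def effective_field_value(value: str) -> str:
    """Return the part of a field value that is actually used for analysis."""
    return value[:MAX_FIELD_CHARS]


def effective_policy_interval(configured_seconds: int) -> int:
    """Return the evaluation interval that is actually applied."""
    return max(configured_seconds, MIN_POLICY_INTERVAL_SECONDS)


def check_rule(query_spans_hours):
    """Return a list of limit violations for a rule's query time spans."""
    problems = []
    if not 1 <= len(query_spans_hours) <= MAX_COMBINED_QUERIES:
        problems.append("a rule must combine 1 to 3 queries")
    for index, span in enumerate(query_spans_hours, start=1):
        if span > MAX_QUERY_SPAN_HOURS:
            problems.append(f"query {index} spans more than 24 hours")
    return problems


print(len(effective_field_value("x" * 2000)))  # 1024
print(effective_policy_interval(5))            # 15
print(check_rule([1, 25]))                     # ['query 2 spans more than 24 hours']
```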
Billing
Fees are incurred for SMS and voice call notifications. For detailed pricing, see Pricing.
| Operation | Description |
| --- | --- |
| SMS notifications | Fees are charged per alert SMS notification sent. Note: Some carriers may split long SMS messages (for example, over 70 characters) into two messages. If your message is long, you may receive two separate messages, but SLS will only charge for one. |
| Voice notifications | Fees are charged per alert voice call. |