Create alert rules to monitor your applications and services. When conditions are met, CloudMonitor notifies you through contacts, chatbots, webhooks, or action integrations.
Prerequisites
- You have enabled the required observability services, such as Managed Service for Prometheus, Application Monitoring, and Log Service.
- You have created a notification contact.
Create an alert rule
- Log in to the CloudMonitor 2.0 console. In the left-side navigation pane, choose All Features > Alert Center.
- On the Alert Center page, choose Alert management > Alert rules.
- On the Alert rules page, click Create alert rule.
- In the Create alert rule panel, configure the following parameters.
- Rule name: A name that identifies the alert rule.
- Monitoring type: The type of service or resource to monitor. The remaining parameters depend on the monitoring type that you select.
Managed Service for Prometheus/Cloud Synthetic Monitoring
- Data Source Type: The source of the monitoring data.
- Region: The region where the data source resides.
- Prometheus instance: The instance to which the alert rule applies.
- Detection condition definition method: Valid values:
  - Custom PromQL: Write a custom PromQL query. For examples, see PromQL Function Usage Examples.
  - Configure based on predefined metrics:
    - Metric group: Select a metric group.
    - Metric: Select a metric.
    - Detection condition: Specify a comparison operator and a threshold value. p50, p75, p90, and p99 represent the 50th, 75th, 90th, and 99th percentiles.
    - PromQL preview: Preview the PromQL query generated for the predefined metric.
- Severity level: The severity level of the alert rule. Valid values:
  - P1: Critical: Issues that affect the availability of core services and have widespread impact.
  - P2: Error: Partial service failures that affect availability.
  - P3: Warning: Potential issues that could cause service errors or affect your business.
  - P4: Information: Low-priority events. This is the default level.
- Duration: How long the condition must persist before an alert is triggered. This prevents false alarms caused by transient fluctuations.
- Alert detection period: The interval at which the alert rule is evaluated. The default value is 60 seconds, which means the check runs once per minute.
- Content: The content of alert messages, which you can customize with Go template syntax. For example: Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage {{$labels.metrics_params_opt_label_value}} {{$labels.metrics_params_value}}%, Current value {{ printf "%.2f" $value }}%
- Labels: Custom key-value pairs for categorizing and filtering alert rules. For example, env: production and team: sre.
- Annotations: Additional information for the alert rule, such as long-form descriptions or runbook links. For example, description: CPU usage is high and runbook_url: https://wiki.xxx.com/runbook.
Application Monitoring
- Data Source Type: The type of the data source.
- Region: The region where the data source resides.
- Application: The application to monitor.
- Metric group: The metric group of the application.
- Interface name: The interfaces to monitor. Valid interface matching methods: Traverse, Equals, Not Equals, Regex Match, Regex Not Match, and No Dimension.
- Interface call type: The type of interface call to monitor.
- Detection condition method: Valid values:
  - Single condition:
    - Set the time range (Last N minutes), the call type, the calculation method, and a comparison operator.
    - Set the thresholds for each severity level: Critical, Error, Warning, and Information.
  - Multiple conditions:
    - Multi-alert trigger rule: Select Any condition is met or All conditions are met.
    - Detection condition 1: Uses the same parameters as a single condition.
    - Add detection condition: Add more conditions as needed.
    - Severity level: Valid values: P1: Critical, P2: Error, P3: Warning, and P4: Information.
- Alert detection period: The interval at which the alert rule is evaluated. The default value is 60 seconds, which means the check runs once per minute.
- Content: The customizable content of alert notifications.
- Tags: Custom key-value pairs for categorizing and filtering alert rules. For example, env: production and team: sre.
- Annotations: Additional information for the alert rule, such as long-form descriptions or runbook links. For example, description: CPU usage is high and runbook_url: https://wiki.xxx.com/runbook.
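When you use multiple conditions, the multi-alert trigger rule decides how the individual results combine. The following is a minimal sketch that assumes each detection condition has already been evaluated to true or false. The type names and the metric values are illustrative. The choice to never alert on an empty condition list is also an assumption.

```go
package main

import "fmt"

// TriggerRule mirrors the two multi-alert trigger rule options (hypothetical names).
type TriggerRule int

const (
	AnyConditionMet TriggerRule = iota
	AllConditionsMet
)

// shouldAlert combines per-condition results according to the trigger rule.
// An empty result list never triggers an alert in this sketch.
func shouldAlert(rule TriggerRule, results []bool) bool {
	if len(results) == 0 {
		return false
	}
	for _, met := range results {
		if rule == AnyConditionMet && met {
			return true // one satisfied condition is enough
		}
		if rule == AllConditionsMet && !met {
			return false // one unsatisfied condition blocks the alert
		}
	}
	// Any: no condition was met. All: every condition was met.
	return rule == AllConditionsMet
}

func main() {
	errorCount, avgRTMillis := 12.0, 320.0 // hypothetical observed values
	results := []bool{
		errorCount > 10,   // detection condition 1: error count above 10
		avgRTMillis > 500, // detection condition 2: average response time above 500 ms
	}
	fmt.Println(shouldAlert(AnyConditionMet, results))  // true
	fmt.Println(shouldAlert(AllConditionsMet, results)) // false
}
```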
Large model observability
- Data Source Type: The data source type, which is automatically set to UModel.
- Entity Type: The type of entity to monitor.
- Metric set: The set of metrics to evaluate, such as AI application operational metrics, GenAI model metrics, or AI application traffic metrics.
- Detection condition: The threshold that triggers the alert.
- Severity level: The severity level of the alert. Valid values: P1: Critical, P2: Error, P3: Warning, and P4: Information.
- Duration: How long the condition must persist before an alert is triggered.
- Alert detection period: The interval at which the alert rule is evaluated. The default value is 60 seconds, which means the check runs once per minute.
- Content: The customizable content of alert notifications.
- Tags: Custom key-value pairs for categorizing and filtering alert rules. For example, env: production and team: sre.
- Annotations: Additional information for the alert rule, such as long-form descriptions or runbook links. For example, description: CPU usage is high and runbook_url: https://wiki.xxx.com/runbook.
Container Insights/ECS Insights/Hologres Insights/AI Training Service Insights/Database Insights
- Data Source Type: The source of the monitoring data.
- Region: The region where the data source resides.
- Prometheus instance: The instance to which the alert rule applies.
- Detection condition definition method: Valid values:
  - Custom PromQL: Write a custom PromQL query. For examples, see PromQL Function Usage Examples.
  - Configure based on predefined metrics:
    - Metric group: Select a metric group.
    - Metric: Select a metric.
    - Detection condition: Specify a comparison operator and a threshold value.
    - PromQL preview: Preview the PromQL query generated for the predefined metric.
- Severity level: The severity level of the alert rule. Valid values: P1: Critical, P2: Error, P3: Warning, and P4: Information.
- Duration: How long the condition must persist before an alert is triggered.
- Alert detection period: The interval at which the alert rule is evaluated. The default value is 60 seconds, which means the check runs once per minute.
- Run detection after data is complete: Select a detection method.
- Content: The content of alert messages, which you can customize with Go template syntax. For example: Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage {{$labels.metrics_params_opt_label_value}} {{$labels.metrics_params_value}}%, current value {{ printf "%.2f" $value }}%
- Tags: Custom key-value pairs for categorizing and filtering alert rules. For example, env: production and team: sre.
- Annotations: Additional information for the alert rule, such as long-form descriptions or runbook links. For example, description: CPU usage is high and runbook_url: https://wiki.xxx.com/runbook.
Log Audit
- Select template:
  - ActionTrail: Select an ActionTrail template.
  - Host audit: Select a host audit template.
  - Container audit: Select a container audit template.
- Query and statistics:
  - Single query: Query logs by configuring log-related parameters.
  - Set operation: Apply set operations across multiple resource sets.
- Detection Logic: Add conditions and set the data matching method and severity level.
- Severity level: Valid values: Critical, Error, Warning, and Information.
- Consecutive hits: The number of consecutive times the condition must be met before an alert is triggered.
- Alert detection period: The interval at which the alert rule is evaluated. The default value is 60 seconds, which means the check runs once per minute.
- Tags: Custom key-value pairs for categorizing and filtering alert rules. For example, env: production and team: sre.
- Annotations: Additional information for the alert rule, such as long-form descriptions or runbook links. For example, description: CPU usage is high and runbook_url: https://wiki.xxx.com/runbook.
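Consecutive hits can be read as a run-length counter over successive evaluations. This sketch assumes that any miss resets the count. The hypothetical function firstAlertIndex returns the evaluation at which the alert first triggers, or -1 if it never does.

```go
package main

import "fmt"

// firstAlertIndex is a hypothetical helper. hits[i] reports whether the
// detection condition matched at the i-th evaluation.
func firstAlertIndex(hits []bool, consecutive int) int {
	if consecutive < 1 {
		consecutive = 1 // treat non-positive settings as a single hit
	}
	run := 0
	for i, hit := range hits {
		if hit {
			run++
		} else {
			run = 0 // a miss breaks the streak
		}
		if run >= consecutive {
			return i
		}
	}
	return -1
}

func main() {
	hits := []bool{true, false, true, true, true}
	fmt.Println(firstAlertIndex(hits, 3)) // 4
	fmt.Println(firstAlertIndex(hits, 4)) // -1
}
```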
Log Service: Same parameters as the Log Audit monitoring type.
- Alert notification:
  - Notification recipient: The recipients to notify when an alert is triggered.
    - Contact: Individual contacts to notify.
    - Contact group: A group of contacts to notify.
    - DingTalk: Sends alerts to a DingTalk group chatbot.
    - WeCom: Sends alerts to a WeCom chatbot.
    - Lark: Sends alerts to a Lark chatbot.
    - Slack: Sends alerts to Slack.
    - Custom webhook: Sends alerts to a custom webhook.
  - Integrate with ARMS alert management: Integrates with Application Real-Time Monitoring Service (ARMS) to manage the alert lifecycle.
    Note: Alert events are sent to the ARMS Alert O&M center by default. Configure notifications in that center.
  - Action integration: The service to trigger for automated incident response, such as Log Service, lightweight message queues, Function Compute, and third-party services such as PagerDuty and webhooks.
  - Notification silence period: How long to wait before resending a notification for an unresolved alert. Valid values: 1, 5, 10, 15, 30, and 50 minutes, and 1, 3, 6, 12, and 24 hours.
    Note: For example, if Notification silence period is set to 12 hours and the alert persists, CloudMonitor resends the notification after 12 hours.
  - Effective period: The time window during which the alert rule is active. Notifications are sent only during this period.
    Note:
    - Alerts triggered outside the effective period are recorded in the alert history but do not generate notifications.
    - The effective period can be set within a 24-hour range and can span midnight, for example, from 23:00 to 01:00 the next day.
Manage alert rules
The Alert rules page lists all alert rules with the following information.
- Alert status: The current status of the rule. Valid values:
  - Ok: No alert condition is triggered.
  - Alarm: An alert condition is triggered and the alert is active.
  - NoData: No monitoring data is available.
- Rule name/ID: The display name and unique identifier (UUID) of the alert rule.
- Enabled status: Whether the rule is enabled. Enabled rules are evaluated at the configured interval; disabled rules are not.
- Service source: The service that the rule applies to.
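The three statuses can be summarized as a small decision function. This is a simplification. It assumes that missing data takes precedence over the condition result, and it ignores Duration and consecutive hits. statusOf is a hypothetical helper.

```go
package main

import "fmt"

// Status mirrors the alert statuses shown on the Alert rules page.
type Status string

const (
	StatusOk     Status = "Ok"
	StatusAlarm  Status = "Alarm"
	StatusNoData Status = "NoData"
)

// statusOf is a hypothetical helper that maps one evaluation to a status.
func statusOf(hasData, conditionMet bool) Status {
	switch {
	case !hasData:
		return StatusNoData // assumed to take precedence over the condition
	case conditionMet:
		return StatusAlarm
	default:
		return StatusOk
	}
}

func main() {
	fmt.Println(statusOf(true, false), statusOf(true, true), statusOf(false, false)) // Ok Alarm NoData
}
```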
You can search for alert rules by using the following filters:
- Monitoring type
- Rule name/ID
- Alert status
- Enabled status
- More filters: Search by tag or notification contact.
- Edit: Find the alert rule, click Edit in the Actions column, modify the rule, and click OK.
- Enable/Disable: Toggle the switch in the Enabled status column.
- Delete: Click Delete in the Actions column.
  Warning: This action cannot be undone. Proceed with caution.