How to configure cluster alerts

Enable one-click alert

When enabled, the system automatically creates the following alert rules for all Elasticsearch instances under your account:

Abnormal cluster status
Abnormal node disk usage (>75%)
Abnormal node JVM heap usage (>85%)

Log in to the Alibaba Cloud Elasticsearch console.
In the left-side navigation pane, click Elasticsearch Clusters.
On the Elasticsearch Clusters page, click Initiative Alert.
In the Initiative Alert dialog box, click Enable Now.

Note
If the button is labeled Disable Now, the one-click alert feature is already enabled. You can skip the remaining steps.
In the Cloud Monitor console, turn on the Initiative Alert switch for the Elasticsearch service.
(Optional) Return to the Alibaba Cloud Elasticsearch console and verify that one-click alert is enabled.
1. On the Elasticsearch Clusters page, click the ID of the target instance.
2. In the left-side navigation pane, choose Monitoring and Logs > Cluster Monitoring.
3. Click the Basic Monitoring tab and check the status of Initiative Alert in the upper-right corner of the page.
  
  If the status of Initiative Alert is Enabled, the feature is active.

Configure Cloud Monitor alerts

The one-click alert feature uses a fixed template. To customize metrics, thresholds, or notification methods, create custom alert rules in Cloud Monitor.

Go to the Cloud Monitor console.
In the left-side navigation pane, choose Alerts > Alarm Rules.
Click Create Alert Rule.

On the Create Alert Rule page, configure the alert rule.

This example configures a combined alert rule for three metrics: Cluster Status, node disk usage, and node heap memory usage. Retain default values for unlisted parameters. For detailed parameter descriptions, see Create an alert rule.

Parameter	Description
Product	Select Elasticsearch.
Resource Range	Select Cluster.
Associated Resources	Add the instances that you want to monitor.
Rule Description	Click Add Rule > Combined Metrics. In the Configure Rule Description panel, configure the rule description settings listed in the following table. This example uses three metrics; click Add Metric to add more conditions.
Alarm Contact Group	Select an existing alert contact group. If you have not created one, see Create an alert contact or an alert contact group.

The rule description settings are as follows:

Parameter	Description
Metric Type	Select Combined Metrics.
Alert Level	Select Warning (Warn).
Multi-metric Alert Condition	Click Add Metric to add more metric conditions, and configure the following three monitoring metrics: Metric 1: Select Cluster ID > Cluster Status, and set the condition to >=2. Metric 2: Select nodeName > Node Disk Usage, and set the average to >=75%. Metric 3: Select nodeName > Node Heap Memory Usage_ES Business, and set the average to >=85%.
Relationship Between Metrics	Select Generate alerts if one of the conditions is met (\|\|).
Alert Threshold Triggers	Select 3 Consecutive Cycles (1 Cycle = 1 Minute).
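The following Python sketch shows one way to read the combined condition: a one-minute cycle breaches if any of the three metric conditions holds (the || relationship), and an alert fires once three consecutive cycles breach. It illustrates the rule logic only and is not Cloud Monitor's implementation. The `Sample` class, its field names, and the assumption that the OR-combined condition (rather than each metric on its own) is counted across consecutive cycles are hypothetical.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical per-cycle sample; field names are illustrative,
# not Cloud Monitor's data schema.
@dataclass
class Sample:
    cluster_status: float  # 0.00 = Green, 1.00 = Yellow, 2.00 = Red
    disk_usage_pct: float  # average node disk usage in the cycle
    heap_usage_pct: float  # average node heap memory usage in the cycle


def cycle_breaches(s: Sample) -> bool:
    """True if any metric condition holds (conditions joined with ||)."""
    return (
        s.cluster_status >= 2
        or s.disk_usage_pct >= 75
        or s.heap_usage_pct >= 85
    )


def should_alert(samples: List[Sample], consecutive: int = 3) -> bool:
    """True once `consecutive` back-to-back cycles all breach."""
    run = 0
    for s in samples:
        # Extend the current run on a breach, reset it on a healthy cycle.
        run = run + 1 if cycle_breaches(s) else 0
        if run >= consecutive:
            return True
    return False
```

With this reading, three cycles with heap usage at 86%, 88%, and 90% trigger an alert, while two breaching cycles followed by a healthy one reset the count.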

For an example of a single-metric rule for disk usage, see Disk alert example.

Expand Advanced Settings and enter a publicly accessible URL in the Alert Callback field. Cloud Monitor sends alert notifications to this URL as POST requests. Only HTTP is supported. For more information, see Use alert callbacks.
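For testing, a callback endpoint can be as small as the following Python sketch, built on the standard library `http.server` module. It accepts POST requests, stores the raw request body, and replies with HTTP 200. The default port (8080), the in-memory `received` list, and the choice not to parse the payload are assumptions; check Use alert callbacks for the actual payload format before parsing it. The endpoint must still be reachable by Cloud Monitor at a publicly accessible HTTP URL.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List

# Raw bodies of received callbacks, kept in memory for inspection.
received: List[bytes] = []


class AlertCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly Content-Length bytes; the payload is stored unparsed.
        length = int(self.headers.get("Content-Length", 0))
        received.append(self.rfile.read(length))
        # Acknowledge receipt with an empty 200 response.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()


def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    """Create (but do not start) the callback server."""
    return HTTPServer((host, port), AlertCallbackHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

Storing the raw body keeps the sketch independent of the payload encoding; add parsing once you have confirmed the fields Cloud Monitor sends.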

The following table lists available metrics for alert rules. For more information, see Metric descriptions and troubleshooting suggestions.

Metric	Necessity	Recommended threshold	Description
Cluster Status	Required	>=2	The cluster statuses Green, Yellow, and Red correspond to the numerical values 0.00, 1.00, and 2.00, respectively. Use these values when configuring the alert.
NodeDiskUtilization(%)	Required	Average >=75%	Keep disk usage below 80%.
NodeHeapMemoryUtilization(%)	Required	Average >=85%	Keep heap memory usage below 90%. In the rule description, this metric is displayed as Node Heap Memory Usage_ES Business.
NodeCPUUtilization(%)	Optional	Average >=95%	-
Node Load_1m	Optional	80% of the total number of CPU cores.	-
ClusterQueryQPS(Count/Second)	Optional	Base the threshold on your test results.	-
ClusterIndexQPS(Count/Second)	Optional	Base the threshold on your test results.	-
Full GC Count (count)	Optional	A value other than 0 indicates an abnormal status.	-
Exception count (count)	Optional	A value other than 0 indicates an abnormal status.	-
Snapshot status	Optional	A value of 2 indicates an abnormal status.	A value of -1 or 0 indicates a normal status.
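The following Python sketch encodes the recommended thresholds from the table, including the Green/Yellow/Red mapping for Cluster Status and the Node Load_1m threshold derived from the CPU core count (for example, 80% of 8 cores gives a threshold of 6.4). The metric keys such as `node_disk_pct` are hypothetical names, not Cloud Monitor metric identifiers. The QPS metrics are omitted because their thresholds depend on your own test results.

```python
from typing import Dict, List

# Numeric values of the Cluster Status metric.
STATUS_VALUES = {"green": 0.0, "yellow": 1.0, "red": 2.0}


def load_threshold(cpu_cores: int) -> float:
    """Recommended Node Load_1m threshold: 80% of the node's CPU core count."""
    return 0.8 * cpu_cores


def abnormal_metrics(m: Dict[str, float], cpu_cores: int) -> List[str]:
    """Return the names of metrics in `m` that meet the recommended thresholds."""
    checks = {
        "cluster_status": lambda v: v >= 2,       # Red
        "node_disk_pct": lambda v: v >= 75,
        "node_heap_pct": lambda v: v >= 85,
        "node_cpu_pct": lambda v: v >= 95,
        "node_load_1m": lambda v: v >= load_threshold(cpu_cores),
        "full_gc_count": lambda v: v != 0,        # any Full GC is abnormal
        "exception_count": lambda v: v != 0,      # any exception is abnormal
        "snapshot_status": lambda v: v == 2,      # -1 or 0 is normal
    }
    # Only metrics present in `m` are checked, in table order.
    return [name for name, check in checks.items() if name in m and check(m[name])]
```

For example, a 4-core node reporting a 1-minute load of 3.3 exceeds its threshold of 3.2 and is flagged.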

Click OK.

After the alert rule is created, the specified alert contact group is notified when a metric triggers an alert. To learn how to configure notification methods, see Receive alert notifications in a DingTalk group.

Disk alert example

Disk watermark alerts are a common single-metric use case. When node disk usage exceeds a threshold, expand the disk capacity or clear data to prevent outages.

Follow the steps in Configure Cloud Monitor alerts to create an alert rule. In the Rule Description section, click Add Rule > Simple Metric. The following table shows an example configuration.

Parameter	Example
Alert Rule Name	Disk Watermark Alert
Metric Type	Select Simple Metric.
Metrics	Select nodeName > Node Disk Usage.
Threshold and Alert Level	Critical: average value >= 80% for 3 consecutive cycles; Warning: average value >= 75% for 3 consecutive cycles; Info: average value >= 70% for 3 consecutive cycles.
Chart Preview	Previews the monitoring chart for the selected metric.
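The three levels can be read as a tiered check: each of the last three one-minute cycle averages must meet a level's threshold, and the highest level met is reported. The following Python sketch shows that reading. The `LEVELS` list and the assumption that only the highest matching level is reported are illustrative and do not describe how Cloud Monitor handles overlapping levels.

```python
from typing import List, Optional

# (level, threshold %) pairs from the Disk Watermark Alert example, highest first.
LEVELS = [("Critical", 80.0), ("Warning", 75.0), ("Info", 70.0)]


def disk_alert_level(cycle_averages: List[float], cycles: int = 3) -> Optional[str]:
    """Return the highest level whose threshold is met by each of the last `cycles` averages."""
    if len(cycle_averages) < cycles:
        return None  # not enough cycles observed yet
    window = cycle_averages[-cycles:]
    for level, threshold in LEVELS:
        if all(v >= threshold for v in window):
            return level
    return None
```

For example, cycle averages of 81%, 76%, and 85% return Warning: the 76% cycle keeps the Critical condition from holding for three consecutive cycles.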