During the operation of an Elasticsearch cluster, issues such as an abnormal cluster status or high node disk usage can impact service availability. By configuring monitoring alerts, you can detect and handle cluster anomalies in real time. Alibaba Cloud Elasticsearch supports two methods for this: one-click alert and custom alert rules in Cloud Monitor.
Enable one-click alert
The one-click alert feature, provided by Cloud Monitor, is disabled by default. When you enable it, the system automatically creates the following alert rules for all Elasticsearch clusters under your Alibaba Cloud account:
- Abnormal cluster status
- High node disk usage (>75%)
- High node JVM heap usage (>85%)
- Log on to the Alibaba Cloud Elasticsearch console.
- In the left-side navigation pane, click Elasticsearch Clusters.
- On the Elasticsearch Clusters page, click Initiative Alert.
- In the Initiative Alert dialog box, click Enable Now.
  Note: If the button displays Disable Now, the one-click alert feature is already enabled, and you can skip the remaining steps.
- On the Cloud Monitor console, turn on the Initiative Alert switch for the Elasticsearch service.
- (Optional) Return to the Alibaba Cloud Elasticsearch console to verify that one-click alert is enabled.
  - On the Elasticsearch Clusters page, click the ID of the target instance.
  - In the left-side navigation pane, choose Monitoring and Logs > Cluster Monitoring.
  - Click the Basic Monitoring tab and check the status of Initiative Alert in the upper-right corner.
    If the status of Initiative Alert is Enabled, the feature is active.
Configure Cloud Monitor alerts
The one-click alert feature uses a fixed template for its rules. To customize metrics, thresholds, and notification methods, you can create a custom alert rule in Cloud Monitor.
- Go to the Cloud Monitor console.
- In the left-side navigation pane, choose Alerts > Alarm Rules.
- Click Create Alert Rule.
- On the Create Alert Rule page, configure the alert rule.
The following example shows how to configure an alert rule for three combined metrics: cluster status, node disk usage, and node heap memory usage. For parameters not mentioned, use the default values. For detailed parameter descriptions, see Create an alert rule.
Parameter | Description |
Product | Select Elasticsearch. |
Resource Range | Select Cluster. |
Associated Resources | Add the instances that you want to monitor. |
Rule Description | Click Add Rule > Combined Metrics, and then configure the parameters in the Configure Rule Description panel as described in the next table. |
Alarm Contact Group | Select an existing alert contact group. If you have not created one, see Create an alert contact or an alert contact group. |

The following table describes the parameters in the Configure Rule Description panel.

Parameter | Description |
Metric Type | Select Combined Metrics. |
Alert Level | Select Warning. |
Multi-metric Alert Condition | Click Add Metric to add each metric condition. This example configures three metrics. Metric 1: Select Cluster ID > Cluster Status, and set the condition to >= 2. Metric 2: Select nodeName > Node Disk Usage, and set the condition to average >= 75%. Metric 3: Select nodeName > Node Heap Memory Usage_ES Business, and set the condition to average >= 85%. |
Relationship Between Metrics | Select Generate alerts if one of the conditions is met (||). |
Alert Threshold Triggers | Select 3 Consecutive Cycles (1 Cycle = 1 Minute). |

To configure a single-metric alert rule (for example, a disk usage alert), see Example: Configure a disk alert.
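The logic of this combined rule can be sketched in a few lines of Python. This is only an illustration of how the three conditions, the "one of the conditions is met" relationship, and the three-consecutive-cycle trigger fit together. It assumes that a cycle counts as breached when any one of the three conditions holds in that cycle; the names `Sample`, `breaches`, and `should_alert` are hypothetical and are not part of Cloud Monitor.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Sample:
    """One monitoring cycle (assumed here to be 1 minute) for a cluster."""
    cluster_status: float   # 0 = Green, 1 = Yellow, 2 = Red
    disk_usage_pct: float   # average node disk usage, in percent
    heap_usage_pct: float   # average node heap memory usage, in percent


def breaches(sample: Sample) -> bool:
    """Return True if any of the three example conditions is met (logical OR)."""
    return (
        sample.cluster_status >= 2
        or sample.disk_usage_pct >= 75
        or sample.heap_usage_pct >= 85
    )


def should_alert(samples: Iterable[Sample], consecutive: int = 3) -> bool:
    """Return True once `consecutive` cycles in a row breach the rule."""
    streak = 0
    for sample in samples:
        # A healthy cycle resets the streak; a breached cycle extends it.
        streak = streak + 1 if breaches(sample) else 0
        if streak >= consecutive:
            return True
    return False
```

Under these assumptions, a single breached cycle followed by a healthy one does not trigger an alert, which is why a brief spike in disk or heap usage does not necessarily produce a notification.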
Expand Advanced Settings and enter a publicly accessible URL in the Alert Callback field. Cloud Monitor pushes alert information to this URL using POST requests. Only the HTTP protocol is supported. For more information, see Use alert callbacks for threshold-triggered alerts.
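As a rough sketch of what the receiving side of an alert callback could look like, the following Python example starts a plain HTTP server that accepts POST requests and parses the body. The payload format is an assumption: the parser accepts either a form-encoded or a JSON body, and the field names that appear in the comments (`alertName`, `curValue`) are placeholders. Check the alert callback documentation for the actual fields. The port number and the names `parse_callback_body`, `AlertCallbackHandler`, and `serve` are also hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def parse_callback_body(content_type: str, body: bytes) -> dict:
    """Parse a callback body as JSON or as a form-encoded string."""
    text = body.decode("utf-8")
    if "application/json" in content_type.lower():
        return json.loads(text)
    # Form-encoded bodies look like "alertName=...&curValue=..." (placeholder names).
    return {key: values[0] for key, values in parse_qs(text).items()}


class AlertCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        fields = parse_callback_body(self.headers.get("Content-Type", ""), body)
        # Replace this print with your own handling, such as opening a ticket.
        print("alert received:", fields)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")


def serve(port: int = 8080) -> None:
    """Listen for callbacks over plain HTTP until the process is stopped."""
    HTTPServer(("0.0.0.0", port), AlertCallbackHandler).serve_forever()
```

Call `serve()` on a host that is reachable from the Internet, and enter its URL in the Alert Callback field. Because the callback arrives over plain HTTP, avoid acting on the payload without additional validation.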
When configuring alert rules, you can refer to the following monitoring metrics. For more information, see Metric descriptions and troubleshooting suggestions.
Metric | Necessity | Recommended threshold | Description |
cluster status | Required | Value >= 2 | The cluster statuses Green, Yellow, and Red correspond to the numerical values 0.00, 1.00, and 2.00, respectively. Configure the alert metric based on these numerical values. |
node disk usage (%) | Required | Average >= 75% | Should not exceed 80%. |
node heap memory usage (%) | Required | Average >= 85% | Should not exceed 90%. In the rule description, this metric is displayed as Node Heap Memory Usage_ES Business. |
node CPU utilization (%) | Optional | Average >= 95% | - |
node load_1m | Optional | Use 80% of the number of CPU cores as a reference value. | - |
cluster query QPS (count/second) | Optional | Use actual test results as a reference. | - |
cluster write QPS (count/second) | Optional | Use actual test results as a reference. | - |
full GC count (count) | Optional | Abnormal if the value is not 0. | - |
exception count (count) | Optional | Abnormal if the value is not 0. | - |
snapshot status | Optional | Abnormal if the value is 2. Normal if the value is -1 or 0. | - |
- Click OK.
After the alert rule is created, members of the specified alert contact group receive notifications when an alert is triggered. For information about how to configure notification methods, see Receive alert notifications in a DingTalk group.
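Two of the recommended thresholds require a small conversion before you can enter them: the cluster status is compared as a number, and the node load_1m reference depends on the number of CPU cores. The following Python sketch shows both conversions. The function names are hypothetical, and the status strings assume the lowercase `green`, `yellow`, and `red` values that the Elasticsearch cluster health API reports.

```python
# Numeric values that the cluster status metric uses for each status.
STATUS_VALUES = {"green": 0.0, "yellow": 1.0, "red": 2.0}


def status_to_value(status: str) -> float:
    """Convert a cluster status name to the number used in alert conditions."""
    return STATUS_VALUES[status.strip().lower()]


def load_1m_reference(cpu_cores: int) -> float:
    """Return 80% of the CPU core count as a reference value for node load_1m."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be a positive integer")
    return 0.8 * cpu_cores
```

For example, a condition of cluster status >= 2 matches only the Red status, and a node with 8 CPU cores has a load_1m reference value of about 6.4.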
Example: Configure a disk alert
A disk usage alert is one of the most common single-metric alert scenarios. When node disk usage exceeds the configured threshold, you must promptly expand the storage capacity or clear data to prevent service unavailability due to full disks.
Follow the steps in Configure Cloud Monitor alerts to create an alert rule. In the Rule Description section, choose Add Rule > Simple Metric. The following table provides an example configuration.
Parameter | Example |
Alert Rule Name | Disk Usage Alert |
Metric Type | Select Simple Metric. |
Metrics | Select nodeName > Node Disk Usage. |
Threshold and Alert Level |
|
Chart Preview | A preview of the monitoring chart for the selected metric. |
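To cross-check a disk alert against the cluster itself, you can compute disk usage for each node from the response of the Elasticsearch nodes stats API (`GET _nodes/stats/fs`). The following Python sketch takes an already parsed response and returns the nodes at or above a threshold. It assumes the standard `fs.total.total_in_bytes` and `fs.total.available_in_bytes` fields. The default of 75% is taken from the recommended threshold in the metric table, and the value computed here may not match the Node Disk Usage metric in Cloud Monitor exactly. The function name is hypothetical.

```python
def nodes_over_disk_threshold(stats: dict, threshold_pct: float = 75.0) -> dict:
    """Map node name to disk usage (%) for nodes at or above the threshold.

    `stats` is the parsed JSON response of GET _nodes/stats/fs.
    """
    result = {}
    for node_id, node in stats.get("nodes", {}).items():
        total = node["fs"]["total"]["total_in_bytes"]
        available = node["fs"]["total"]["available_in_bytes"]
        if total <= 0:
            continue  # skip nodes that report no file system data
        used_pct = 100.0 * (total - available) / total
        if used_pct >= threshold_pct:
            # Fall back to the node ID if the response carries no node name.
            result[node.get("name", node_id)] = round(used_pct, 2)
    return result
```

Nodes that this function returns are candidates for the actions described above: expanding the storage capacity or clearing data.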