Overview - ApsaraDB for OceanBase (Deprecated) - Alibaba Cloud Help Center

ApsaraDB for OceanBase provides an alerting feature that supports alerts for various dimensions, such as OceanBase clusters, data assessment, data transmission, and data development. You can use the built-in alert metrics to meet basic alerting requirements. This topic describes the details of each alert.

Alert information

Each alert page contains the following information:

Name	Description
Alert description	Describes the meaning of each alert and the scenarios that trigger it.
Rule information	Describes the rules that trigger each alert, including Monitoring Metric, Metric Meaning, Recommended Threshold, Duration, and Detection Period. Trigger rule: The system checks the monitoring metric once every detection period. An alert is reported if the monitoring metric value exceeds the default threshold and this state persists for the specified duration.
Impact on the system	Describes the potential impact on the system when an alert occurs.
Possible causes	Describes the causes of the alert to help you locate the problem and handle the alert.
Solution	Describes how to handle each alert. Follow the specific method provided for the alert.

Note

For more information about adding alert rules, see Add an alert rule.
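
The trigger rule described under Rule information can be sketched as follows. This is a minimal illustration, not the service's implementation: the `AlertRule` class, the `should_alert` function, and the 60-second detection period are hypothetical, and it assumes that "persists for the specified duration" means every sample taken during that duration exceeds the threshold.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AlertRule:
    """Hypothetical container for the rule fields named in the table above."""
    threshold: float         # value the metric must exceed
    duration_s: int          # how long the breach must persist, in seconds
    detection_period_s: int  # how often the metric is checked, in seconds


def should_alert(samples: Sequence[float], rule: AlertRule) -> bool:
    """Return True if the most recent samples breach the rule.

    `samples` holds one metric value per detection period, oldest first.
    Assumption: the duration is covered by ceil(duration / period)
    consecutive samples, all of which must exceed the threshold.
    """
    needed = max(1, -(-rule.duration_s // rule.detection_period_s))  # ceiling division
    if len(samples) < needed:
        return False  # not enough history to cover the duration yet
    return all(value > rule.threshold for value in samples[-needed:])


# Example: machine memory usage above 90% for 3 minutes, checked every 60 s (assumed period).
rule = AlertRule(threshold=90.0, duration_s=180, detection_period_s=60)
print(should_alert([85.0, 91.0, 92.0, 95.0], rule))  # last three samples all exceed 90
print(should_alert([91.0, 92.0, 89.0, 95.0], rule))  # one sample dipped below the threshold
```

A single sample at or below the threshold resets the condition in this sketch, which is one reasonable reading of "this state persists".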

Concepts

Alert object

An alert object is the entity monitored by an alert task. It uniquely identifies the object of an alert. An alert object can be an OceanBase cluster, a machine, or a service.

An alert object is written as the alert rule name followed by the faulty instance in parentheses, for example disk_log_usage_instance (Instance: integration_22-ob2).
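
If you process alert objects in your own tooling, the format above can be split into its two parts. The following is a sketch that assumes the string always follows the pattern shown in the example; `AlertObject`, `parse_alert_object`, and `format_alert_object` are hypothetical helper names, not part of any ApsaraDB for OceanBase API.

```python
import re
from typing import NamedTuple


class AlertObject(NamedTuple):
    rule_name: str  # alert rule name, e.g. disk_log_usage_instance
    instance: str   # faulty instance, e.g. integration_22-ob2


# Assumed pattern: "<rule name> (Instance: <instance>)"
_ALERT_OBJECT = re.compile(r"^(?P<rule>\S+)\s*\(Instance:\s*(?P<instance>[^)]+)\)$")


def parse_alert_object(text: str) -> AlertObject:
    """Split an alert object string into rule name and instance."""
    match = _ALERT_OBJECT.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized alert object: {text!r}")
    return AlertObject(match.group("rule"), match.group("instance").strip())


def format_alert_object(obj: AlertObject) -> str:
    """Rebuild the display string from its parts."""
    return f"{obj.rule_name} (Instance: {obj.instance})"


obj = parse_alert_object("disk_log_usage_instance (Instance: integration_22-ob2)")
print(obj.rule_name, obj.instance)
```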

Alert scope

The alert scope defines the range of an alert and corresponds to the metric scope.

The alert scope includes OceanBase Cluster (OBCluster), data assessment, data transmission, and data development.

Rule description

ApsaraDB for OceanBase lets you configure alert rules for tenant-level and node-level monitoring data. The resource scope and monitoring metrics for each rule are listed below. You can configure them in Monitoring and Alerts as required. We recommend that you follow our best practices.

The following table lists the tenant monitoring metrics and their corresponding alert metrics:

Metric	Metric Name	Corresponding Alert Metric
Memory usage	memory_usage	Tenant / Tenant Memory Usage
CPU usage	cpu_usage_percent	Tenant / CPU Usage
Disk usage	disk_ob_data_size	Cluster / Maximum Disk Usage. Note: Storage usage is not isolated between tenants, so disk usage alerts can be configured only at the cluster level.
Total connections	total_sessions	Configuring alert policies is not supported.
Read/write connections	readwrite_sessions	Configuring alert policies is not supported.
Read-only connections	readonly_sessions	Configuring alert policies is not supported.
Write requests	tps	Tenant / Write Requests
Read requests	QPS	Tenant / Read Requests
Write request response time	tps_rt	Tenant / Write Request Response Time
Read request response time	qps_rt	Tenant / Read Request Response Time
Wait queue	request_queue_rt	Tenant / Wait Queue
Transaction commits	trans_user_trans_count	Tenant / Transaction Commits
Transaction response time	trans_commit_rt	Tenant / Transaction Commit Response Time

The following table lists the node monitoring metrics and their corresponding alert metrics:

Metric	Metric Name	Corresponding Alert Metric
CPU usage	cpu_util	Node / CPU Usage
Load	load_load1	Node / Load
Machine memory usage	machine_mem_used_percent	Node / Machine Memory Usage
Disk read	io_read_bytes	Node / Disk Read
Disk write	io_write_bytes	Node / Disk Write
Disk I/O wait	io_await	Node / Disk I/O Wait
Inbound packet rate	traffic_bytin	Node / Inbound Packet Rate
Outbound packet rate	traffic_bytout	Node / Outbound Packet Rate
Retransmission rate	tcp_retran	Node / Retransmission Rate
Total connections	total_sessions	Configuring alert policies is not supported.
Read/write connections	readwrite_sessions	Configuring alert policies is not supported.
Read-only connections	readonly_sessions	Configuring alert policies is not supported.
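
The two tables above can be captured as lookup data, for example to check whether a monitoring metric has a corresponding alert metric before you try to configure a rule. This is a sketch: the dictionary and function names are hypothetical, the values are copied from the tables, and `None` marks metrics for which configuring alert policies is not supported.

```python
from typing import Dict, Optional

# Metric name -> corresponding alert metric (None: alert policies not supported).
TENANT_ALERT_METRICS: Dict[str, Optional[str]] = {
    "memory_usage": "Tenant / Tenant Memory Usage",
    "cpu_usage_percent": "Tenant / CPU Usage",
    "disk_ob_data_size": "Cluster / Maximum Disk Usage",  # cluster level only
    "total_sessions": None,
    "readwrite_sessions": None,
    "readonly_sessions": None,
    "tps": "Tenant / Write Requests",
    "QPS": "Tenant / Read Requests",
    "tps_rt": "Tenant / Write Request Response Time",
    "qps_rt": "Tenant / Read Request Response Time",
    "request_queue_rt": "Tenant / Wait Queue",
    "trans_user_trans_count": "Tenant / Transaction Commits",
    "trans_commit_rt": "Tenant / Transaction Commit Response Time",
}

NODE_ALERT_METRICS: Dict[str, Optional[str]] = {
    "cpu_util": "Node / CPU Usage",
    "load_load1": "Node / Load",
    "machine_mem_used_percent": "Node / Machine Memory Usage",
    "io_read_bytes": "Node / Disk Read",
    "io_write_bytes": "Node / Disk Write",
    "io_await": "Node / Disk I/O Wait",
    "traffic_bytin": "Node / Inbound Packet Rate",
    "traffic_bytout": "Node / Outbound Packet Rate",
    "tcp_retran": "Node / Retransmission Rate",
    "total_sessions": None,
    "readwrite_sessions": None,
    "readonly_sessions": None,
}

_TABLES = {"tenant": TENANT_ALERT_METRICS, "node": NODE_ALERT_METRICS}


def alert_metric_for(scope: str, metric_name: str) -> Optional[str]:
    """Return the alert metric for a monitoring metric, or None if unsupported.

    Raises KeyError if the scope or metric name is not listed in the tables.
    """
    return _TABLES[scope][metric_name]


print(alert_metric_for("tenant", "tps"))           # Tenant / Write Requests
print(alert_metric_for("node", "total_sessions"))  # None
```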

Alert levels

Each alert metric has a corresponding alert level.

Level	English Meaning	Chinese Meaning	Notification Method	Description
1	Critical	Critical	Phone call + Text message + Email + DingTalk Robot	System availability has decreased and requires immediate repair to prevent a complete outage. Alternatively, the system is still available but is about to become unavailable. Take action to prevent further loss of availability. For example, the machine memory usage is greater than 90% for 3 minutes.
2	Warning	Warning	Text message + Email + DingTalk Robot	Key system performance metrics are declining but have not yet reached the Critical threshold. Investigate to find potential problems before they escalate to a Critical alert. (This is a reserved type. No alert metrics currently match this level.)
3	Info	Standard	Email + DingTalk Robot	This is an operational reminder, not a true alert. It is typically triggered when an administrator performs an important operation, such as taking a cluster offline. When an alert at this level is resolved, no alert recovery notification is sent.
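
The level table can be expressed as a small lookup, for example when routing alerts in your own scripts. This is a sketch with hypothetical names (`AlertLevel`, `channels_for`, `sends_recovery_notification`). The channel lists are taken from the table. The recovery behavior for levels 1 and 2 is an inference from the statement that only Info alerts send no recovery notification.

```python
from enum import IntEnum
from typing import Dict, Tuple


class AlertLevel(IntEnum):
    CRITICAL = 1
    WARNING = 2  # reserved: no alert metrics currently match this level
    INFO = 3


# Notification methods per level, as listed in the table above.
NOTIFICATION_CHANNELS: Dict[AlertLevel, Tuple[str, ...]] = {
    AlertLevel.CRITICAL: ("Phone call", "Text message", "Email", "DingTalk Robot"),
    AlertLevel.WARNING: ("Text message", "Email", "DingTalk Robot"),
    AlertLevel.INFO: ("Email", "DingTalk Robot"),
}


def channels_for(level: int) -> Tuple[str, ...]:
    """Return the notification methods for a numeric alert level (1 to 3)."""
    return NOTIFICATION_CHANNELS[AlertLevel(level)]


def sends_recovery_notification(level: int) -> bool:
    """Info alerts send no recovery notification; other levels are assumed to."""
    return AlertLevel(level) is not AlertLevel.INFO


print(channels_for(1))
print(sends_recovery_notification(3))
```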
