Overview


ApsaraDB for OceanBase provides an alerting feature that supports alerts for various dimensions, such as OceanBase clusters, data assessment, data transmission, and data development. You can use the built-in alert metrics to meet basic alerting requirements. This topic describes the details of each alert.

Alert information

Each alert page contains the following information:

| Name | Description |
| --- | --- |
| Alert description | Describes the meaning of each alert and the scenarios that trigger it. |
| Rule information | Describes the rules that trigger each alert, including Monitoring Metric, Metric Meaning, Recommended Threshold, Duration, and Detection Period. Trigger rule: The system checks the monitoring metric once every detection period. An alert is reported if the monitoring metric value exceeds the default threshold and this state persists for the specified duration. |
| Impact on the system | Describes the potential impact on the system when an alert occurs. |
| Possible causes | Describes the causes of the alert to help you locate the problem and handle the alert. |
| Solution | Follow the specific method provided for each alert. |
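The trigger rule described under Rule information (periodic check, threshold, and duration) can be illustrated with a minimal sketch. The `AlertRule` class and `evaluate` function are hypothetical names used only for illustration and are not part of any ApsaraDB for OceanBase API. The sketch also assumes that "persists for the specified duration" means enough consecutive breaching checks to cover the duration, which the source does not spell out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AlertRule:
    """Hypothetical container for the rule fields named in this topic."""
    threshold: float          # alert when the metric value is above this
    duration_s: int           # how long the breach must persist, in seconds
    detection_period_s: int   # how often the metric is checked, in seconds


def evaluate(rule: AlertRule, samples: Iterable[float]) -> List[bool]:
    """Return, for each periodic check, whether an alert would be reported.

    `samples` holds one metric value per detection period, oldest first.
    """
    # Assumption: the duration is covered by ceil(duration / period)
    # consecutive checks that all exceed the threshold.
    needed = max(1, -(-rule.duration_s // rule.detection_period_s))
    consecutive = 0
    fired = []
    for value in samples:
        # A single check at or below the threshold resets the streak.
        consecutive = consecutive + 1 if value > rule.threshold else 0
        fired.append(consecutive >= needed)
    return fired


# Example: memory usage above 90% for 3 minutes, checked every 60 seconds.
rule = AlertRule(threshold=90.0, duration_s=180, detection_period_s=60)
print(evaluate(rule, [85, 92, 95, 93, 88, 91]))
# [False, False, False, True, False, False]
```

In this sketch the alert is reported only on the fourth check, after three consecutive values above 90, and the streak resets as soon as one value falls back to 88.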


Note

For more information about adding alert rules, see Add an alert rule.

Concepts

Alert object

An alert object is the entity monitored by an alert task. It uniquely identifies the object of an alert. An alert object can be an OceanBase cluster, a machine, or a service.

An alert object is displayed as the alert rule name followed by the faulty instance in parentheses, for example disk_log_usage_instance (Instance: integration_22-ob2).
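As an illustration of this format, the following sketch builds and parses an alert object string. The `AlertObject` type and both helper functions are hypothetical, and the regular expression is inferred from the single example in this topic, so other alert objects may not follow exactly the same shape.

```python
import re
from typing import NamedTuple


class AlertObject(NamedTuple):
    rule_name: str   # for example "disk_log_usage_instance"
    instance: str    # for example "integration_22-ob2"


# Matches "<rule name> (Instance: <instance>)", the shape of the documented example.
_PATTERN = re.compile(r"^(?P<rule>\S+) \(Instance: (?P<instance>[^)]+)\)$")


def format_alert_object(obj: AlertObject) -> str:
    """Render an alert object in the documented display format."""
    return f"{obj.rule_name} (Instance: {obj.instance})"


def parse_alert_object(text: str) -> AlertObject:
    """Split a displayed alert object back into rule name and instance."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized alert object: {text!r}")
    return AlertObject(match.group("rule"), match.group("instance"))


parsed = parse_alert_object("disk_log_usage_instance (Instance: integration_22-ob2)")
print(parsed.rule_name, parsed.instance)
```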

Alert scope

The alert scope defines the range of an alert and corresponds to the metric scope.

The alert scope includes OceanBase Cluster (OBCluster), data assessment, data transmission, and data development.

Rule description

ApsaraDB for OceanBase lets you configure alert rules on tenant-level and node-level monitoring data. The resource scope and monitoring metrics for each rule are listed below. You can configure them in Monitoring and Alerts as required. We recommend that you follow our best practices.

The following tenant metrics can be used to configure alerts:

| Metric | Metric Name | Corresponding Alert Metric |
| --- | --- | --- |
| Memory usage | memory_usage | Tenant / Tenant Memory Usage |
| CPU usage | cpu_usage_percent | Tenant / CPU Usage |
| Disk usage | disk_ob_data_size | Cluster / Maximum Disk Usage. Note: Because storage usage is not isolated between tenants, you can only configure disk usage at the cluster level. |
| Total connections | total_sessions | Configuring alert policies is not supported. |
| Read/write connections | readwrite_sessions | Configuring alert policies is not supported. |
| Read-only connections | readonly_sessions | Configuring alert policies is not supported. |
| Write requests | tps | Tenant / Write Requests |
| Read requests | qps | Tenant / Read Requests |
| Write request response time | tps_rt | Tenant / Write Request Response Time |
| Read request response time | qps_rt | Tenant / Read Request Response Time |
| Wait queue | request_queue_rt | Tenant / Wait Queue |
| Transaction commits | trans_user_trans_count | Tenant / Transaction Commits |
| Transaction response time | trans_commit_rt | Tenant / Transaction Commit Response Time |

The following node metrics can be used to configure alerts:

| Metric | Metric Name | Corresponding Alert Metric |
| --- | --- | --- |
| CPU usage | cpu_util | Node / CPU Usage |
| Load | load_load1 | Node / Load |
| Machine memory usage | machine_mem_used_percent | Node / Machine Memory Usage |
| Disk read | io_read_bytes | Node / Disk Read |
| Disk write | io_write_bytes | Node / Disk Write |
| Disk I/O wait | io_await | Node / Disk I/O Wait |
| Inbound packet rate | traffic_bytin | Node / Inbound Packet Rate |
| Outbound packet rate | traffic_bytout | Node / Outbound Packet Rate |
| Retransmission rate | tcp_retran | Node / Retransmission Rate |
| Total connections | total_sessions | Configuring alert policies is not supported. |
| Read/write connections | readwrite_sessions | Configuring alert policies is not supported. |
| Read-only connections | readonly_sessions | Configuring alert policies is not supported. |
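When you script against these metrics, it can help to keep the tenant and node mappings above in a small lookup table so that you can check whether a metric supports an alert policy before you try to configure one. The following is a sketch only: the dictionary and function names are hypothetical, the entries are copied from the two lists above, and `None` marks metrics for which configuring alert policies is not supported.

```python
from typing import Dict, Optional

# Metric name -> corresponding alert metric (None = alert policy not supported).
TENANT_METRICS: Dict[str, Optional[str]] = {
    "memory_usage": "Tenant / Tenant Memory Usage",
    "cpu_usage_percent": "Tenant / CPU Usage",
    "disk_ob_data_size": "Cluster / Maximum Disk Usage",  # cluster level only
    "total_sessions": None,
    "readwrite_sessions": None,
    "readonly_sessions": None,
    "tps": "Tenant / Write Requests",
    "qps": "Tenant / Read Requests",
    "tps_rt": "Tenant / Write Request Response Time",
    "qps_rt": "Tenant / Read Request Response Time",
    "request_queue_rt": "Tenant / Wait Queue",
    "trans_user_trans_count": "Tenant / Transaction Commits",
    "trans_commit_rt": "Tenant / Transaction Commit Response Time",
}

NODE_METRICS: Dict[str, Optional[str]] = {
    "cpu_util": "Node / CPU Usage",
    "load_load1": "Node / Load",
    "machine_mem_used_percent": "Node / Machine Memory Usage",
    "io_read_bytes": "Node / Disk Read",
    "io_write_bytes": "Node / Disk Write",
    "io_await": "Node / Disk I/O Wait",
    "traffic_bytin": "Node / Inbound Packet Rate",
    "traffic_bytout": "Node / Outbound Packet Rate",
    "tcp_retran": "Node / Retransmission Rate",
    "total_sessions": None,
    "readwrite_sessions": None,
    "readonly_sessions": None,
}


def alert_metric_for(scope: str, metric_name: str) -> Optional[str]:
    """Look up the alert metric for a monitoring metric.

    `scope` is "tenant" or "node". The lookup ignores letter case in the
    metric name. Raises KeyError for an unknown scope or metric.
    """
    table = {"tenant": TENANT_METRICS, "node": NODE_METRICS}[scope]
    return table[metric_name.lower()]


print(alert_metric_for("tenant", "tps_rt"))
print(alert_metric_for("node", "total_sessions"))
```

The first call returns the alert metric to select in Monitoring and Alerts, and the second returns `None` because connection counts cannot be used in alert policies.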

Alert levels

Each alert metric has a corresponding alert level.

| Level | English Meaning | Chinese Meaning | Notification Method | Description |
| --- | --- | --- | --- | --- |
| 1 | Critical | Critical | Phone call + Text message + Email + DingTalk Robot | System availability has decreased and requires immediate repair to prevent a complete outage. Alternatively, the system is still available but is about to become unavailable. Take action to prevent further loss of availability. For example, the machine memory usage is greater than 90% for 3 minutes. |
| 2 | Warning | Warning | Text message + Email + DingTalk Robot | Key system performance metrics are declining but have not yet reached the warning threshold. Investigate to find potential problems and prevent a warning. (This is a reserved type. No alert metrics currently match this level.) |
| 3 | Info | Standard | Email + DingTalk Robot | This is an operational reminder, not a true alert. It is typically triggered when an administrator performs an important operation, such as taking a cluster offline. When an alert at this level is resolved, no alert recovery notification is sent. |
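The relationship between alert levels and notification methods can be summarized in a small data structure. This is an illustrative sketch with hypothetical names, not a product API. The `sends_recovery_notification` value for levels 1 and 2 is an assumption: this topic states only that level 3 does not send a recovery notification.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AlertLevel:
    level: int
    name: str
    channels: Tuple[str, ...]
    sends_recovery_notification: bool


LEVELS: Dict[int, AlertLevel] = {
    1: AlertLevel(1, "Critical",
                  ("phone call", "text message", "email", "DingTalk robot"),
                  True),   # assumption: not stated explicitly in this topic
    2: AlertLevel(2, "Warning",
                  ("text message", "email", "DingTalk robot"),
                  True),   # assumption: not stated explicitly in this topic
    3: AlertLevel(3, "Info",
                  ("email", "DingTalk robot"),
                  False),  # stated: no recovery notification at this level
}


def channels_for(level: int) -> Tuple[str, ...]:
    """Return the notification methods used for an alert level (1, 2, or 3)."""
    return LEVELS[level].channels


for lvl in sorted(LEVELS):
    print(lvl, LEVELS[lvl].name, ", ".join(channels_for(lvl)))
```

Each lower-severity level drops the most intrusive channel of the level above it, so only Critical alerts result in a phone call.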