[Discontinued] Use Cloud Monitor to monitor basic cluster resources

Resource monitoring is one of the most common monitoring methods for Kubernetes. You can use the Kubernetes monitoring feature of Cloud Monitor to track the usage and health of basic resources, such as the CPU, memory, and network of workloads in Container Service for Kubernetes (ACK) clusters. This helps ensure cluster stability.

Important

The Kubernetes container monitoring feature of Cloud Monitor is being phased out. For more information, see Notice on Changes to the Kubernetes Container Monitoring Feature of Cloud Monitor. We recommend that you use Managed Service for Prometheus instead. It provides the container monitoring features that this Cloud Monitor feature offers.

Features

Cloud Monitor automatically discovers all ACK clusters within your Alibaba Cloud account. This provides centralized, cross-region monitoring of your container services. For more information, see Overview.

  • Provides cluster-wide metrics.

    You can view key metrics such as alerts, node counts, pod CPU and memory usage, and node CPU and memory utilization to quickly assess your cluster's health.

  • Offers professional monitoring and alerting.

    This feature upgrades the previous container monitoring capabilities of Cloud Monitor and provides basic monitoring designed for containerized environments. Key metrics are organized by the native Kubernetes dimensions: namespace, node, workload, and pod. Alerting is also enhanced: you can create alert rules for each of these dimensions.

  • Uses metrics tailored for container scenarios.

    The feature uses the most appropriate metrics for different layers, including the host infrastructure, the container Platform-as-a-Service (PaaS) layer, and the Kubernetes scheduling layer. For example, to monitor memory that affects Kubernetes scheduling, it uses a dedicated container working memory metric, which is distinct from the host's memory usage.
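
To illustrate why host memory usage and container working memory differ, the sketch below computes a container's working set the way the kubelet and cAdvisor do: total memory usage minus the inactive file-backed page cache, which the kernel can reclaim. Whether the container working memory metric in Cloud Monitor uses exactly this formula is an assumption; this document does not define it. The class and function names are hypothetical, and the fields mirror the cgroup v1 memory.usage_in_bytes file and the total_inactive_file entry of memory.stat.

```python
from dataclasses import dataclass


@dataclass
class CgroupMemoryStats:
    """Hypothetical snapshot of a container's cgroup v1 memory statistics, in bytes."""
    usage_bytes: int          # corresponds to memory.usage_in_bytes
    total_inactive_file: int  # "total_inactive_file" entry in memory.stat


def working_set_bytes(stats: CgroupMemoryStats) -> int:
    """Working set = usage minus reclaimable inactive page cache, floored at zero.

    This mirrors the cAdvisor/kubelet definition. Treating it as the formula behind
    the Cloud Monitor container working memory metric is an assumption.
    """
    return max(stats.usage_bytes - stats.total_inactive_file, 0)


if __name__ == "__main__":
    mib = 1024 * 1024
    stats = CgroupMemoryStats(usage_bytes=900 * mib, total_inactive_file=300 * mib)
    # Raw usage includes 300 MiB of inactive page cache; the working set excludes it.
    print(working_set_bytes(stats) // mib)  # 600
```

Because page cache can be reclaimed under pressure, a container with high raw usage may still have a modest working set, which is why a host-level usage figure can overstate the memory pressure a workload actually creates.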

Prerequisites

The Kubernetes monitoring feature of Cloud Monitor is enabled for your ACK cluster. For more information, see Enable the Cloud Monitor feature for an ACK cluster.

View resource monitoring data

  1. Log on to the Cloud Monitor console.

  2. In the left-side navigation pane, click Container Service Monitoring.

  3. On the Container Service Monitoring page, click the name of the target cluster or click View Details in the Actions column.

    Note

    If you are accessing this page for the first time, a dialog box appears requesting authorization. Click Authorization to open the cluster details page.

  4. On the cluster details page, you can view monitoring data from different perspectives, such as Cluster Overview, Nodes, Namespace, Workloads, and Alert Rule.

    For more information about this page, see View container monitoring data.

    The Cluster Overview page displays basic cluster information such as name, type, version, and VPC. It also shows the current alert status, a donut chart of the node running status, and line charts for the top 5 pods by CPU and memory usage.

Alerting scenarios

Monitor the health of a cluster or its nodes based on resource usage

  • Description: When a cluster or one of its nodes shows abnormal resource usage, an alert is triggered so that you can respond before your business is disrupted. For this scenario, configure an alert rule for the cluster or its nodes.

  • Alert configuration: When you create the alert rule, set Resource Scope to Cluster or Node. If you select Node and select all nodes, an alert is triggered if any node in the cluster meets the rule conditions.

Monitor abnormal resource usage of any pod in a cluster

  • Description: When a cluster shows high resource usage, you often need to isolate the problem by identifying the specific pod that causes it. For this scenario, configure an alert rule that applies to every pod in the cluster.

  • Alert configuration: When you create the alert rule, set Resource Scope to Container Group (pod), and select All for both the namespace and the pod. An alert is triggered if any pod in the cluster meets the rule conditions.

Set alerts for pods in a specific namespace of a multi-tenant cluster

  • Description: A common Kubernetes practice is to isolate applications in a shared cluster by namespace. Alerts can detect abnormal resource usage in an application's namespace. For this scenario, configure an alert rule for all pods in a specific namespace.

  • Alert configuration: When you create the alert rule, set Resource Scope to Container Group (pod). Select the application's Namespace, and then select All for Container Group (pod). An alert is triggered if any pod in the namespace meets the rule conditions.

Set alerts for pods in a specific workload and namespace

  • Description: Another common multi-tenancy pattern maps each application to a workload, such as a Deployment. Alerts can detect abnormal resource usage in a specific workload. For this scenario, configure an alert rule for all pods in that workload.

  • Alert configuration: When you create the alert rule, set Resource Scope to Container Group (pod). Select the application's Namespace, and then select the workload on the tab for its type. Supported workload types are Deployment, StatefulSet, DaemonSet, Job, and CronJob. Finally, select All for Container Group (pod). An alert is triggered if any pod in the workload meets the rule conditions.
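
The scenarios above share one rule: the scope narrows the set of monitored pods, and the alert fires if any pod in that set meets the condition. The sketch below models that matching logic. It is an illustration under assumptions, not Cloud Monitor's implementation: the Pod and PodScope classes, the use of None to mean "All", the strict greater-than comparison, and the sample pods and utilization values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Pod:
    namespace: str
    name: str
    workload_kind: str  # e.g. "Deployment" or "StatefulSet"
    workload_name: str


@dataclass(frozen=True)
class PodScope:
    """Hypothetical model of the Container Group (pod) scope; None means "All"."""
    namespace: Optional[str] = None
    workload_kind: Optional[str] = None
    workload_name: Optional[str] = None
    pod_names: Optional[FrozenSet[str]] = None

    def matches(self, pod: Pod) -> bool:
        # Each selector either is "All" (None) or must equal the pod's value.
        return (
            (self.namespace is None or pod.namespace == self.namespace)
            and (self.workload_kind is None or pod.workload_kind == self.workload_kind)
            and (self.workload_name is None or pod.workload_name == self.workload_name)
            and (self.pod_names is None or pod.name in self.pod_names)
        )


def pods_over_threshold(scope: PodScope, usage: Dict[Pod, float], threshold: float) -> List[Pod]:
    """Return in-scope pods whose usage exceeds the threshold; the rule fires if any are returned."""
    return [pod for pod, value in usage.items() if scope.matches(pod) and value > threshold]


# Hypothetical sample data: CPU utilization (0.0 to 1.0) per pod.
WEB = Pod("shop", "web-1", "Deployment", "web")
DB = Pod("shop", "db-0", "StatefulSet", "db")
API = Pod("billing", "api-1", "Deployment", "api")
USAGE = {WEB: 0.95, DB: 0.40, API: 0.97}

if __name__ == "__main__":
    # Any pod in the cluster (All namespaces, All pods).
    print([p.name for p in pods_over_threshold(PodScope(), USAGE, 0.9)])  # ['web-1', 'api-1']
    # Any pod in the "shop" namespace.
    print([p.name for p in pods_over_threshold(PodScope(namespace="shop"), USAGE, 0.9)])  # ['web-1']
```

The same "any member matches" logic applies to the Node scope when all nodes are selected.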

Configure an alert rule

Step 1: Create an alert contact and an alert contact group

  1. Log on to the Cloud Monitor console.

  2. In the left-side navigation pane, choose Alerts > Alert Contacts.

  3. Create an alert contact and add it to an alert contact group.

Step 2: Create an alert rule

  1. Log on to the Cloud Monitor console.

  2. In the left-side navigation pane, click Container Service Monitoring.

  3. On the Container Service Monitoring page, find the target cluster and click Alert Rule in the Actions column.

  4. On the Alert Rule page, click Alert Rule Creation.

  5. In the Alert Rule Creation panel, set the following parameters.

    Resource Scope: The scope of resources that the alert rule monitors. Valid values:

    • Cluster: The alert rule applies to the target cluster. Select the cluster name.

    • Node: The alert rule applies to all nodes or specific nodes in the target cluster. Select the cluster and its nodes.

    • Container Group (pod): The alert rule applies to all pods or specific pods of an application in a namespace of the target cluster. First select the cluster and the namespace, and then select the Application and the Container Group (pod) on the Deployment, StatefulSet, DaemonSet, Job, or CronJob tab.

      Note

      On the Container Group tab, you only need to select one or more pods.

    Rule Description: The condition that triggers the alert. An alert is sent when the monitoring data meets this condition. Configure the metric, the threshold, and the alert level. For more information about pod metrics, see Container Service for Kubernetes (ACK) (new version).

    Mute Period: The interval at which an alert notification is resent for an unresolved issue whose alert level has not changed. Valid values: 5 minutes, 15 minutes, 30 minutes, 60 minutes, 3 hours, 6 hours, 12 hours, and 24 hours.

    When a monitored metric reaches an alert threshold, an alert notification is sent. During the Mute Period, no new notification is sent as long as the alert level stays the same. A new notification is sent only when the alert level changes (including a return to normal) or when the Mute Period expires.

      Note

      The Alert History page distinguishes between two silencing scenarios: the Mute Period, which applies to alerts for a single resource, and a broader suppression state, which affects alerts from multiple resources.

    Effective Period: The time range during which the alert rule is active. Cloud Monitor checks monitoring data against the rule only within this period.

    Alert Callback: The HTTP URL to which Cloud Monitor sends alert information in POST requests. We recommend that you use a publicly accessible URL. For more information about how to set up an alert callback, see Use alert callbacks for threshold-based alerts.

    Alert Contact Group: The alert contact group that receives alert notifications. Notifications are sent to the alert contacts in the selected group, which can contain one or more alert contacts. For more information about how to create alert contacts and alert contact groups, see Create an alert contact or an alert contact group.

  6. Click OK to save the alert rule configuration.

    The new alert rule appears on the Alert Rule page. For more information, see Manage alert rules.
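
The interaction between Rule Description, Effective Period, and Mute Period can be modeled as a small state machine for each monitored resource. The sketch below is an illustration under assumptions, not Cloud Monitor's implementation: the level names, the strict greater-than threshold comparison, the modeling of the Effective Period as a daily time window, and all function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Dict, Optional

# Hypothetical level names, ordered from most to least severe.
LEVELS = ("Critical", "Warning", "Info")


def alert_level(value: float, thresholds: Dict[str, float]) -> Optional[str]:
    """Return the most severe level whose threshold is exceeded, or None for normal."""
    for level in LEVELS:
        limit = thresholds.get(level)
        if limit is not None and value > limit:
            return level
    return None


def in_effective_period(now: datetime, start: time, end: time) -> bool:
    """Check a daily [start, end) window; a window with start > end wraps past midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


@dataclass
class NotificationState:
    """Per-resource record of the last notification that was sent."""
    level: Optional[str] = None
    sent_at: Optional[datetime] = None


def evaluate(state: NotificationState, value: float, thresholds: Dict[str, float],
             now: datetime, start: time, end: time, mute_period: timedelta) -> bool:
    """Return True if a notification is sent for this data point, updating state."""
    if not in_effective_period(now, start, end):
        return False  # the rule is not checked outside the Effective Period
    level = alert_level(value, thresholds)
    if level != state.level:
        notify = True  # the level changed, including a return to normal
    elif level is None:
        notify = False  # still normal: nothing to report
    else:
        # Same non-normal level: resend only after the Mute Period has elapsed.
        notify = now - state.sent_at >= mute_period
    if notify:
        state.level, state.sent_at = level, now
    return notify
```

For example, with a Warning threshold of 80, a Critical threshold of 90, and a 15-minute Mute Period, readings of 85, 86, 87, 95, 50, and 50 taken at minutes 0, 5, 15, 16, 17, and 18 produce notifications at minutes 0 (new Warning), 15 (Mute Period expired), 16 (escalation to Critical), and 17 (return to normal).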
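
The Alert Callback parameter needs an HTTP endpoint that accepts POST requests. A minimal receiver built on the Python standard library http.server module is sketched below. The request body encoding is an assumption: the handler accepts either JSON or form-encoded data and logs whatever fields arrive, so check Use alert callbacks for threshold-based alerts for the actual payload fields. The serve function name and the default port 8080 are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def parse_alert_body(body: bytes, content_type: str) -> dict:
    """Decode a callback body; the encoding is an assumption, so accept JSON or form data."""
    text = body.decode("utf-8")
    if content_type.startswith("application/json"):
        return json.loads(text)
    # Otherwise treat it as application/x-www-form-urlencoded, keeping the first value per key.
    return {key: values[0] for key, values in parse_qs(text).items()}


class AlertCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        alert = parse_alert_body(self.rfile.read(length), self.headers.get("Content-Type", ""))
        print("Received alert:", alert)  # replace with your own handling
        # Acknowledge receipt with an empty 200 response.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Handle callbacks on all interfaces until interrupted; this call blocks."""
    HTTPServer(("0.0.0.0", port), AlertCallbackHandler).serve_forever()
```

To try it, call serve() on a host that Cloud Monitor can reach, and enter the corresponding public URL, such as http://<public-ip>:8080/, as the Alert Callback. For production use, put the endpoint behind HTTPS and verify that requests really come from Cloud Monitor.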

Check the result

  1. In the left-side navigation pane, choose Alerts > Alert History.

  2. On the Alert History page, you can view the alert trend and a record of triggered alerts.

Legacy resource monitoring

If your ACK cluster's metrics-server component is older than version 0.3.8.5, you can access the legacy resource monitoring page by following these steps.

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left-side navigation pane, choose Workloads > Deployments.

  3. Find the desired Deployment and click Monitoring in the Actions column to open the corresponding Dashboards page in Cloud Monitor.

  4. You can view monitoring data on the Deploy, Pods, and Container group hotspot tabs.

  5. Optional: To set an alert, choose Alerts > Alert Rule in the left-side navigation pane.

    Group-level metric names start with group, and instance-level metric names start with pod.
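
The version check that decides whether the legacy page applies can be scripted. The helper below is a sketch that assumes the version string comes from the metrics-server image tag, for example as printed by kubectl -n kube-system get deployment metrics-server -o jsonpath='{.spec.template.spec.containers[0].image}' (assuming metrics-server runs as a Deployment with that name in kube-system), and that any suffix after the numeric part, such as a commit hash, can be ignored. The function names and the sample image reference are hypothetical; the 0.3.8.5 cutoff comes from this document.

```python
import re
from typing import Tuple

LEGACY_CUTOFF = (0, 3, 8, 5)  # "older than version 0.3.8.5", as stated in this document


def parse_version(version_or_image: str) -> Tuple[int, ...]:
    """Extract numeric parts, e.g. 'repo/metrics-server:v0.3.8.5-abc' -> (0, 3, 8, 5)."""
    tag = version_or_image.strip().rsplit(":", 1)[-1]  # keep only the image tag, if any
    match = re.match(r"v?(\d+(?:\.\d+)*)", tag)
    if match is None:
        raise ValueError(f"unrecognized version: {version_or_image!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def uses_legacy_monitoring(version_or_image: str) -> bool:
    """True if metrics-server is older than 0.3.8.5, so the legacy page applies."""
    current = parse_version(version_or_image)
    width = max(len(current), len(LEGACY_CUTOFF))
    # Pad with zeros so that 0.3.8 compares as 0.3.8.0 and 0.3.8.5.0 equals 0.3.8.5.
    padded = current + (0,) * (width - len(current))
    cutoff = LEGACY_CUTOFF + (0,) * (width - len(LEGACY_CUTOFF))
    return padded < cutoff
```

Padding both tuples to the same length matters because Python compares tuples element by element and treats a shorter prefix as smaller, which would wrongly rank 0.3.8.5 below 0.3.8.5.0.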

FAQ

Why is my Kubernetes cluster data missing?

If no container monitoring data is displayed, see What do I do if no data is available for an ACK cluster in Cloud Monitor? for troubleshooting.