Kube-scheduler monitoring metrics and dashboard - Container Service for Kubernetes (ACK) - Alibaba Cloud Help Center

The kube-scheduler component is the default scheduler in a Kubernetes cluster, responsible for assigning Pods to suitable nodes. This topic describes the monitoring metrics for the kube-scheduler component, provides guidance on using its dashboard, and explains how to resolve common metric anomalies.

Before you begin

Access the dashboard

See View Monitoring Dashboards for Control Plane Components.

List of metrics

Metrics expose the status and parameters of a component. The following table lists the metrics for the kube-scheduler component.

Metric	Type	Description
scheduler_scheduler_cache_size	Gauge	The number of nodes, Pods, and AssumedPods (Pods assumed to be scheduled) in the scheduler cache.
scheduler_pending_pods	Gauge	The number of pending Pods, categorized by queue type: `unschedulable` (Pods that cannot be scheduled), `backoff` (Pods in the backoffQ, which are temporarily unschedulable), and `active` (Pods in the activeQ, which are ready for scheduling).
scheduler_pod_scheduling_attempts_bucket	Histogram	The number of attempts required to schedule a Pod. The bucket thresholds are `{1, 2, 4, 8, 16}`.
memory_utilization_byte	Gauge	The memory usage, in bytes.
cpu_utilization_core	Gauge	The CPU usage, in cores.
resource_utilization_level	Gauge	The resource utilization level of a control plane container. Labels: `resource`, the resource type (`cpu` or `memory`); `utilization_level`, the utilization level (`high` when utilization ≥ 80%, `normal` when utilization < 80%); `container`, the target container (`kube-apiserver`, `kube-scheduler`, `kube-controller-manager`, `cloud-controller-manager`, or `etcd`).
rest_client_requests_total	Counter	The number of HTTP requests, categorized by status code, method, and host.
rest_client_request_duration_seconds_bucket	Histogram	The HTTP request latency, categorized by verb and URL.

Note

The following resource utilization metrics are no longer in use. Remove any alerts or monitoring rules that rely on these metrics:

cpu_utilization_ratio: CPU utilization.
memory_utilization_ratio: Memory utilization.

Dashboard

The dashboard visualizes component metrics by using PromQL queries. The following sections describe the data visualizations and their corresponding metrics.
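
The dashboard queries in the following sections contain `$interval` and `$quantile`, which are dashboard variables rather than PromQL syntax, so they must be filled in before a query can run outside the dashboard. The following Python sketch shows one way to do that and to build a request URL for the standard Prometheus instant query endpoint (`/api/v1/query`). The base URL and the helper names `render_query` and `instant_query_url` are hypothetical; replace the address with the Prometheus-compatible endpoint that stores your cluster's control plane metrics.

```python
from string import Template
from urllib.parse import urlencode

# Hypothetical address, used only for illustration.
PROMETHEUS_BASE_URL = "http://prometheus.example.internal:9090"


def render_query(promql_template: str, **variables: str) -> str:
    """Replace $name placeholders (such as $interval) with concrete values.

    Raises KeyError if a placeholder in the template is not supplied.
    """
    return Template(promql_template).substitute(**variables)


def instant_query_url(promql: str, base_url: str = PROMETHEUS_BASE_URL) -> str:
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


# Example: the pending Pods query needs no variables, the latency query does.
latency_query = render_query(
    'histogram_quantile($quantile, sum(rate('
    'rest_client_request_duration_seconds_bucket{job="ack-scheduler"}'
    '[$interval])) by (verb,url,le))',
    quantile="0.99",
    interval="5m",
)
latency_url = instant_query_url(latency_query)
```

The values `0.99` and `5m` are example choices, not the dashboard defaults.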

Overview

Visualization

Metric details

Metric	PromQL	Description
Scheduler pending Pods	scheduler_pending_pods{job="ack-scheduler"}	The number of pending Pods, categorized by queue type: `unschedulable` (Pods that cannot be scheduled), `backoff` (Pods in the backoffQ, which are temporarily unschedulable), and `active` (Pods in the activeQ, which are ready for scheduling).
Scheduler Pod scheduling attempts	histogram_quantile($quantile, sum(rate(scheduler_pod_scheduling_attempts_bucket{job="ack-scheduler"}[$interval])) by (pod, le))	The number of attempts required to schedule a Pod. The bucket thresholds are `{1, 2, 4, 8, 16}`.
Scheduler cache statistics	scheduler_scheduler_cache_size{job="ack-scheduler",type="nodes"} scheduler_scheduler_cache_size{job="ack-scheduler",type="pods"} scheduler_scheduler_cache_size{job="ack-scheduler",type="assumed_pods"}	The number of nodes, Pods, and AssumedPods in the scheduler cache.
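
The scheduling attempts panel applies `histogram_quantile` to cumulative buckets, so the value it shows is an estimate interpolated between bucket thresholds, not an exact attempt count. The following Python sketch approximates that calculation for the thresholds `{1, 2, 4, 8, 16}`. It is a simplified model of Prometheus-style linear interpolation, and the sample counts are invented for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with the +Inf bucket.
    """
    if not buckets or not 0 <= q <= 1:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        # The quantile falls past the last finite threshold: report that threshold.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    lower = 0.0 if index == 0 else buckets[index - 1][0]
    lower_count = 0.0 if index == 0 else buckets[index - 1][1]
    in_bucket = count - lower_count
    if in_bucket == 0:
        return lower
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - lower_count) / in_bucket


# Invented sample: 80 of 100 Pods were scheduled on the first attempt.
example_buckets = [(1, 80), (2, 90), (4, 96), (8, 99), (16, 100), (math.inf, 100)]
p99_attempts = histogram_quantile(0.99, example_buckets)
```

Because the largest finite threshold is 16, a quantile that lands in the `+Inf` bucket is reported as 16, so the panel cannot distinguish between 17 attempts and several hundred.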

Resources

Visualization

Metric details

Metric	PromQL	Description
Memory usage	memory_utilization_byte{container="kube-scheduler"}	The memory usage, in bytes.
CPU usage	cpu_utilization_core{container="kube-scheduler"}*1000	The CPU usage, in millicores.
Memory resource utilization level	resource_utilization_level{resource="memory",container="kube-scheduler",utilization_level="high"} resource_utilization_level{resource="memory",container="kube-scheduler",utilization_level="normal"}	If `resource_utilization_level{utilization_level="high",...}` is 1, the container resource utilization level is ≥ 80%. If `resource_utilization_level{utilization_level="normal",...}` is 1, the container resource utilization level is < 80%.
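CPU resource utilization level	resource_utilization_level{resource="cpu",container="kube-scheduler",utilization_level="high"} resource_utilization_level{resource="cpu",container="kube-scheduler",utilization_level="normal"}	If `resource_utilization_level{utilization_level="high",...}` is 1, the container resource utilization level is ≥ 80%. If `resource_utilization_level{utilization_level="normal",...}` is 1, the container resource utilization level is < 80%.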
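
The unit conversion and the 80% threshold in this table can be expressed in a few lines of Python. This is a sketch of the arithmetic only. The source does not state what the usage is measured against, so the `capacity` parameter and both function names are assumptions made for illustration.

```python
HIGH_UTILIZATION_THRESHOLD = 0.80  # "high" means utilization >= 80%


def cores_to_millicores(cores: float) -> float:
    """Convert CPU usage in cores to millicores, as the CPU usage panel does."""
    return cores * 1000


def utilization_level(usage: float, capacity: float) -> str:
    """Classify utilization as "high" or "normal".

    capacity is a hypothetical denominator (for example, a resource limit),
    expressed in the same unit as usage.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    ratio = usage / capacity
    return "high" if ratio >= HIGH_UTILIZATION_THRESHOLD else "normal"
```

For each resource, exactly one of the two levels applies at a time, which matches the description of one series being 1 for `high` or for `normal`.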

Kube API

Visualization

Metric details

Metric	PromQL	Description
Kube API request QPS	sum(rate(rest_client_requests_total{job="ack-scheduler",code=~"2.."}[$interval])) by (method,code) sum(rate(rest_client_requests_total{job="ack-scheduler",code=~"3.."}[$interval])) by (method,code) sum(rate(rest_client_requests_total{job="ack-scheduler",code=~"4.."}[$interval])) by (method,code) sum(rate(rest_client_requests_total{job="ack-scheduler",code=~"5.."}[$interval])) by (method,code)	The rate of HTTP requests from kube-scheduler to the kube-apiserver component, categorized by method and status code.
Kube API request latency	histogram_quantile($quantile, sum(rate(rest_client_request_duration_seconds_bucket{job="ack-scheduler"}[$interval])) by (verb,url,le))	The latency of HTTP requests from kube-scheduler to the kube-apiserver component, categorized by verb and request URL.
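
The request QPS panel runs one query per status class (`2..`, `3..`, `4..`, `5..`). When the same data is processed outside the dashboard, the per-code rates can be rolled up into those classes. The following Python sketch assumes the rates have already been fetched into a mapping keyed by `(method, code)`. The function name and the sample values are hypothetical.

```python
from collections import defaultdict


def rate_by_status_class(rates):
    """Sum per-(method, code) request rates into 2xx/3xx/4xx/5xx classes.

    rates: mapping of (method, code) -> requests per second.
    Codes that are not three-digit numbers are grouped under "other".
    """
    totals = defaultdict(float)
    for (_method, code), per_second in rates.items():
        if len(code) == 3 and code.isdigit():
            totals[f"{code[0]}xx"] += per_second
        else:
            totals["other"] += per_second
    return dict(totals)


# Invented sample rates, in requests per second.
sample_rates = {
    ("GET", "200"): 10.0,
    ("POST", "201"): 2.0,
    ("GET", "404"): 0.5,
    ("PUT", "500"): 0.25,
}
by_class = rate_by_status_class(sample_rates)
```

A sustained rise in the `4xx` or `5xx` totals relative to `2xx` is usually the first thing to look at when the latency panel also degrades.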

Common metric anomalies

If you observe metric anomalies, use the following descriptions to check if the behavior is expected.

Number of live scheduler Pods

Normal condition	Abnormal condition	Description	Recommendation
The number of live scheduler Pods is greater than or equal to 1.	The number of live scheduler Pods is 0.	No schedulers are available in the cluster.	Check for a scheduler-related Deployment or StatefulSet. Determine if the scheduler Pod was manually taken offline.

Number of pending Pods

Normal condition	Abnormal condition	Description	Recommendation
The Pod scheduling rate is stable and the number of pending Pods remains low.	The number of Pods in the unschedulable queue continuously increases. The number of Pods in the unschedulable queue does not decrease even after other Pods are scheduled.	The resource requests for Pods in the cluster are inappropriate, or the node resources are insufficient.	Check whether node resources meet the Pod's requirements. Check whether the Pod specifies a node affinity that cannot be satisfied.
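
The abnormal condition above (an unschedulable queue that keeps growing and never shrinks) can be checked programmatically over a window of samples. The following Python sketch is one possible heuristic, not a rule defined by ACK. The function name, the `min_increase` parameter, and its default value are assumptions.

```python
def unschedulable_backlog_growing(samples, min_increase=1):
    """Return True if the unschedulable queue grew without ever shrinking.

    samples: unschedulable queue sizes over a time window, oldest first.
    min_increase: hypothetical minimum net growth that counts as an anomaly.
    """
    if len(samples) < 2:
        return False
    # The queue never decreases between consecutive samples.
    never_drops = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    # The queue is larger at the end of the window than at the start.
    net_growth = samples[-1] - samples[0]
    return never_drops and net_growth >= min_increase
```

A queue that fluctuates, for example `[3, 5, 2, 6]`, is not flagged, because Pods are still leaving the unschedulable queue.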

Pod scheduling attempts

Normal condition	Abnormal condition	Description	Recommendation
A Pod is scheduled to a node after a few attempts.	A Pod still cannot be scheduled after multiple attempts.	The resource requests for Pods in the cluster are inappropriate, or the node resources are insufficient.	Check whether node resources meet the Pod's requirements. Check whether the Pod specifies a node affinity that cannot be satisfied.
