Kube-controller-manager metrics and dashboard

更新时间:
复制 MD 格式

View kube-controller-manager workqueue, resource, and Kube API metrics with the monitoring dashboard.

Key concepts

Workqueue

Workqueue terminology

Controllers managed by kube-controller-manager, such as the Node Controller, StatefulSet Controller, and Deployment Controller, use a Workqueue to process resource updates. When an event occurs, such as Pod creation, update, or deletion, the controller places the resource identifier (for example, Pod name and namespace) into the Workqueue. A work loop retrieves and processes these identifiers.

Before you begin

Access the dashboard

See View monitoring dashboards for cluster control plane components.

Metric list

The following table lists kube-controller-manager metrics.

Metric

Type

Description

workqueue_adds_total

Counter

Total events added to the Workqueue.

workqueue_depth

Gauge

Current Workqueue depth. A sustained high value indicates the controller cannot process tasks fast enough, causing a backlog.

workqueue_queue_duration_seconds_bucket

Histogram

Time an item waits in the Workqueue. Bucket thresholds: {10-8, 10-7, 10-6, 10-5, 10-4, 10-3, 10-2, 10-1, 1, 10}. Unit: seconds.

memory_utilization_byte

Gauge

Memory utilization. Unit: bytes.

cpu_utilization_core

Gauge

CPU utilization. Unit: cores.

resource_utilization_level

Gauge

Resource utilization level.

  • resource: The resource type. Valid values: cpu and memory.

  • utilization_level: The utilization level. Valid values: high (utilization ≥ 80%) and normal (utilization < 80%).

  • container: The target container. Valid values: kube-apiserver, kube-scheduler, kube-controller-manager, cloud-controller-manager, and etcd.

rest_client_requests_total

Counter

Total HTTP requests by status code, method, and host.

rest_client_request_duration_seconds_bucket

Histogram

HTTP request latency by verb and URL.

Note

The following resource utilization metrics are no longer in use. Remove any alerts or monitoring rules that rely on these metrics:

  • cpu_utilization_ratio: CPU utilization.

  • memory_utilization_ratio: Memory utilization.

Dashboard usage

Configure the request quantile and PromQL sampling interval on the dashboard. The following sections describe each chart and its PromQL query.

Workqueue

Dashboard viewkcm1

Charts

Name

PromQL

Description

Workqueue Add Rate

sum(rate(workqueue_adds_total{job="ack-kube-controller-manager"}[$interval])) by (name)

Rate of events added to the Workqueue.

Workqueue Depth

sum(rate(workqueue_depth{job="ack-kube-controller-manager"}[$interval])) by (name)

Average rate of change in Workqueue depth.

Workqueue Processing Latency

histogram_quantile($quantile, sum(rate(workqueue_queue_duration_seconds_bucket{job="ack-kube-controller-manager"}[5m])) by (name, le))

Time an item waits in the Workqueue.

Resources

Dashboard view

image

Charts

Chart name

PromQL

Description

Memory Utilization

memory_utilization_byte{container="kube-controller-manager"}

Memory utilization. Unit: bytes.

CPU Utilization

cpu_utilization_core{container="kube-controller-manager"}*1000

CPU utilization. Unit: millicore.

Memory Resource Utilization Level

  • resource_utilization_level{resource="memory",container="kube-controller-manager",utilization_level="high"}

  • resource_utilization_level{resource="memory",container="kube-controller-manager",utilization_level="normal"}

  • If resource_utilization_level{utilization_level="high",...} is 1, the container resource utilization level is ≥ 80%.

  • If resource_utilization_level{utilization_level="normal",...} is 1, the container resource utilization level is < 80%.

CPU Resource Utilization Level

  • resource_utilization_level{resource="cpu",container="kube-controller-manager",utilization_level="high"}

  • resource_utilization_level{resource="cpu",container="kube-controller-manager",utilization_level="normal"}

Kube API

Dashboard viewkcm3

Charts

Chart name

PromQL

Description

Kube API Request QPS

  • sum(rate(rest_client_requests_total{job="ack-kube-controller-manager",code=~"2.."}[$interval])) by (method,code)

  • sum(rate(rest_client_requests_total{job="ack-kube-controller-manager",code=~"3.."}[$interval])) by (method,code)

  • sum(rate(rest_client_requests_total{job="ack-kube-controller-manager",code=~"4.."}[$interval])) by (method,code)

  • sum(rate(rest_client_requests_total{job="ack-kube-controller-manager",code=~"5.."}[$interval])) by (method,code)

QPS of HTTP requests from kube-controller-manager to kube-apiserver, by method and status code.

Kube API Request Latency

histogram_quantile($quantile, sum(rate(rest_client_request_duration_seconds_bucket{job="ack-kube-controller-manager"}[$interval])) by (verb,url,le))

Latency of HTTP requests from kube-controller-manager to kube-apiserver, by verb and URL.

References

For metrics and dashboards of other control plane components, see Monitoring metrics for the kube-apiserver component, Monitoring metrics for the etcd component, Monitoring metrics for the kube-scheduler component, and Monitoring metrics for the cloud-controller-manager component.