Cluster Inspection Items

Last updated: 2022-07-14 20:38:55

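During installation, the ADP base platform inspects the entire cluster and reports an installation error if the inspection finds a problem. Knowing what each inspection item checks makes those errors easier to diagnose and fix.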

Inspection item descriptions

Cluster-wide inspection items

Each inspection item is listed by name, followed by what it checks.

check-docker-overlay-mount

  • Checks whether the Docker overlay filesystem can be mounted normally
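A quick pre-check for this item is to see whether the kernel lists `overlay` in `/proc/filesystems`. The Python sketch below shows only that idea. It is not how ADP runs the check, the function name `overlay_supported` is hypothetical, and an actual mount test goes further than this.

```python
from pathlib import Path


def overlay_supported(filesystems_text: str) -> bool:
    """Return True if 'overlay' appears in /proc/filesystems-style text.

    Each line looks like 'nodev\toverlay' or '\text4'; the filesystem
    name is always the last whitespace-separated field.
    """
    for line in filesystems_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "overlay":
            return True
    return False


if __name__ == "__main__":
    path = Path("/proc/filesystems")
    if path.exists():  # only present on Linux
        print("overlay supported:", overlay_supported(path.read_text()))
```

If the overlay kernel module has not been loaded yet, `overlay` can be missing from the list even though `modprobe overlay` would succeed.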

check-k8s-apiserver-crash

  • Checks whether the Kubernetes API server has crashed
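A crashed API server usually shows up as container restarts on its pod. The sketch below assumes kubeadm-style static pods labeled `component=kube-apiserver` in `kube-system`. It takes the output of `kubectl get pods -n kube-system -l component=kube-apiserver -o json` after it has been loaded with `json.loads`. The function name is hypothetical, and the real check may rely on other signals.

```python
def apiserver_crashes(pod_list: dict) -> dict:
    """Return {"pod/container": (restartCount, last termination reason)}
    for kube-apiserver containers that have restarted at least once."""
    crashes = {}
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            restarts = cs.get("restartCount", 0)
            if restarts > 0:
                # lastState.terminated records why the previous instance exited
                reason = (
                    cs.get("lastState", {})
                    .get("terminated", {})
                    .get("reason", "Unknown")
                )
                crashes[f"{pod_name}/{cs['name']}"] = (restarts, reason)
    return crashes
```

`restartCount` is cumulative. To tell an old crash from a recent one, compare it with an earlier reading, or check the `finishedAt` timestamp in `lastState.terminated`.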

check-k8s-cs

  • Checks whether the control-plane pods of the core Kubernetes components are healthy
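The sketch below checks the `Ready` condition of the control-plane pods. It assumes the kubeadm convention of labeling them `component=etcd`, `kube-apiserver`, `kube-controller-manager` or `kube-scheduler` in `kube-system`, so adjust the label set if ADP deploys these components differently. The input is the parsed JSON from `kubectl get pods -n kube-system -o json`, and `unhealthy_core_pods` is a hypothetical name.

```python
# Assumed kubeadm label values for the core control-plane components.
CORE_COMPONENTS = {"etcd", "kube-apiserver", "kube-controller-manager", "kube-scheduler"}


def unhealthy_core_pods(pod_list: dict) -> list[str]:
    """Return names of control-plane pods whose Ready condition is not True."""
    bad = []
    for pod in pod_list.get("items", []):
        labels = pod["metadata"].get("labels", {})
        if labels.get("component") not in CORE_COMPONENTS:
            continue  # not a core control-plane pod
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
        )
        if not ready:
            bad.append(pod["metadata"]["name"])
    return bad
```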

check-kubelet-evict

  • Checks the environment for evicted pods, and checks whether the kubelet reports resource pressure
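Kubernetes marks an evicted pod with `status.phase: Failed` and `status.reason: Evicted`, and the kubelet reports pressure through the `MemoryPressure`, `DiskPressure` and `PIDPressure` node conditions. The sketch below applies both tests to the parsed JSON from `kubectl get pods -A -o json` and `kubectl get nodes -o json`. The function names are hypothetical.

```python
PRESSURE_TYPES = {"MemoryPressure", "DiskPressure", "PIDPressure"}


def evicted_pods(pod_list: dict) -> list[str]:
    """Return 'namespace/name' for every pod that was evicted."""
    return [
        f"{p['metadata'].get('namespace', '')}/{p['metadata']['name']}"
        for p in pod_list.get("items", [])
        if p.get("status", {}).get("phase") == "Failed"
        and p.get("status", {}).get("reason") == "Evicted"
    ]


def nodes_under_pressure(node_list: dict) -> dict:
    """Map node name -> list of pressure conditions that are currently True."""
    result = {}
    for node in node_list.get("items", []):
        active = [
            c["type"]
            for c in node.get("status", {}).get("conditions", [])
            if c.get("type") in PRESSURE_TYPES and c.get("status") == "True"
        ]
        if active:
            result[node["metadata"]["name"]] = active
    return result
```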

check-kube-proxy-pod

  • Checks whether the kube-proxy pods are healthy
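kube-proxy is commonly deployed as a DaemonSet named `kube-proxy` in `kube-system`. That is the kubeadm default, and ADP may deploy it differently. Assuming that layout, the sketch below compares the DaemonSet status counters from `kubectl get daemonset kube-proxy -n kube-system -o json`. The function name is hypothetical.

```python
def daemonset_healthy(ds: dict) -> bool:
    """True if every scheduled DaemonSet pod is ready and up to date."""
    status = ds.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    return (
        desired > 0
        and status.get("numberReady", 0) == desired
        and status.get("updatedNumberScheduled", 0) == desired
        # numberUnavailable is omitted from the status when it is zero
        and status.get("numberUnavailable", 0) == 0
    )
```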

check-k8s-namespace

  • Checks whether Kubernetes namespaces are in a normal state
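A namespace is normally in the `Active` phase. One that stays in `Terminating` cannot finish deletion. The sketch below lists every namespace that is not `Active`, using the parsed JSON from `kubectl get namespaces -o json`. The function name is hypothetical, and ADP may define "normal" more strictly.

```python
def abnormal_namespaces(ns_list: dict) -> dict:
    """Map namespace name -> phase for every namespace that is not Active.

    A namespace stuck in Terminating is often blocked by a resource with a
    finalizer or by an unavailable aggregated API.
    """
    return {
        ns["metadata"]["name"]: ns.get("status", {}).get("phase", "Unknown")
        for ns in ns_list.get("items", [])
        if ns.get("status", {}).get("phase") != "Active"
    }
```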

check-k8s-node

  • Checks whether Kubernetes nodes are in a normal state
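A node is usually considered healthy when its `Ready` condition is `True`. The status becomes `Unknown` when the kubelet stops reporting. The sketch below also flags cordoned nodes (`spec.unschedulable`), which is an extra signal assumed here rather than taken from the item description. The input is the parsed JSON from `kubectl get nodes -o json`, and the function name is hypothetical.

```python
def node_problems(node_list: dict) -> dict:
    """Map node name -> list of problems ('NotReady', 'Unschedulable')."""
    problems = {}
    for node in node_list.get("items", []):
        issues = []
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A missing Ready condition, or a status of False/Unknown, counts as NotReady
        if ready is None or ready.get("status") != "True":
            issues.append("NotReady")
        if node.get("spec", {}).get("unschedulable"):
            issues.append("Unschedulable")
        if issues:
            problems[node["metadata"]["name"]] = issues
    return problems
```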

check-k8s-pod

  • Checks whether Kubernetes pods are in a normal state
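A pod's phase alone can hide problems. A pod whose container is in `CrashLoopBackOff`, for example, still reports `Running`. The sketch below therefore also inspects each container's `state.waiting.reason`. It works on the parsed JSON from `kubectl get pods -A -o json`, and the function name is hypothetical.

```python
HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pod_list: dict) -> dict:
    """Map 'namespace/name' -> phase or container waiting reason."""
    result = {}
    for pod in pod_list.get("items", []):
        meta, status = pod["metadata"], pod.get("status", {})
        key = f"{meta.get('namespace', '')}/{meta['name']}"
        phase = status.get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            result[key] = phase
            continue
        # A Running pod can still have a container stuck in a waiting state
        for cs in status.get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting")
            if waiting:
                result[key] = waiting.get("reason", "Waiting")
                break
    return result
```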

check-network-control-plane

  • Checks whether the control components of the network plugin are healthy

check-node-network

  • Checks connectivity within the node network

check-pod-network

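  • Checks connectivity from the container network to the node network

  • Checks connectivity within the container network

  • Checks connectivity from containers to Service ClusterIPs

  • Checks connectivity from containers to Service NodePorts

  • Checks connectivity from containers to the DNS domain

  • Checks connectivity from containers to the DNS Service ClusterIP

  • Checks connectivity from containers to the DNS endpoints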
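Most of these targets can be tested from inside a test pod with a plain TCP connect. A TCP connect also works for a Service ClusterIP that does not answer ping, as is typical in kube-proxy iptables mode. The sketch below is an assumption about the method, not ADP's implementation. `tcp_reachable` and `probe` are hypothetical names, and the target addresses must come from your own cluster.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def probe(targets: dict) -> dict:
    """Probe each named (host, port) target and return name -> reachable."""
    return {name: tcp_reachable(host, port) for name, (host, port) in targets.items()}
```

Example call with placeholder addresses: `probe({"dns-clusterip": ("10.96.0.10", 53), "nodeport": ("192.168.0.10", 30080)})`. Here `10.96.0.10` is kubeadm's default DNS Service IP and the NodePort target is purely hypothetical.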

check-k8s-dns-hostnet

  • Checks connectivity from nodes to the DNS domain

  • Checks connectivity from nodes to the DNS Service ClusterIP

  • Checks connectivity from nodes to the DNS endpoints
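These node-side DNS checks can be approximated from a node in two ways. The first is to resolve a domain name through the node's own resolver. The second is to send a query directly to the DNS Service ClusterIP or to a DNS pod endpoint. To target a specific server without third-party libraries, the sketch below builds a minimal RFC 1035 A-record query by hand. All function names are hypothetical, and this is not ADP's implementation.

```python
import random
import socket
import struct


def resolves(name: str) -> bool:
    """True if the node's configured resolver can resolve `name`."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def build_query(name: str, qid: int) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def query_dns_server(server: str, name: str, port: int = 53,
                     timeout: float = 2.0) -> tuple[int, int]:
    """Send one UDP query to server:port and return (rcode, answer_count)."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (server, port))
        data, _ = sock.recvfrom(512)
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", data[:12])
    if rid != qid:
        raise ValueError("response ID does not match query ID")
    return flags & 0x000F, ancount  # RCODE 0 means NOERROR
```

For example, `query_dns_server("10.96.0.10", "kubernetes.default.svc.cluster.local")` assumes kubeadm's default DNS Service IP and the default cluster domain `cluster.local`. Neither is guaranteed to match your cluster. A timeout raises `socket.timeout`, which points to a connectivity problem rather than a resolution failure.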

Kubernetes core component inspection items

The items below are grouped by component. Each item is followed by its severity level: critical, warning, or info.

etcd

  • etcdMembersDown (critical)

  • etcdInsufficientMembers (critical)

  • etcdNoLeader (critical)

  • etcdHighNumberOfLeaderChanges (warning)

  • etcdHighNumberOfLeaderChanges (warning)

  • etcdHighNumberOfFailedGRPCRequests (warning)

  • etcdHighNumberOfFailedGRPCRequests (critical)

  • etcdGRPCRequestsSlow (critical)

  • etcdMemberCommunicationSlow (warning)

  • etcdHighNumberOfFailedProposals (warning)

  • etcdHighFsyncDurations (warning)

  • etcdHighCommitDurations (warning)

  • etcdHighNumberOfFailedHTTPRequests (warning)

  • etcdHighNumberOfFailedHTTPRequests (critical)

  • etcdHTTPRequestsSlow (warning)

kube-apiserver-slos

  • KubeAPIErrorBudgetBurn (critical)

  • KubeAPIErrorBudgetBurn (critical)

  • KubeAPIErrorBudgetBurn (warning)

  • KubeAPIErrorBudgetBurn (warning)

kube-state-metrics

  • KubeStateMetricsListErrors (critical)

  • KubeStateMetricsWatchErrors (critical)

kubernetes-apps

  • KubePodCrashLooping (warning)

  • KubePodNotReady (warning)

  • KubeDeploymentGenerationMismatch (warning)

  • KubeDeploymentReplicasMismatch (warning)

  • KubeStatefulSetReplicasMismatch (warning)

  • KubeStatefulSetGenerationMismatch (warning)

  • KubeStatefulSetUpdateNotRolledOut (warning)

  • KubeDaemonSetRolloutStuck (warning)

  • KubeContainerWaiting (warning)

  • KubeDaemonSetNotScheduled (warning)

  • KubeDaemonSetMisScheduled (warning)

  • KubeJobCompletion (warning)

  • KubeJobFailed (warning)

  • KubeHpaReplicasMismatch (warning)

  • KubeHpaMaxedOut (warning)

kubernetes-resources

  • KubeCPUOvercommit (warning)

  • KubeMemoryOvercommit (warning)

  • KubeCPUQuotaOvercommit (warning)

  • KubeMemoryQuotaOvercommit (warning)

  • KubeQuotaAlmostFull (info)

  • KubeQuotaFullyUsed (info)

  • KubeQuotaExceeded (warning)

  • CPUThrottlingHigh (info)

kubernetes-storage

  • KubePersistentVolumeFillingUp (warning)

  • KubePersistentVolumeFillingUp (critical)

  • KubePersistentVolumeFillingUp (warning)

  • KubePersistentVolumeErrors (critical)

kubernetes-system

  • KubeVersionMismatch (warning)

  • KubeClientErrors (warning)

kubernetes-system-apiserver

  • KubeClientCertificateExpiration (warning)

  • KubeClientCertificateExpiration (critical)

  • AggregatedAPIErrors (warning)

  • AggregatedAPIDown (warning)

  • KubeAPIDown (critical)

kubernetes-system-controller-manager

  • KubeControllerManagerDown (critical)

kubernetes-system-kubelet

  • KubeNodeNotReady (warning)

  • KubeNodeUnreachable (warning)

  • KubeletTooManyPods (warning)

  • KubeNodeReadinessFlapping (warning)

  • KubeletPlegDurationHigh (warning)

  • KubeletPodStartUpLatencyHigh (warning)

  • KubeletClientCertificateExpiration (warning)

  • KubeletClientCertificateExpiration (critical)

  • KubeletServerCertificateExpiration (warning)

  • KubeletServerCertificateExpiration (critical)

  • KubeletClientCertificateRenewalErrors (warning)

  • KubeletServerCertificateRenewalErrors (warning)

  • KubeletDown (critical)

kubernetes-system-scheduler

  • KubeSchedulerDown (critical)

node-exporter

  • NodeFilesystemSpaceFillingUp (warning)

  • NodeFilesystemSpaceFillingUp (critical)

  • NodeFilesystemAlmostOutOfSpace (warning)

  • NodeFilesystemAlmostOutOfSpace (critical)

  • NodeFilesystemFilesFillingUp (warning)

  • NodeFilesystemFilesFillingUp (critical)

  • NodeFilesystemAlmostOutOfFiles (warning)

  • NodeFilesystemAlmostOutOfFiles (critical)

  • NodeNetworkReceiveErrs (warning)

  • NodeNetworkTransmitErrs (warning)

  • NodeHighNumberConntrackEntriesUsed (warning)

  • NodeTextFileCollectorScrapeError (warning)

  • NodeClockSkewDetected (warning)

  • NodeClockNotSynchronising (warning)

  • NodeRAIDDegraded (critical)

  • NodeRAIDDiskFailure (warning)

node-network

  • NodeNetworkInterfaceFlapping (warning)
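Some of the etcd items above, such as etcdInsufficientMembers, depend on quorum. An etcd cluster of n voting members can elect a leader and accept writes only while a majority of floor(n/2) + 1 members is healthy. The sketch below shows that arithmetic in Python. The function names are hypothetical, and this is not how the inspection item is evaluated.

```python
def quorum_size(members: int) -> int:
    """Majority needed by an etcd cluster with `members` voting members."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def has_quorum(healthy: int, members: int) -> bool:
    """True if `healthy` members are enough to elect a leader and accept writes."""
    return healthy >= quorum_size(members)


def failure_tolerance(members: int) -> int:
    """How many members can fail before quorum is lost."""
    return members - quorum_size(members)
```

This explains why odd member counts are the norm: a 4-member cluster tolerates the same single failure as a 3-member cluster (`failure_tolerance(4) == failure_tolerance(3) == 1`).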
