Troubleshoot service issues

Diagnose and resolve issues when Type=LoadBalancer services encounter CLB errors or access failures. See Service load balancing notes.

Prerequisites

CCM component version is V1.9.3.276-g372aa98-aliyun or later (upgrade instructions, release notes).

Diagnostic process

Identify the source of a LoadBalancer service issue.

  1. Identify the service associated with the CLB instance. Replace {XXX.XXX.XXX.XXX}, including the braces, with the load balancer IP address.

    kubectl get svc -A | grep -i LoadBalancer | grep {XXX.XXX.XXX.XXX}

    A healthy service shows output similar to:

    default   my-svc   LoadBalancer   10.x.x.x   XXX.XXX.XXX.XXX   80:32xxx/TCP   5d

  2. Run the following command to check whether the service has error events.

    kubectl -n {your-namespace} describe svc {your-svc-name}

    Check the Events section at the bottom. Error output example:

    Events:
      Type     Reason                  Age   From                Message
      ----     ------                  ---   ----                -------
      Warning  SyncLoadBalancerFailed  2m    service-controller  <error message here>
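
The two steps above can be sketched as a pair of shell helpers, assuming a POSIX shell with awk available. The function names find_lb_svc_by_ip and svc_warning_events are hypothetical and are not part of kubectl or CCM. The first matches the EXTERNAL-IP column exactly, which avoids grep treating the dots in an IP address as wildcards. The second uses kubectl's event field selectors to show only Warning events for one service.

```shell
# Filter `kubectl get svc -A` output (read from stdin) for LoadBalancer
# services whose EXTERNAL-IP column (field 5) equals the given IP exactly.
# Prints "namespace/name" for each match.
# Assumption: the column holds a single address, not a comma-separated list.
find_lb_svc_by_ip() {
  awk -v ip="$1" '$3 == "LoadBalancer" && $5 == ip { print $1 "/" $2 }'
}

# List only the Warning events recorded for one service.
# Arguments: namespace, service name.
svc_warning_events() {
  kubectl -n "$1" get events \
    --field-selector "involvedObject.kind=Service,involvedObject.name=$2,type=Warning"
}

# Usage (203.0.113.10 is a placeholder address):
#   kubectl get svc -A | find_lb_svc_by_ip 203.0.113.10
#   svc_warning_events default my-svc
```

Events expire after a retention period, so an empty result from the second helper does not prove that the service never reported an error.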

Service error events and solutions

Run kubectl -n {your-namespace} describe svc {your-svc-name} and match the error message in the Events section to the table below.

| Error message | Cause | Solution |
| --- | --- | --- |
| The backend server number has reached to the quota limit of this load balancers | The CLB instance has reached the 200-backend-server quota limit. | Do one of the following: <br>1. Request a quota increase on the SLB Quota Management page. <br>2. Set externalTrafficPolicy: Local to reduce the backend count. In Cluster mode, add the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-backend-label annotation to limit backend nodes. <br>3. Create a new CLB instance. |
| The loadbalancer does not support backend servers of eni type | Shared CLB instances do not support Elastic Network Interface (ENI) backends. | Add the annotation service.beta.kubernetes.io/alibaba-cloud-loadbalancer-spec: "slb.s1.small" to use a high-performance CLB instance. Verify CCM version compatibility. See Use annotations to configure a Classic Load Balancer (CLB) instance. |
| There are no available nodes for LoadBalancer | The CLB instance has no backend servers. | Check the pod status: <br>- If no pod matches the service, add one. <br>- If the pod is unhealthy, resolve the issue. See Troubleshoot pod issues. <br>- If the pod runs but is not a backend, check if it is on a master node and move it to a worker node. |
| alicloud: not able to find loadbalancer named [%s] in openapi, but it's defined in service.loaderbalancer.ingress... or alicloud: can not find loadbalancer, but it's defined in service | The CLB instance referenced by the service cannot be found. | Search for the CLB instance in the Server Load Balancer console using the service's EXTERNAL-IP. <br>- If the CLB no longer exists and the service is unneeded, delete it. <br>- If the CLB exists and was created manually, add the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id annotation. See Use annotations to configure a Classic Load Balancer (CLB) instance. <br>- If the CLB was created by CCM, add the kubernetes.do.not.delete label to the CLB instance. See How do I rename an SLB instance if I am using an earlier version of CCM?. |
| ORDER.ARREARAGE Message: The account is arrearage. | The account has an overdue payment. | Settle the overdue payment. |
| PAY.INSUFFICIENT_BALANCE Message: Your account does not have enough balance. | The account balance is insufficient (less than CNY 100). | Top up the account balance. |
| Status Code: 400 Code: Throttlingxxx | The CLB OpenAPI is being throttled. | 1. Check your CLB quota on the SLB Quota Management page. <br>2. Check for service errors and resolve them: kubectl -n {your-namespace} describe svc {your-svc-name}. |
| Status Code: 400 Code: RspoolVipExist Message: there are vips associating with this vServer group. | The listener linked to the vServer group cannot be deleted. | 1. Check whether the service annotation contains a CLB ID: service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id: {your-clb-id}. If present, the CLB is being reused. <br>2. In the CLB console, delete the listener for the port defined in the service. See Configure listener forwarding rules. |
| Status Code: 400 Code: NetworkConflict | The internal-facing CLB instance is in a different Virtual Private Cloud (VPC) than the cluster. | Move the CLB instance to the same VPC as the cluster, or create a new CLB instance in the correct VPC. |
| Status Code: 400 Code: VSwitchAvailableIpNotExist Message: The specified VSwitch has no available ip. | The vSwitch has no available IP addresses. | Add the annotation service.beta.kubernetes.io/alibaba-cloud-loadbalancer-vswitch-id: "${YOUR_VSWITCH_ID}" to specify a different vSwitch in the same VPC. |
| Message: The specified VSwitch does not exist. | The specified vSwitch does not exist. | - If the service uses the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-vswitch-id annotation, verify that the specified vSwitch exists. <br>- If the service does not use the annotation, verify that the default vSwitch ID for the cluster exists. You can view the Node vSwitch on the Basic Information tab of the default node pool (default-nodepool). See Create and manage node pools. If the default vSwitch does not exist, use the annotation to specify a different vSwitch. |
| The specified Port must be between 1 and 65535. | ENI mode does not support string values for targetPort. | Change targetPort to an integer in the service YAML, or upgrade CCM. See Upgrade the CCM component. |
| Status Code: 400 Code: ShareSlbHaltSales Message: The share instance has been discontinued. | Older CCM versions create shared CLB instances by default, which are now discontinued. | Upgrade the CCM component. |
| can not change ResourceGroupId once created | The CLB resource group cannot be changed after instance creation. | Remove the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-resource-group-id: "rg-xxxx" annotation from the service. |
| can not find eniid for ip x.x.x.x in vpc vpc-xxxx | The ENI IP is not found in the VPC. The service.beta.kubernetes.io/backend-type: eni annotation is set, but the cluster uses Flannel, which does not support ENI mode. | Remove the service.beta.kubernetes.io/backend-type: eni annotation from the service. |
| The operation is not allowed because the instanceChargeType of loadbalancer is PayByCLCU. or User does not have permission modify InstanceChargeType to spec. | The CLB billing method cannot change from pay-as-you-go (PayByCLCU) to pay-by-specification. | Do one of the following: <br>- Remove the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-spec annotation. <br>- If the service has the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-instance-charge-type annotation, set its value to PayByCLCU. |
| SyncLoadBalancerFailed the loadbalancer xxx can not be reused, can not reuse loadbalancer created by kubernetes. | The CLB instance was created by CCM and cannot be reused via the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id annotation. | 1. Find the CLB ID in the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id annotation of the service YAML. <br>2. Resolve based on service status: <br>&nbsp;&nbsp;- Service is pending: Replace the CLB ID with one created manually in the Classic Load Balancer (CLB) console. <br>&nbsp;&nbsp;- Service is not pending, CLB IP matches the service EXTERNAL-IP: Delete the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id annotation. <br>&nbsp;&nbsp;- Service is not pending, CLB IP does not match: Find the CLB matching the service EXTERNAL-IP in the console and update the annotation. If no match, use a manually created CLB ID and recreate the service. |
| alicloud: can not change LoadBalancer AddressType once created. delete and retry | The CLB instance type cannot be changed after creation. | Delete the service and recreate it. |
| the loadbalancer lb-xxxxx can not be reused, service has been associated with ip [xxx.xxx.xxx.xxx], cannot be bound to ip [xxx.xxx.xxx.xxx] | The service is bound to a CLB instance and cannot be rebound by changing the annotation. | Delete the service and recreate it with the correct CLB instance ID. |
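
Many of the solutions above come down to adding or removing a service annotation. The following is a minimal sketch of how that could be done from the command line with kubectl annotate, as an alternative to editing the service YAML. The helper names set_svc_annotation and remove_svc_annotation are hypothetical. Whether CCM applies a given annotation change to an existing CLB instance depends on the annotation; where the table says to recreate the service, these helpers are not a substitute.

```shell
# Add or update an annotation on a service.
# Arguments: namespace, service name, annotation key, annotation value.
set_svc_annotation() {
  kubectl -n "$1" annotate svc "$2" "$3=$4" --overwrite
}

# Remove an annotation from a service.
# kubectl treats a key followed by "-" as a request to delete that key.
# Arguments: namespace, service name, annotation key.
remove_svc_annotation() {
  kubectl -n "$1" annotate svc "$2" "$3-"
}

# Usage (namespace, service name, and vSwitch ID are placeholders):
#   set_svc_annotation default my-svc \
#     service.beta.kubernetes.io/alibaba-cloud-loadbalancer-vswitch-id vsw-xxxx
#   remove_svc_annotation default my-svc service.beta.kubernetes.io/backend-type
```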

Troubleshooting methods

For issues that do not produce error events, use the following symptom-based guide.

| Issue | Symptom | Solution |
| --- | --- | --- |
| CLB access issues | Uneven load distribution across backends | Uneven load distribution across CLB backends |
| CLB access issues | 503 error during application updates | 503 error during application updates |
| CLB access issues | CLB inaccessible from within the cluster | CLB inaccessible from within the cluster |
| CLB access issues | CLB inaccessible from outside the cluster | CLB inaccessible from outside the cluster |
| CLB access issues | "The plain HTTP request was sent to HTTPS port" error | Cannot connect to the backend HTTPS service |
| CLB configuration issues | Service annotations do not take effect | What do I do if service annotations do not take effect? |
| CLB configuration issues | CLB configuration is unexpectedly modified | Why is the configuration of my CLB instance modified? |
| CLB configuration issues | Reusing an existing CLB instance does not take effect | Service FAQ |
| CLB configuration issues | No listener configured when reusing an existing CLB instance | Why is no listener configured when I reuse an existing CLB instance? |
| CLB configuration issues | Inconsistent CLB backends | What do I do if the SLB vServer group is not updated? |
| CLB deletion issues | CLB instance is unexpectedly deleted | When is an SLB instance automatically deleted? |
| CLB deletion issues | CLB instance is not deleted after the service is deleted | When is an SLB instance automatically deleted? |

Uneven load distribution across CLB backends

Cause: CLB scheduling algorithm is not suited to the traffic pattern.

Symptom: Uneven request distribution across backend servers.

Solution:

  • For services with externalTrafficPolicy: Local, add the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-scheduler: "wrr" annotation to use weighted round-robin scheduling.

  • For services using persistent connections, add the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-scheduler: "wlc" annotation to use weighted least connections scheduling. This prevents a few long-lived connections from concentrating traffic on one backend.
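
As an illustration of the two cases above, the following sketch prints the annotation line to paste under metadata.annotations in the service YAML. The function name scheduler_annotation and its argument values local and persistent are hypothetical labels chosen for this example.

```shell
# Print the scheduler annotation for one of the two cases described above.
# Argument: "local" for services with externalTrafficPolicy: Local,
#           "persistent" for services that use long-lived connections.
scheduler_annotation() {
  case "$1" in
    local)      value=wrr ;;
    persistent) value=wlc ;;
    *) echo "usage: scheduler_annotation local|persistent" >&2; return 1 ;;
  esac
  printf 'service.beta.kubernetes.io/alibaba-cloud-loadbalancer-scheduler: "%s"\n' "$value"
}
```

For example, scheduler_annotation persistent prints the annotation with the value "wlc".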

To capture container network packets for load distribution analysis, see this Alibaba Cloud Developer community article.

503 error during application updates

Cause: Connection draining or pod graceful termination is not configured. During rolling updates, CLB may route traffic to terminating pods.

Symptom: 503 error when accessing the CLB during an application update.

Solution:

  1. Add the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain annotation to enable connection draining. See Common operations to manage listeners.

  2. Configure readinessProbe and preStop on the pod:

    • readinessProbe : Pods join CLB backends only after passing the probe. Set probe frequency, delay, and failure threshold to match your application's startup time. Timeouts that are too short can mark healthy pods as not ready, and on the liveness probe they cause repeated container restarts.

    • preStop and terminationGracePeriodSeconds : Set preStop to the time your application needs to drain in-flight requests. Set terminationGracePeriodSeconds to at least 30 seconds longer than preStop.

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      namespace: default
    spec:
      containers:
      - name: nginx
        image: nginx
        # Liveness probe
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 30
          successThreshold: 1
          tcpSocket:
            port: 5084
          timeoutSeconds: 1
        # Readiness probe
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 30
          successThreshold: 1
          tcpSocket:
            port: 5084
          timeoutSeconds: 1
        # Graceful termination
        lifecycle:
          preStop:
            exec:
              command:
              - sleep
              - "30"  # exec command arguments must be strings
      terminationGracePeriodSeconds: 60
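
The relationship between the preStop duration and terminationGracePeriodSeconds can be checked with simple arithmetic. The sketch below assumes the 30-second margin stated above; the helper names grace_period_for and grace_period_ok are hypothetical.

```shell
# Compute the minimum terminationGracePeriodSeconds for a given preStop
# sleep duration, using the 30-second margin described above.
# Argument: preStop duration in seconds.
grace_period_for() {
  echo $(( $1 + 30 ))
}

# Succeed if a configured pair satisfies that margin.
# Arguments: preStop seconds, terminationGracePeriodSeconds.
grace_period_ok() {
  [ "$2" -ge $(( $1 + 30 )) ]
}

# Usage:
#   grace_period_for 30        # prints 60, matching the manifest above
#   grace_period_ok 30 45 || echo "grace period too short"
```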

CLB inaccessible from within the cluster

Cause: externalTrafficPolicy: Local is set on the service. kube-proxy only forwards traffic to pods on the same node as the request origin. If the node has no backend pod for the service, the connection fails. This affects in-cluster traffic routed to the CLB address. See kube-proxy adds external-lb address to node-local iptables rule.

Symptom: CLB is accessible from outside the cluster but connections fail from within.

Solution: Use one of the following approaches:

  • Access via ClusterIP or service name (recommended for in-cluster access): Use the service's ClusterIP or DNS name instead of the CLB address. For Ingress, the service name is nginx-ingress-lb.kube-system.

  • Switch to externalTrafficPolicy: Cluster: In-cluster traffic reaches the service regardless of pod placement, but the client source IP is not preserved. While an Ingress CLB service stays in Local mode, pods can access Ingress/CLB-exposed services only from the node running the Ingress pod. To modify the Ingress service, run:

    kubectl edit svc nginx-ingress-lb -n kube-system

  • Use externalTrafficPolicy: Cluster with ENI pass-through (Terway only): If your cluster uses Terway with ENIs or multiple IPs per ENI, set externalTrafficPolicy: Cluster and add the service.beta.kubernetes.io/backend-type: "eni" annotation. This preserves source IP and enables in-cluster access. See Use annotations to configure a Classic Load Balancer (CLB) instance.

    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        service.beta.kubernetes.io/backend-type: eni
      labels:
        app: nginx-ingress-lb
      name: nginx-ingress-lb
      namespace: kube-system
    spec:
      externalTrafficPolicy: Cluster
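
The first two options in this section can be sketched as shell helpers. The function names svc_dns_name and set_policy_cluster are hypothetical. The cluster domain cluster.local is an assumption (it is the common Kubernetes default, but your cluster may use a different one), and kubectl patch is shown as a non-interactive alternative to kubectl edit.

```shell
# Build the in-cluster DNS name of a service, to use instead of the CLB address.
# Arguments: service name, namespace, optional cluster domain.
# Assumption: the cluster domain is "cluster.local" unless given.
svc_dns_name() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# Switch a service to externalTrafficPolicy: Cluster with a merge patch.
# Arguments: namespace, service name.
set_policy_cluster() {
  kubectl -n "$1" patch svc "$2" --type merge \
    -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'
}

# Usage:
#   svc_dns_name nginx-ingress-lb kube-system
#   set_policy_cluster kube-system nginx-ingress-lb
```

Keep in mind that switching to Cluster mode without the ENI annotation gives up client source IP preservation, as noted above.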

CLB inaccessible from outside the cluster

Cause: An ACL blocks the client IP, the CLB vServer group has no backends, or the health check is failing.

Symptom: The CLB instance cannot be reached from outside the cluster.

Solution:

  1. Check for service error events and resolve them. See Service error events and solutions.

    kubectl -n {your-namespace} describe svc {your-svc-name}
  2. Check whether an ACL is configured on the CLB instance. If so, verify it allows inbound traffic from the client IP. See Resource Access Management.

  3. Check whether the CLB vServer group is empty. If empty, verify a pod is associated with the service and running. If unhealthy, resolve the pod issue first. See Troubleshoot pod issues.

  4. Check whether the CLB listener health check passes. If failing, verify the pod responds correctly. See CLB health check FAQ.
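
For step 3, a quick Kubernetes-side check is whether the service has any endpoints. The sketch below is an approximation under stated assumptions: it inspects the Endpoints object, not the CLB vServer group itself, so the console remains the authoritative view. The helper name endpoints_state is hypothetical, and the input is assumed to be the default table output of kubectl get endpoints, including its header line.

```shell
# Read `kubectl get endpoints <svc-name>` output from stdin and report whether
# the service has any ready pod addresses behind it.
# Prints "empty" when the ENDPOINTS column is <none> (or no row is present),
# otherwise "has-backends".
endpoints_state() {
  awk 'NR > 1 { print ($2 == "<none>" ? "empty" : "has-backends"); found = 1 }
       END { if (!found) print "empty" }'
}

# Usage:
#   kubectl -n {your-namespace} get endpoints {your-svc-name} | endpoints_state
```

An "empty" result usually means that no running, ready pod matches the service selector, which is the condition step 3 asks you to rule out.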

Cannot connect to the backend HTTPS service

Cause: With a certificate on the CLB listener, CLB terminates TLS and forwards HTTP to backends. If targetPort points to an HTTPS port (e.g., 443), the pod rejects the plaintext request with "The plain HTTP request was sent to HTTPS port."

Symptom: Backend connections fail after configuring HTTPS on the CLB listener.

Solution: Set targetPort to the pod's HTTP port. For example, if Nginx serves HTTPS on 443, set targetPort to 80.

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-protocol-port: "https:443"
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-cert-id: "${YOUR_CERT_ID}"
  name: nginx
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 80
  selector:
    run: nginx
  type: LoadBalancer
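
To spot the misconfiguration described above on an existing service, one option is to list each service port with its targetPort and flag any that forward to the pod's HTTPS port. This is a sketch under assumptions: the helper name flag_https_target_ports is hypothetical, the pod's HTTPS port is taken to be 443 unless given, and every service port is assumed to have a name so that each input line has three fields.

```shell
# Read "name port targetPort" lines from stdin and print a warning for each
# service port whose targetPort equals the pod's HTTPS port.
# Argument: the pod's HTTPS port (assumed to be 443 if omitted).
flag_https_target_ports() {
  awk -v tls="${1:-443}" \
    '$3 == tls { print "port " $2 " (" $1 ") forwards to HTTPS targetPort " $3 }'
}

# Usage: produce the three-column input with a jsonpath range expression.
#   kubectl -n default get svc nginx \
#     -o jsonpath='{range .spec.ports[*]}{.name}{" "}{.port}{" "}{.targetPort}{"\n"}{end}' \
#     | flag_https_target_ports
```

With the manifest above the helper prints nothing, because both service ports forward to targetPort 80.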

Next steps