Troubleshoot Service issues

Learn how to diagnose and resolve LoadBalancer Service issues in ACS clusters.

Background information

When you create a Type=LoadBalancer Service, the ACS Cloud Controller Manager (CCM) automatically creates and configures SLB resources, including the instance, listeners, and backend server groups. SLB auto-update policies are detailed in Considerations for configuring a LoadBalancer Service.

Procedure

Ensure that the CCM version is 1.9.3.276-g372aa98-aliyun or later. To upgrade, see Update the CCM. For release notes, see Cloud Controller Manager.

Service troubleshooting process

  1. Run the following command to find the Service associated with the SLB instance:

    kubectl get svc -A | grep -i LoadBalancer | grep -F "{your-slb-ip}"  # {your-slb-ip} is the IP address of the SLB instance.
  2. Run the following command to check for Service error events:

    kubectl -n {your-namespace} describe svc {your-svc-name}
    Important

    If no error events appear, verify that the CCM version is 1.9.3.276-g372aa98-aliyun or later. If it is earlier, see Update the CCM.

  3. If the issue persists, contact the ACS DingTalk support group.
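
The first two steps can be combined into a small shell helper. This is only a sketch: the function name svc_warnings_for_ip is hypothetical, and it assumes that the SLB IP address appears in the EXTERNAL-IP column of kubectl get svc.

```shell
# Hypothetical helper: list Warning events for every LoadBalancer Service
# whose EXTERNAL-IP column contains the given SLB IP address.
svc_warnings_for_ip() {
  slb_ip="$1"
  # Columns of `kubectl get svc -A --no-headers`:
  # NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
  kubectl get svc -A --no-headers |
    awk -v ip="$slb_ip" \
      '$3 == "LoadBalancer" && index("," $5 ",", "," ip ",") > 0 { print $1, $2 }' |
    while read -r ns name; do
      echo "== ${ns}/${name}"
      # Only events of type Warning that refer to this Service are shown.
      kubectl -n "$ns" get events \
        --field-selector "involvedObject.kind=Service,involvedObject.name=${name},type=Warning" \
        </dev/null
    done
}
```

Call it as svc_warnings_for_ip {your-slb-ip}. Events expire after a while, so an empty result does not prove that the Service is healthy; kubectl describe svc remains the reference check.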

Service errors and solutions

The following table lists common Service errors and solutions.

Error message

Description and solution

The loadbalancer does not support backend servers of eni type

Shared-resource SLB instances do not support ENI-type backend servers.

Solution: To use ENI backend servers, create a high-performance SLB instance by adding the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-spec: "slb.s1.small" annotation to the Service.

Important

Ensure the annotations match your CCM version. Supported annotations per version are listed in Add annotations to the YAML file of a Service to configure CLB instances.

There are no available nodes for LoadBalancer

No backend server is associated with the SLB instance. Check whether pods are associated with the Service and running normally.

Solutions:

  • If no pods are associated with the Service, associate your application pods.

  • If the associated pods are not running normally, troubleshoot them. For details, see Pod troubleshooting.

  • If the pods are running normally but no backend server is associated, check whether pods are on master nodes. If so, migrate them to worker nodes. Otherwise, contact the ACS DingTalk support group.
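
For the "There are no available nodes for LoadBalancer" error, the checks above can be sketched with two helpers. The function names are hypothetical, and the label selector passed to pods_with_nodes is whatever selector your Service uses.

```shell
# Hypothetical helper: print the pod IPs currently registered as ready
# endpoints of a Service, one per line. No output means that the Service
# has no ready backends.
svc_backend_ips() {
  ns="$1"; svc="$2"
  kubectl -n "$ns" get endpoints "$svc" \
    -o jsonpath='{range .subsets[*].addresses[*]}{.ip}{"\n"}{end}'
}

# Hypothetical helper: print "<pod> <node>" for the pods that match a label
# selector, so that pods scheduled on master nodes can be spotted.
pods_with_nodes() {
  ns="$1"; selector="$2"
  kubectl -n "$ns" get pods -l "$selector" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.nodeName}{"\n"}{end}'
}
```

For example, svc_backend_ips {your-namespace} {your-svc-name} followed by pods_with_nodes {your-namespace} run=nginx shows whether any pod backs the Service and where those pods run.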

  • alicloud: not able to find loadbalancer named [%s] in openapi, but it's defined in service.loaderbalancer.ingress. this may happen when you removed loadbalancerid annotation

  • alicloud: can not find loadbalancer, but it's defined in service

The system cannot find the SLB instance associated with the Service.

Solution: Log on to the SLB console and search for the SLB instance in the region of the Service based on EXTERNAL-IP.

  1. If the SLB instance does not exist and the Service is no longer used, delete the Service.

  2. If the SLB instance exists, perform the following steps:

    1. If the SLB instance is created in the SLB console, add the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id annotation to the Service. Supported annotations are listed in Add annotations to the YAML file of a Service to configure CLB instances.

    2. If the SLB instance was created automatically by the CCM, check whether the instance has the kubernetes.do.not.delete label. If not, add this label. For details, see How do I rename an SLB instance when using an earlier version of CCM?

ORDER.ARREARAGE Message: The account is arrearage.

Your account has overdue payments.

PAY.INSUFFICIENT_BALANCE Message: Your account does not have enough balance.

The account balance is less than 100 CNY. Top up your account.

Status Code: 400 Code: Throttlingxxx

API throttling is triggered for SLB.

Solutions:

  1. Go to the Quota Management page in the SLB console and check whether the SLB resource quotas are sufficient.

  2. Run the following command to check for Service errors. If errors occur, use the table above to troubleshoot.

    kubectl -n {your-namespace} describe svc {your-svc-name}

Status Code: 400 Code: RspoolVipExist Message: there are vips associating with this vServer group.

The listener associated with the vServer group cannot be deleted.

Solutions:

  1. Check whether the annotation of the Service contains the ID of the SLB instance. Example: service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id: {your-slb-id}.

    If the annotation of the Service contains the ID of the SLB instance, the SLB instance is reused.

  2. Log on to the SLB console and delete the listener using the Service port. To delete an SLB listener, follow Manage forwarding rules for a listener.

Status Code: 400 Code: NetworkConflict

The reused internal-facing SLB instance and the cluster are not in the same VPC.

Solution: Make sure that your SLB instance and the cluster are deployed in the same VPC.

Status Code: 400 Code: VSwitchAvailableIpNotExist Message: The specified VSwitch has no available ip.

The vSwitch has no available IP addresses.

Solution: Use service.beta.kubernetes.io/alibaba-cloud-loadbalancer-vswitch-id: "${YOUR_VSWITCH_ID}" to specify another vSwitch in the same VPC.

Message: The specified VSwitch does not exist.

The specified vSwitch does not exist.

Solution:

  • If the Service has the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-vswitch-id annotation, verify that the corresponding vSwitch exists.

  • If the Service does not have the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-vswitch-id annotation, verify that the cluster's default vSwitch exists.

    If it does not exist, use this annotation to specify another vSwitch.

The specified Port must be between 1 and 65535.

The targetPort field does not support STRING type values in ENI mode.

Solution: Set the targetPort field in the Service YAML file to an INTEGER value, or upgrade the CCM. For details, see Update the CCM.
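
To spot named (STRING) targetPort values before they trigger this error, a check along the following lines may help. Both function names are hypothetical; the sketch only inspects the Service spec and does not change it.

```shell
# Hypothetical helper: succeed only if every targetPort value passed as an
# argument is a plain integer; otherwise print the offending values.
all_target_ports_numeric() {
  rc=0
  for tp in "$@"; do
    case "$tp" in
      ''|*[!0-9]*) echo "non-integer targetPort: $tp"; rc=1 ;;
    esac
  done
  return "$rc"
}

# Hypothetical wrapper: read the targetPort values of a Service and check them.
check_svc_target_ports() {
  ns="$1"; svc="$2"
  # The jsonpath output is space-separated; word splitting is intentional.
  # shellcheck disable=SC2046
  all_target_ports_numeric $(kubectl -n "$ns" get svc "$svc" \
    -o jsonpath='{.spec.ports[*].targetPort}')
}
```

For example, check_svc_target_ports {your-namespace} {your-svc-name} returns a non-zero status and names each targetPort that is not an integer.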

Status Code: 400 Code: ShareSlbHaltSales Message: The share instance has been discontinued.

By default, earlier CCM versions create shared-resource SLB instances, which are no longer available for purchase.

Solution: Update the CCM.

can not change ResourceGroupId once created

You cannot modify the resource group of an SLB instance after it is created.

Solution: Delete the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-resource-group-id: "rg-xxxx" annotation from the Service.

can not find eniid for ip x.x.x.x in vpc vpc-xxxx

The specified IP address of the ENI cannot be found in the VPC.

Solution: Check whether the service.beta.kubernetes.io/backend-type: eni annotation is added to the Service. If the annotation is added to the Service, check whether Flannel is used as the network plug-in of the cluster. If Flannel is used, delete the annotation from the Service. Flannel does not support the ENI mode.
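
The annotation check and removal can be sketched as follows. The function names are hypothetical; whether the cluster uses Flannel must still be confirmed separately before the annotation is removed.

```shell
# Annotation key named in the text above.
BACKEND_TYPE_KEY='service.beta.kubernetes.io/backend-type'

# Hypothetical helper: print the backend-type annotation of a Service
# (empty output if the annotation is not set).
svc_backend_type() {
  ns="$1"; svc="$2"
  # Dots inside the annotation key must be escaped in kubectl JSONPath.
  kubectl -n "$ns" get svc "$svc" \
    -o jsonpath='{.metadata.annotations.service\.beta\.kubernetes\.io/backend-type}'
}

# Hypothetical helper: remove the annotation. A trailing "-" after the key
# tells `kubectl annotate` to delete it.
remove_backend_type() {
  ns="$1"; svc="$2"
  kubectl -n "$ns" annotate svc "$svc" "${BACKEND_TYPE_KEY}-"
}
```

If svc_backend_type {your-namespace} {your-svc-name} prints eni and the cluster uses Flannel, remove_backend_type {your-namespace} {your-svc-name} deletes the annotation.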

  • The operation is not allowed because the instanceChargeType of loadbalancer is PayByCLCU.

  • User does not have permission modify InstanceChargeType to spec.

You cannot change the billing method of the SLB instance used by a Service from pay-as-you-go to pay-by-specification.

Solutions:

  • Delete the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-spec annotation from the Service.

  • If service.beta.kubernetes.io/alibaba-cloud-loadbalancer-instance-charge-type is added to the Service, set the value to PayByCLCU.

SyncLoadBalancerFailed the loadbalancer xxx can not be reused, can not reuse loadbalancer created by kubernetes.

The SLB instance created by the CCM is reused.

Solutions:

  1. Check the YAML file of the related Service and record the SLB instance ID in the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id annotation.

  2. Troubleshoot the issue based on the status of the Service.

    • If the Service is in the Pending state, change the value of the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id annotation to the ID of an SLB instance that is manually created in the CLB console.

    • If the Service is not in the Pending state, perform the following operations:

      • If the IP address of the SLB instance is the same as the external IP addresses of the Service, delete the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id annotation.

      • If the IP address of the SLB instance is different from the external IP addresses of the Service, log on to the CLB console, select the region in which the cluster resides, find the SLB instances based on the external IP of the Service, and then change the value of the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id annotation to the ID of a manually created SLB instance. If no corresponding SLB instance is found, change the value of the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id annotation to the ID of an SLB instance that is manually created in the SLB console. Then, recreate the Service.
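
The Pending, same-IP, and different-IP cases above can be told apart with a short helper. This is a sketch with hypothetical function names; it assumes that you have already looked up the IP address of the SLB instance named in the annotation, for example in the console.

```shell
# Hypothetical helper: print the external IP addresses reported in the
# Service status (empty while the Service is Pending).
svc_external_ips() {
  ns="$1"; svc="$2"
  kubectl -n "$ns" get svc "$svc" \
    -o jsonpath='{.status.loadBalancer.ingress[*].ip}'
}

# Hypothetical helper: compare a Service with the IP address of the SLB
# instance named in its loadbalancer-id annotation.
# Prints "pending", "same", or "different".
compare_svc_with_slb_ip() {
  ns="$1"; svc="$2"; slb_ip="$3"
  ips="$(svc_external_ips "$ns" "$svc")"
  if [ -z "$ips" ]; then
    echo pending
  else
    case " $ips " in
      *" $slb_ip "*) echo same ;;
      *) echo different ;;
    esac
  fi
}
```

For example, compare_svc_with_slb_ip {your-namespace} {your-svc-name} {your-slb-ip} prints which of the three cases applies, so you can follow the matching bullet.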

alicloud: can not change LoadBalancer AddressType once created. delete and retry

You cannot change the type of an SLB instance after it is created.

Solution: Recreate the related Service.

the loadbalancer lb-xxxxx can not be reused, service has been associated with ip [xxx.xxx.xxx.xxx], cannot be bound to ip [xxx.xxx.xxx.xxx]

You cannot associate an SLB instance with a Service that is already associated with another SLB instance.

Solution: You cannot reuse an existing SLB instance by modifying the value of the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id annotation. To change the SLB instance that is associated with a Service, you must delete and recreate the Service.

Troubleshooting

The following table lists common troubleshooting scenarios and solutions.

Category

Issue

Solution

Issues that occur when you access an SLB instance

The SLB instance does not evenly distribute traffic.

The SLB instance does not evenly distribute traffic

The 503 error occurs when I access the SLB instance during application updates.

The 503 error occurs when I access the SLB instance during application updates

The SLB instance cannot be accessed from within the cluster.

The IP address of the SLB instance that is associated with the LoadBalancer Service cannot be accessed from within the cluster

The SLB instance cannot be accessed from outside the cluster.

The SLB instance cannot be accessed from outside the cluster

The "The plain HTTP request was sent to HTTPS port" error occurs when a request is sent to an HTTPS port.

Errors occur when a request is sent to an HTTPS port

Issues related to SLB configurations

The annotations of the Service do not take effect.

What do I do if the annotations of a Service do not take effect?

The configuration of the SLB instance is modified.

Why is the configuration of an SLB instance modified?

The system fails to reuse an existing SLB instance.

Why does the system fail to use an existing SLB instance for more than one Service?

No listener is created when an existing SLB instance is reused.

Why is no listener created when I reuse an existing SLB instance?

The endpoint of the Service is different from that specified for the backend server of the SLB instance.

What do I do if the vServer groups of an SLB instance are not updated?

Issues related to SLB deletion

The SLB instance is deleted.

When is an SLB instance automatically deleted?

The SLB instance is not deleted together with the Service.

When is an SLB instance automatically deleted?

The SLB instance does not evenly distribute traffic

Cause

The scheduling algorithm of the SLB instance is improper.

Symptom

Traffic is not evenly distributed to the backend servers of the SLB instance.

Solution

  • If long-lived connections are established to your Service, set the scheduling algorithm of the SLB instance to Weighted Least Connections (WLC) by adding the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-scheduler: "wlc" annotation to the Service.
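
Instead of editing the Service YAML file, the annotation can also be applied with kubectl annotate. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: set the SLB scheduling algorithm of a Service to
# Weighted Least Connections through the annotation named in the text.
set_wlc_scheduler() {
  ns="$1"; svc="$2"
  # --overwrite replaces an existing value of the same annotation.
  kubectl -n "$ns" annotate svc "$svc" --overwrite \
    'service.beta.kubernetes.io/alibaba-cloud-loadbalancer-scheduler=wlc'
}
```

Call it as set_wlc_scheduler {your-namespace} {your-svc-name}, then check the Service events to confirm that the change was synchronized without errors.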

The 503 error occurs when I access the SLB instance during application updates

Cause

Connection draining is not configured for the SLB listener or graceful shutdown is not configured for the pod.

Symptom

The 503 error occurs when you access the SLB instance during application updates.

Solution

  1. Add the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain annotation to configure connection draining for the SLB listener. Annotation details are in Common operations to manage listeners.

  2. Set the preStop and readinessProbe parameters for the pod based on the network mode of the pod.

    • readinessProbe checks whether the container is ready to accept traffic. The pod is added to the Service endpoints, and then attached to the SLB instance, only after it passes the readiness probe. Set a suitable probing interval, delay period, and unhealthy threshold for readinessProbe. If these values are too short for an application with a long startup time, the pod can repeatedly fail its probes and, when livenessProbe uses the same values, restart repeatedly.

    • Set preStop to the time the pod needs to handle remaining requests. Set terminationGracePeriodSeconds to at least 30 seconds longer than preStop.

    Pod configuration example:

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      namespace: default
    spec:
      containers:
      - name: nginx
        image: nginx
        # Liveness probe
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 30
          successThreshold: 1
          tcpSocket:
            port: 80
          timeoutSeconds: 1
        # Readiness probe
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 30
          successThreshold: 1
          tcpSocket:
            port: 80
          timeoutSeconds: 1
        # Graceful shutdown
        lifecycle:
          preStop:
            exec:
              command:
              - sleep
              - "30"
      terminationGracePeriodSeconds: 60
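
The relationship between preStop and terminationGracePeriodSeconds described above can be checked with simple arithmetic. The function names below are hypothetical; the 30-second margin is the one recommended in this section.

```shell
# Hypothetical helper: succeed if terminationGracePeriodSeconds is at least
# 30 seconds longer than the preStop sleep.
grace_period_ok() {
  prestop_seconds="$1"; grace_seconds="$2"
  [ "$grace_seconds" -ge $((prestop_seconds + 30)) ]
}

# Hypothetical helper: read terminationGracePeriodSeconds from a running pod.
pod_grace_seconds() {
  ns="$1"; pod="$2"
  kubectl -n "$ns" get pod "$pod" \
    -o jsonpath='{.spec.terminationGracePeriodSeconds}'
}
```

With the values from the example above (a preStop sleep of 30 seconds and terminationGracePeriodSeconds of 60), grace_period_ok 30 60 succeeds because 60 is at least 30 + 30. For a running pod, grace_period_ok 30 "$(pod_grace_seconds default nginx)" performs the same check against the live value.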

The SLB instance cannot be accessed from outside the cluster

Cause

You configured ACL rules for the SLB instance, or the SLB instance is not running properly.

Symptom

You cannot access the SLB instance from outside the cluster.

Solution

  1. Run the following command to query Service events and troubleshoot errors. For details, see Service errors and solutions.

    kubectl -n {your-namespace} describe svc {your-svc-name}
  2. Check whether ACL rules are configured for the SLB instance.

    If ACL rules are configured for the SLB instance, check whether the client IP address is allowed. ACL configuration details are in Access control.

  3. Check whether the SLB instance is associated with a vServer group.

    If no vServer group is associated, check whether application pods are associated with the Service and running normally. If the pods are not running normally, troubleshoot them. For details, see Pod troubleshooting.

  4. Check whether unhealthy backend servers are detected by the SLB listeners.

    If unhealthy backend servers are detected, check whether the application pods are running normally. For SLB health check details, see Execute a health check script.

  5. If the issues persist, contact the ACS DingTalk support group.

Backend HTTPS services cannot be accessed

Cause

After you specify the certificate in the SLB instance, the SLB instance decrypts HTTPS requests and forwards HTTP requests to the backend pods.

Symptom

You cannot access backend HTTPS services.

Solution

Set targetPort to an HTTP port in the Service. For example, the HTTPS port is 443 in the following NGINX Service. In this case, you must change the value of targetPort to 80.

Example:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-protocol-port: "https:443"
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-cert-id: "${YOUR_CERT_ID}"
  name: nginx
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 80
  selector:
    run: nginx
  type: LoadBalancer
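
To check an existing Service for this misconfiguration, the port mapping can be listed and filtered. This is a sketch with hypothetical function names; it assumes, as in the example above, that the HTTPS listener uses port 443 and that the pods also serve HTTPS on port 443.

```shell
# Hypothetical helper: print "<name>\t<port>\t<targetPort>" for each port
# of a Service.
svc_port_map() {
  ns="$1"; svc="$2"
  kubectl -n "$ns" get svc "$svc" \
    -o jsonpath='{range .spec.ports[*]}{.name}{"\t"}{.port}{"\t"}{.targetPort}{"\n"}{end}'
}

# Hypothetical filter: flag port 443 entries whose targetPort is also 443,
# which would send the decrypted HTTP traffic to the HTTPS port of the pod.
https_ports_forwarding_to_443() {
  awk -F '\t' '$2 == 443 && $3 == 443 { print "port " $1 " forwards to targetPort 443" }'
}
```

Run svc_port_map {your-namespace} {your-svc-name} | https_ports_forwarding_to_443. No output means that no port 443 entry forwards to targetPort 443, as in the corrected Service above.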