Container networking FAQ

This topic addresses frequently asked questions (FAQs) and common issues you may encounter when using the Terway or Flannel network plugin, including how to choose a network plugin, whether your cluster supports third-party network plugins, and how to plan your cluster network.

Index

Terway

Flannel

kube-proxy

IPv6

How to troubleshoot common issues with IPv6 dual-stack?

ACK container network data links

Other

Terway network modes

Terway supports shared ENI mode and exclusive ENI mode. For details about each mode, see Shared ENI mode and exclusive ENI mode. Only Terway shared ENI mode supports network acceleration (DataPathv2 or IPvlan+eBPF). DataPathv2 is an upgrade to the earlier IPvlan+eBPF acceleration mode. In Terway v1.8.0 and later, DataPathv2 is the only acceleration option you can select when you create a cluster and install the Terway plugin.

Differentiate between exclusive and shared ENI modes

  • In Terway v1.11.0 and later, Terway uses shared ENI mode by default. You can enable exclusive ENI mode by configuring the exclusive ENI network mode for a node pool.

  • For Terway versions earlier than v1.11.0, you can select either exclusive ENI mode or shared ENI mode during cluster creation. After you create a cluster, you can identify the mode as follows:

    • Exclusive ENI mode: The Terway DaemonSet in the kube-system namespace is named terway-eni.

    • Shared ENI mode: The Terway DaemonSet in the kube-system namespace is named terway-eniip.
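For Terway versions earlier than v1.11.0, the DaemonSet-name check above can be scripted. The following is a minimal POSIX shell sketch. The function name terway_eni_mode is hypothetical, and the function recognizes only the two DaemonSet names listed above.

```shell
# Hypothetical helper: report the Terway ENI mode from DaemonSet names on stdin.
# Accepts plain names or "kubectl get ds -o name" output such as
# "daemonset.apps/terway-eniip". Only the two names documented for Terway
# versions earlier than v1.11.0 are recognized.
terway_eni_mode() {
  while IFS= read -r line || [ -n "$line" ]; do
    name=${line##*/}  # strip an optional "daemonset.apps/" prefix
    case "$name" in
      terway-eni) echo "exclusive"; return 0 ;;
      terway-eniip) echo "shared"; return 0 ;;
    esac
  done
  echo "unknown"
  return 1
}
```

Example usage: kubectl get ds -n kube-system -o name | terway_eni_mode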

Determine the network acceleration mode

To determine which network acceleration mode is active, check the eniip_virtual_type setting in the eni-config ConfigMap in the kube-system namespace. The value is either datapathv2 (indicating DataPathv2) or ipvlan (indicating IPvlan+eBPF).
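The following is a minimal sketch of this check. It assumes you pipe in the ConfigMap output of kubectl get cm eni-config -n kube-system -o yaml. The function name terway_accel_mode is hypothetical. The match is case-insensitive because the exact capitalization of the stored value in your cluster is an assumption here.

```shell
# Hypothetical helper: read the eni-config ConfigMap text from stdin and report
# the acceleration mode from the first eniip_virtual_type setting found.
# Works for JSON-style ("key": "value") and YAML-style (key: value) lines.
terway_accel_mode() {
  value=$(sed -n 's/.*eniip_virtual_type"\{0,1\}[[:space:]]*:[[:space:]]*"\{0,1\}\([A-Za-z0-9]*\).*/\1/p' \
    | head -n 1 | tr '[:upper:]' '[:lower:]')
  case "$value" in
    datapathv2) echo "DataPathv2" ;;
    ipvlan) echo "IPvlan+eBPF" ;;
    *) echo "unknown (eniip_virtual_type=${value:-not set})"; return 1 ;;
  esac
}
```

Example usage: kubectl get cm eni-config -n kube-system -o yaml | terway_accel_mode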

IPVS bypass in acceleration mode

In acceleration mode (DataPathv2 or IPvlan+eBPF), Terway uses a different traffic forwarding path from standard shared ENI mode. For example, when a Pod accesses an internal Service, traffic bypasses the node's network stack and IPVS routing. Terway uses eBPF to resolve a Service address to a backend Pod's address. For details on traffic flow, see Network acceleration.

Switching network plugins

You cannot switch the network plugin of an existing cluster. The network plugin, either Terway or Flannel, can be selected only during cluster creation. To use a different plugin, create a new cluster. For more information, see Create an ACK managed cluster.

Cluster cannot access the internet after adding a vSwitch in Terway

Symptom

You manually add a vSwitch to a Terway network to resolve an IP address shortage for Pods. After adding the vSwitch, the cluster cannot access the internet.

Cause

The vSwitch that provides IP addresses to the Pods is not configured for internet access.

Solution

Use a NAT Gateway to configure an SNAT rule for the vSwitch. For more information, see Enable internet access for a cluster.

Resolve Flannel incompatibility on Kubernetes 1.16+

Symptom

After upgrading a cluster to Kubernetes 1.16 or later, the nodes become NotReady.

Cause

The Flannel version was manually upgraded, but its configuration was not updated. As a result, the kubelet cannot recognize the CNI configuration, which lacks the cniVersion field.

Solution

  1. Edit the Flannel ConfigMap to add the cniVersion field.

    kubectl edit cm kube-flannel-cfg -n kube-system

    In the cni-conf.json entry, add the cniVersion field, as shown in the following excerpt:

    "name": "cb0",
    "cniVersion": "0.3.0",
    "type": "flannel",
  2. Restart the Flannel pods to apply the new configuration.

    kubectl delete pod -n kube-system -l app=flannel
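To confirm that the edit was saved, you could extract the field from the ConfigMap. This is a sketch. flannel_cni_version is a hypothetical helper name, and it searches the text only for the first "cniVersion" entry.

```shell
# Hypothetical helper: print the cniVersion value found in kube-flannel-cfg
# ConfigMap text read from stdin, or fail if the field is missing.
flannel_cni_version() {
  version=$(sed -n 's/.*"cniVersion"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
  if [ -z "$version" ]; then
    echo "cniVersion is missing" >&2
    return 1
  fi
  echo "$version"
}
```

Example usage: kubectl get cm kube-flannel-cfg -n kube-system -o yaml | flannel_cni_version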

Pod startup latency

Symptom

A Pod experiences network latency for a short period after it starts.

Cause

Enforcing a network policy can introduce latency.

Solution

  1. Edit the Terway ConfigMap to disable network policy.

    kubectl edit cm -n kube-system eni-config

    Add the following field to the configuration:

    disable_network_policy: "true"
  2. Optional: If you are using an earlier version of Terway, upgrade the Terway add-on in the console.

    1. Log on to the ACK console. In the left navigation pane, click Clusters.

    2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Add-ons.

    3. On the Add-ons page, click the Networking tab, and then click Upgrade for the Terway add-on.

    4. In the dialog box, complete the configuration and click OK.

  3. Restart all Terway Pods.

    kubectl delete pod -n kube-system -l app=terway-eniip

Enable hairpinning for Pods

Symptom

A Pod cannot reach itself through a Service that it backs. Connections may fail intermittently, or fail whenever the Service routes traffic back to the same Pod.

Cause

Hairpinning may not be enabled in Flannel clusters.

Note
  • Flannel versions earlier than v0.15.1.4-e02c8f12-aliyun do not allow hairpinning. After you upgrade to a later version, hairpinning remains disabled by default but you can enable it manually.

  • Hairpinning is enabled by default only in new deployments that use Flannel v0.15.1.4-e02c8f12-aliyun or later.

Solution

  • Use a Headless Service to expose and access the service. For more information, see Headless Services.

    Note

    This is the recommended method.

  • Recreate the cluster and use the Terway network plugin. For more information, see Use the Terway network plugin.

  • Modify the Flannel configuration, and then recreate the Flannel Pods and your application Pods.

    Note

    This method is not recommended because future upgrades might overwrite the configuration.

    1. Edit the kube-flannel-cfg ConfigMap, which contains the cni-conf.json entry.

      kubectl edit cm kube-flannel-cfg -n kube-system
    2. In the cni-conf.json entry, add "hairpinMode": true to the delegate section.

      Example:

      cni-conf.json: |
          {
            "name": "cb0",
            "cniVersion":"0.3.1",
            "type": "flannel",
            "delegate": {
              "isDefaultGateway": true,
              "hairpinMode": true
            }
          }
    3. Restart the Flannel Pods.

      kubectl delete pod -n kube-system -l app=flannel   
    4. Delete and recreate your application Pods.
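To check the result on a node, you can read the hairpin flag that Linux exposes for each bridge port under /sys/class/net/<bridge>/brif/<port>/hairpin_mode. The sketch below assumes the Flannel bridge is named cni0, which is the common Flannel default. Confirm the name on your nodes with ip link before relying on it. The function name bridge_hairpin_status is hypothetical.

```shell
# Hypothetical helper: list each port of a Linux bridge with its hairpin_mode
# flag (1 = enabled, 0 = disabled). Pass the bridge's brif directory; the
# default assumes a bridge named cni0, which may differ on your nodes.
bridge_hairpin_status() {
  brif_dir=${1:-/sys/class/net/cni0/brif}
  for port in "$brif_dir"/*; do
    # Skip an unexpanded glob and any entry without a hairpin_mode file.
    [ -r "$port/hairpin_mode" ] || continue
    printf '%s %s\n' "${port##*/}" "$(cat "$port/hairpin_mode")"
  done
}
```

Ports that show 0 are likely veth interfaces created before the configuration change. Recreating the corresponding application Pods, as in step 4, should produce ports with the flag set to 1.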

Choose between Terway and Flannel network plugins

This topic helps you choose between Terway and Flannel, the two network plugins available for ACK clusters.

The available plugins are:

  • Flannel: A simple, stable, and open-source CNI plugin. On the high-speed Alibaba Cloud VPC network, Flannel provides high-performance and stable container networking. However, it has a limited feature set and does not support standard Kubernetes NetworkPolicy.

  • Terway: An Alibaba Cloud-developed network plugin that is fully compatible with Flannel. Terway lets you assign Alibaba Cloud elastic network interfaces to Pods, define access policies between Pods by using standard Kubernetes NetworkPolicy, and apply bandwidth limiting to individual Pods. Choose Flannel if you do not require NetworkPolicy. For all other use cases, we recommend Terway. For more information about Terway, see Use the Terway network plugin.

Cluster network planning

To create an ACK cluster, you must specify a VPC, vSwitches, a pod CIDR block, and a service CIDR block. We recommend planning the address spaces for your ECS instances, pods, and services. For more information, see Plan the network for an ACK managed cluster.
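Planning usually includes confirming that the VPC, pod, and service CIDR blocks do not overlap. See the planning topic above for the exact rules for your network plugin. The following POSIX shell sketch handles IPv4 only. It counts the addresses in a block and tests whether two blocks overlap. The helper names and the example CIDR blocks are hypothetical.

```shell
# Hypothetical helpers for IPv4 CIDR planning (pure POSIX shell arithmetic).
# ip_to_int converts a dotted-quad address such as 172.20.0.0 to an integer.
ip_to_int() {
  a=${1%%.*}; rest=${1#*.}
  b=${rest%%.*}; rest=${rest#*.}
  c=${rest%%.*}; d=${rest#*.}
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# cidr_size prints the number of addresses in a block, for example 8192 for a /19.
cidr_size() {
  echo $(( 1 << (32 - ${1#*/}) ))
}

# cidr_range prints the first and last address of a block as integers.
cidr_range() {
  size=$(cidr_size "$1")
  start=$(( $(ip_to_int "${1%/*}") / size * size ))
  echo "$start $(( start + size - 1 ))"
}

# cidrs_overlap succeeds if the two blocks share at least one address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For example, cidr_size 172.20.0.0/16 prints 65536, and cidrs_overlap 192.168.0.0/16 172.20.0.0/16 || echo "no overlap" prints no overlap.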

Support for hostPort mapping

  • Only the Flannel plugin supports hostPort; other plugins currently do not.

  • Pod IP addresses in ACK are directly accessible by other resources within the same VPC, so no additional port mapping is required.

  • To expose a service externally, use a NodePort or LoadBalancer service.

View cluster network type and vSwitches

ACK supports two container network types: Flannel and Terway.

  • Follow these steps to view the cluster's network type.

    1. Log on to the ACK console. In the left navigation pane, click Clusters.

    2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Cluster Information.

    3. Click the Basic Information tab. In the Network section, the Network Plug-in field shows the container network type.

      • If Terway is displayed to the right of Network Plug-in, the container network type is Terway network.

      • If Flannel is displayed to the right of Network Plug-in, the container network type is Flannel network.

  • Follow these steps to view the vSwitches for the cluster nodes.

    1. In the left-side navigation pane, choose Nodes > Node Pools.

    2. On the Node Pools page, click Details in the Actions column of the target node pool, and then click the Basic Information tab.

      In the Node configuration section, find the Node vSwitch ID.

  • Follow these steps to find the Pod vSwitch ID for Terway networks.

    Note

    Only Terway networks require a Pod vSwitch.

    1. In the left-side navigation pane, click Add-ons.

    2. On the Add-ons page, click Configuration on the terway-eniip card. The PodVswitchId field shows the current Pod vSwitch ID.

View cloud resources

Follow these steps to view the cloud resources used by your cluster, including virtual machines, VPCs, and the Worker RAM Role.

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the cluster list page, click the name of your target cluster or click Details in the Actions column.

  3. Click the Basic Information tab to view the cloud resources used by the cluster.

Modify the kube-proxy configuration

By default, an ACK managed cluster runs kube-proxy as the kube-proxy-worker DaemonSet to implement Service load balancing. Its parameters are controlled by the kube-proxy-worker ConfigMap. In an ACK dedicated cluster, an additional kube-proxy-master DaemonSet and its corresponding ConfigMap run on the control plane nodes.

The kube-proxy ConfigMaps follow the community KubeProxyConfiguration format, so you can customize them according to that standard. For more information, see kube-proxy Configuration. The configuration is YAML and requires strict formatting: do not omit any colons or the spaces that follow them. To modify the kube-proxy configuration:

  • If you are using an ACK managed cluster, modify the kube-proxy-worker configuration.

    1. Log on to the ACK console. In the left navigation pane, click Clusters.

    2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Configurations > ConfigMaps.

    3. Select the kube-system namespace from the drop-down list at the top of the page. Then, click Edit YAML to the right of the kube-proxy-worker ConfigMap.

    4. In the View in YAML panel, modify the parameters and then click OK.

    5. Recreate all kube-proxy-worker pods to apply the new configuration.

      Important

      Restarting kube-proxy does not interrupt existing services. However, a brief delay may occur while programming new rules for concurrently deployed services. We recommend that you perform this operation during off-peak hours.

      1. In the left-side navigation pane, choose Workloads > DaemonSets.

      2. In the DaemonSet list, find and click kube-proxy-worker.

      3. On the kube-proxy-worker page, on the Pods tab, choose More > Delete and click OK.

        Repeat this step to delete all pods. After you delete the pods, the system automatically recreates them.

  • For an ACK dedicated cluster, modify both the kube-proxy-worker and kube-proxy-master configurations, then delete their pods. The system automatically recreates the pods with the new configuration. The procedure is the same as described above.
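Because the file is whitespace-sensitive, it can help to check the edited config.conf text before you save it. The following is a rough line-based heuristic, not a YAML parser. It flags tab indentation and key:value lines that are missing the space after the colon. The function name kube_proxy_config_lint is hypothetical.

```shell
# Hypothetical pre-check for kube-proxy config.conf text read from stdin.
# Flags tab indentation (not allowed in YAML) and "key:value" lines missing the
# space after the colon. Returns non-zero if any problem is found.
kube_proxy_config_lint() {
  awk '
    /^\t/ { printf "line %d: tab used for indentation\n", NR; bad = 1 }
    /^[ \t]*[A-Za-z0-9_.-]+:[^ \t]/ { printf "line %d: missing space after colon\n", NR; bad = 1 }
    END { exit bad }
  '
}
```

For example, save the config.conf content to a local file and run kube_proxy_config_lint < config.conf. Each problem is reported with its line number.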

Increase the Linux conntrack limit

If your kernel log (dmesg) contains conntrack full errors, the conntrack table has reached the limit set by nf_conntrack_max. In this case, increase the Linux connection tracking limit as follows.

  1. Run the following commands to check the protocol usage and the number of entries in the conntrack table.

    # View the table details. You can pipe this to grep to filter by state, or use cat /proc/net/nf_conntrack.
    conntrack -L
    
    # View the current count.
    cat /proc/sys/net/netfilter/nf_conntrack_count
    
    # View the current maximum table size.
    cat /proc/sys/net/netfilter/nf_conntrack_max
    • If you observe a large number of TCP connections, identify the specific services responsible. If these applications use short-lived connections, consider redesigning them for long-lived connections.

    • If you observe high DNS traffic, enable NodeLocal DNSCache in your ACK cluster to improve DNS performance. For instructions, see Use the NodeLocal DNSCache component.

    • If you see frequent application-layer errors such as timeout or 504, or if the kernel log contains kernel: nf_conntrack: table full, dropping packet. errors, cautiously adjust the conntrack parameters.

      Example conntrack parameter adjustments in /etc/sysctl.conf. After you edit the file, run sysctl -p to apply the changes.

      # Modify the current maximum table size. 
      net.netfilter.nf_conntrack_max = 655350
      
      # Set the timeout for ESTABLISHED connections. 21600s (6 hours) is a common value. Adjust cautiously based on your needs.
      net.netfilter.nf_conntrack_tcp_timeout_established = 21600
      
      # Reduce timeouts for TCP teardown states to clean up entries faster. Adjust cautiously based on your needs.
      net.netfilter.nf_conntrack_tcp_timeout_time_wait = 60
      net.netfilter.nf_conntrack_tcp_timeout_close_wait = 120
      net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 30
  2. If the current conntrack usage is reasonable or you do not want to modify your applications, you can increase the connection tracking limit by adding the maxPerCore parameter to the kube-proxy configuration.

    • For a managed cluster, add the maxPerCore parameter to the kube-proxy-worker ConfigMap and set its value to a number higher than 65536. Then, delete the kube-proxy-worker pod. The pod is automatically recreated with the new configuration. For details, see How do I modify the kube-proxy configuration?.

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: kube-proxy-worker
        namespace: kube-system
      data:
        config.conf: |
          apiVersion: kubeproxy.config.k8s.io/v1alpha1
          kind: KubeProxyConfiguration
          featureGates:
            IPv6DualStack: true
          clusterCIDR: 172.20.0.0/16
          clientConnection:
            kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
          conntrack:
            maxPerCore: 65536 # Set maxPerCore to a reasonable value. 65536 is the default.
          mode: ipvs
      # Other fields omitted
    • For a dedicated cluster, add the maxPerCore parameter to both the kube-proxy-worker and kube-proxy-master ConfigMaps and set its value to a number higher than 65536. Then, delete the kube-proxy-worker and kube-proxy-master pods. The pods are automatically recreated with the new configuration. For details, see How do I modify the kube-proxy configuration?.
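The arithmetic behind these steps can be sketched as follows. conntrack_usage reports how full the table is. expected_conntrack_max estimates the table size that kube-proxy derives from maxPerCore. The estimate assumes the upstream kube-proxy rule: maxPerCore multiplied by the number of CPUs, with conntrack.min as a floor whose upstream default is 131072. ACK may ship different defaults. Both function names and the 80% threshold are hypothetical.

```shell
# Hypothetical helper: print conntrack usage and return non-zero when usage
# reaches the threshold (default 80%, an arbitrary example value).
# Without arguments it reads the live values from /proc.
conntrack_usage() {
  count=${1:-$(cat /proc/sys/net/netfilter/nf_conntrack_count)}
  max=${2:-$(cat /proc/sys/net/netfilter/nf_conntrack_max)}
  threshold=${3:-80}
  pct=$(( count * 100 / max ))
  echo "$count/$max entries (${pct}%)"
  [ "$pct" -lt "$threshold" ]
}

# Hypothetical helper mirroring the assumed upstream kube-proxy rule:
# maxPerCore * CPUs, but never below the conntrack.min floor.
# CPU count defaults to nproc; floor defaults to the upstream 131072.
expected_conntrack_max() {
  max_per_core=$1
  cpus=${2:-$(nproc)}
  floor=${3:-131072}
  scaled=$(( max_per_core * cpus ))
  if [ "$scaled" -gt "$floor" ]; then echo "$scaled"; else echo "$floor"; fi
}
```

For example, expected_conntrack_max 65536 8 prints 524288 for an 8-vCPU node. After the kube-proxy pods are recreated, you can compare that value with cat /proc/sys/net/netfilter/nf_conntrack_max. You can also run dmesg | grep 'nf_conntrack: table full' to check whether drops continue.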

Note

In Terway DataPathv2 or IPvlan+eBPF mode, conntrack entries for container traffic are stored in an eBPF map instead of the Linux conntrack table used by other modes. To learn how to adjust the eBPF conntrack size, see Optimize conntrack configurations in Terway mode.

Modify IPVS load balancing mode in kube-proxy

If your workload uses long-lived connections, requests may be distributed unevenly across backend pods. This happens because IPVS balances connections rather than individual requests, and each long-lived connection carries many requests. To resolve this, modify the IPVS load balancing mode in kube-proxy as follows:

  1. Choose a suitable scheduling algorithm. For guidance, see the parameter-changes section of the official Kubernetes documentation.

  2. Nodes in clusters created before October 2022 may not load the required IPVS scheduler kernel module by default. In that case, load it manually on each cluster node. This example uses the least-connection (lc) scheduling algorithm. If you choose a different algorithm, replace lc with its identifier. Log on to each node and run lsmod | grep ip_vs_lc to check whether the module is loaded.

    • If the command returns output containing ip_vs_lc, the kernel module is already loaded. You can skip this step.

    • If no output is returned, run modprobe ip_vs_lc to load the module immediately. Then, run echo "ip_vs_lc" >> /etc/modules-load.d/ack-ipvs-modules.conf to ensure the module loads automatically on reboot.

  3. Set the ipvs.scheduler parameter in the kube-proxy configuration to a suitable scheduling algorithm.

    • If you are using a managed cluster, modify the ipvs.scheduler parameter in the kube-proxy-worker ConfigMap. Then, delete the kube-proxy-worker pods. The pods are automatically recreated, applying the new configuration. For more information, see How to modify the kube-proxy configuration.

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: kube-proxy-worker
        namespace: kube-system
      data:
        config.conf: |
          apiVersion: kubeproxy.config.k8s.io/v1alpha1
          kind: KubeProxyConfiguration
          featureGates:
            IPv6DualStack: true
          clusterCIDR: 172.20.0.0/16
          clientConnection:
            kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
          conntrack:
            maxPerCore: 65536
          mode: ipvs
          ipvs:
            scheduler: lc # Set the scheduler to a suitable scheduling algorithm.
      # Other fields omitted.
    • If you are using a dedicated cluster, modify the ipvs.scheduler parameter in both the kube-proxy-worker and kube-proxy-master ConfigMaps. Then, delete the kube-proxy-worker and kube-proxy-master pods. The pods are automatically recreated, applying the new configuration. For more information, see How to modify the kube-proxy configuration.

  4. Check the kube-proxy runtime logs.

    • Run kubectl get pods to check that the new kube-proxy-worker pod in the kube-system namespace is Running. For a dedicated cluster, you must also check the kube-proxy-master pod.

    • Run kubectl logs to view the new pod's logs.

      • If the log contains the message Can't use the IPVS proxier: IPVS proxier will not be used because the following required kernel modules are not loaded: [ip_vs_lc], the IPVS scheduling algorithm kernel module failed to load. Verify the preceding steps and try again.

      • If you see the message Using iptables Proxier., it indicates that kube-proxy failed to enable the IPVS module and is automatically falling back to iptables mode. In this case, you should first roll back the kube-proxy configuration and then restart the machine.

      • If the log shows Using ipvs Proxier. and the preceding errors are absent, the IPVS module was successfully enabled.

    • If all checks pass, the change was successful.
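Steps 2 and 4 can be partly scripted. In the sketch below, ipvs_module_loaded checks lsmod output for the ip_vs_<scheduler> module. kube_proxy_proxier classifies kube-proxy logs using the messages quoted above. Both function names are hypothetical, and the log matching relies only on those quoted message fragments.

```shell
# Hypothetical helper: read "lsmod" output from stdin and succeed if the
# ip_vs_<scheduler> module (for example, ip_vs_lc) is listed.
ipvs_module_loaded() {
  awk -v mod="ip_vs_$1" '$1 == mod { found = 1 } END { exit !found }'
}

# Hypothetical helper: read kube-proxy logs from stdin and classify them.
# A missing-module message takes priority over the proxier that was chosen.
kube_proxy_proxier() {
  awk '
    /required kernel modules are not loaded/ { missing = 1 }
    /Using iptables Proxier/ { mode = "iptables" }
    /Using ipvs Proxier/ { mode = "ipvs" }
    END {
      if (missing) print "ipvs-module-missing"
      else if (mode != "") print mode
      else print "unknown"
    }
  '
}
```

Example usage: lsmod | ipvs_module_loaded lc && echo loaded, and kubectl logs -n kube-system <kube-proxy-worker-pod-name> | kube_proxy_proxier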

For information on capturing network packets in a container to check for load imbalance, see the Alibaba Cloud Developer Community.

Modify the IPVS UDP session timeout

If your ACK cluster uses kube-proxy in IPVS mode, the default session persistence policy can cause intermittent packet loss for up to five minutes after a UDP backend is removed. If your services depend on CoreDNS, you might experience service latency and request timeouts for five minutes when the CoreDNS component is upgraded or its node is restarted.

If your services in the ACK cluster do not use the UDP protocol, you can reduce the impact of resolution delays or failures by lowering the IPVS UDP session persistence timeout. Follow these steps:

Note

If your services use the UDP protocol, submit a ticket.

  • Clusters running Kubernetes v1.18 or later

    • If you are using a managed cluster, you need to modify the value of the udpTimeout parameter in kube-proxy-worker. Then, delete the kube-proxy-worker Pod. The configuration takes effect after the Pod is automatically recreated. For more information, see How do I modify the kube-proxy configuration?.

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: kube-proxy-worker
        namespace: kube-system
      data:
        config.conf: |
          apiVersion: kubeproxy.config.k8s.io/v1alpha1
          kind: KubeProxyConfiguration
          # Other fields are omitted for brevity.
          mode: ipvs
          # Add the ipvs key if it does not exist.
          ipvs:
            udpTimeout: 10s # The default is 300s. Changing this to 10s reduces the impact window for packet loss to 10 seconds after a UDP backend is removed.
    • If you are using a dedicated cluster, you need to modify the udpTimeout parameter value in kube-proxy-worker and kube-proxy-master. Then, delete the kube-proxy-worker and kube-proxy-master Pods. The configuration takes effect after the Pods are automatically recreated. For more information about how to modify and delete kube-proxy-worker, see How do I modify the kube-proxy configuration?.

  • Clusters running Kubernetes v1.16 or earlier

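    The kube-proxy in clusters of this version does not support the udpTimeout parameter. You can use CloudOps Orchestration Service (OOS) to run the ipvsadm command in batches on all nodes in the cluster to adjust the UDP timeout. ipvsadm --set takes the TCP, TCP FIN, and UDP timeouts in seconds, in that order. The following commands set the UDP timeout to 10 seconds and keep the TCP and TCP FIN timeouts at 900 and 120 seconds, the IPVS defaults:

    # Install ipvsadm.
    yum install -y ipvsadm
    # Save the current timeout settings for comparison.
    ipvsadm -L --timeout > /tmp/ipvsadm_timeout_old
    # Set the TCP, TCP FIN, and UDP timeouts (in seconds).
    ipvsadm --set 900 120 10
    ipvsadm -L --timeout > /tmp/ipvsadm_timeout_new
    # Show what changed.
    diff /tmp/ipvsadm_timeout_old /tmp/ipvsadm_timeout_new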

    For examples of batch operations in OOS, see Batch operation instances.
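To read back the value on each node, you can extract the UDP timeout from the ipvsadm -L --timeout output. This sketch assumes the usual output line Timeout (tcp tcpfin udp): <tcp> <tcpfin> <udp>. The function name ipvs_udp_timeout is hypothetical.

```shell
# Hypothetical helper: read "ipvsadm -L --timeout" output from stdin and print
# the UDP timeout, which is the last field of the "Timeout (...)" line.
ipvs_udp_timeout() {
  awk '/^Timeout \(tcp tcpfin udp\):/ { print $NF }'
}
```

After the change, ipvsadm -L --timeout | ipvs_udp_timeout should print 10.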

Troubleshooting IPv6 dual-stack issues

  • Problem: The Pod IP displayed in kubectl is still an IPv4 address.

    Solution: Run the following command to display the podIPs field. The output should include an IPv6 address.

    kubectl get pods -A -o jsonpath='{range .items[*]}{@.metadata.namespace} {@.metadata.name} {@.status.podIPs[*].ip} {"\n"}{end}'
  • Problem: The cluster IP displayed in kubectl is still an IPv4 address.

    Solution:

    1. Verify that spec.ipFamilyPolicy of the Service is set to PreferDualStack or RequireDualStack rather than SingleStack.

    2. Run the following command to display the clusterIPs field. The output should include an IPv6 address.

      kubectl get svc -A -o jsonpath='{range .items[*]}{@.metadata.namespace} {@.metadata.name} {@.spec.ipFamilyPolicy} {@.spec.clusterIPs[*]} {"\n"}{end}'
  • Problem: The Pod cannot be accessed using its IPv6 address.

    Cause: Some applications, such as Nginx containers, do not listen on IPv6 addresses by default.

    Solution: Run the netstat -anp command to verify that the Pod is listening on an IPv6 address.

    Expected output:

    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 127.0.XX.XX:10248         0.0.0.0:*               LISTEN      8196/kubelet
    tcp        0      0 127.0.XX.XX:41935         0.0.0.0:*               LISTEN      8196/kubelet
    tcp        0      0 0.0.XX.XX:111             0.0.0.0:*               LISTEN      598/rpcbind
    tcp        0      0 0.0.XX.XX:22              0.0.0.0:*               LISTEN      3577/sshd
    tcp6       0      0 :::30500                :::*                    LISTEN      1916680/kube-proxy
    tcp6       0      0 :::10250                :::*                    LISTEN      8196/kubelet
    tcp6       0      0 :::31183                :::*                    LISTEN      1916680/kube-proxy
    tcp6       0      0 :::10255                :::*                    LISTEN      8196/kubelet
    tcp6       0      0 :::111                  :::*                    LISTEN      598/rpcbind
    tcp6       0      0 :::10256                :::*                    LISTEN      1916680/kube-proxy
    tcp6       0      0 :::31641                :::*                    LISTEN      1916680/kube-proxy
    udp        0      0 0.0.0.0:68              0.0.0.0:*                           4892/dhclient
    udp        0      0 0.0.0.0:111             0.0.0.0:*                           598/rpcbind
    udp        0      0 47.100.XX.XX:323           0.0.0.0:*                           6750/chronyd
    udp        0      0 0.0.0.0:720             0.0.0.0:*                           598/rpcbind
    udp6       0      0 :::111                  :::*                                598/rpcbind
    udp6       0      0 ::1:323                 :::*                                6750/chronyd
    udp6       0      0 fe80::216:XXXX:fe03:546 :::*                                6673/dhclient
    udp6       0      0 :::720                  :::*                                598/rpcbind

    If Proto is tcp, the Pod is listening on an IPv4 address. If it is tcp6, the Pod is listening on an IPv6 address.

  • Problem: A Pod can be accessed using its IPv6 address from within the cluster, but not from the internet.

    Cause: The IPv6 address may not have public bandwidth configured.

    Solution: Configure public bandwidth for the IPv6 address. For more information, see Enable and manage IPv6 public bandwidth.

  • Problem: The Pod cannot be accessed using the IPv6 cluster IP.

    Solution:

    1. Verify that spec.ipFamilyPolicy of the Service is set to PreferDualStack or RequireDualStack rather than SingleStack.

    2. Run the netstat -anp command to verify that the Pod is listening on an IPv6 address.

      Expected output:

      Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
      tcp        0      0 127.0.XX.XX:10248         0.0.0.0:*               LISTEN      8196/kubelet
      tcp        0      0 127.0.XX.XX:41935         0.0.0.0:*               LISTEN      8196/kubelet
      tcp        0      0 0.0.XX.XX:111             0.0.0.0:*               LISTEN      598/rpcbind
      tcp        0      0 0.0.XX.XX:22              0.0.0.0:*               LISTEN      3577/sshd
      tcp6       0      0 :::30500                :::*                    LISTEN      1916680/kube-proxy
      tcp6       0      0 :::10250                :::*                    LISTEN      8196/kubelet
      tcp6       0      0 :::31183                :::*                    LISTEN      1916680/kube-proxy
      tcp6       0      0 :::10255                :::*                    LISTEN      8196/kubelet
      tcp6       0      0 :::111                  :::*                    LISTEN      598/rpcbind
      tcp6       0      0 :::10256                :::*                    LISTEN      1916680/kube-proxy
      tcp6       0      0 :::31641                :::*                    LISTEN      1916680/kube-proxy
      udp        0      0 0.0.0.0:68              0.0.0.0:*                           4892/dhclient
      udp        0      0 0.0.0.0:111             0.0.0.0:*                           598/rpcbind
      udp        0      0 47.100.XX.XX:323           0.0.0.0:*                           6750/chronyd
      udp        0      0 0.0.0.0:720             0.0.0.0:*                           598/rpcbind
      udp6       0      0 :::111                  :::*                                598/rpcbind
      udp6       0      0 ::1:323                 :::*                                6750/chronyd
      udp6       0      0 fe80::216:XXXX:fe03:546 :::*                                6673/dhclient
      udp6       0      0 :::720                  :::*                                598/rpcbind

      If Proto is tcp, the Pod is listening on an IPv4 address. If it is tcp6, the Pod is listening on an IPv6 address.

  • Problem: The Pod cannot access the internet over IPv6.

    Solution: To enable outbound IPv6 internet access, create an IPv6 Gateway and configure public bandwidth for the IPv6 address. For more information, see Create and manage an IPv6 Gateway and Enable and manage IPv6 public bandwidth.
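The checks above can be partly scripted. missing_ipv6 reads the output of the two kubectl jsonpath commands and prints each Pod or Service line that has no IPv6 address. Any field after the name that contains a colon is treated as IPv6. listens_on_ipv6 reads netstat -anp output and succeeds if a tcp6 or udp6 socket is bound to a given port. Both are hypothetical helpers, shown as a sketch.

```shell
# Hypothetical helper: read "<namespace> <name> [<policy>] <ip>..." lines from
# stdin and print the lines where no field after the name contains ":".
missing_ipv6() {
  awk '{
    has6 = 0
    for (i = 3; i <= NF; i++) if (index($i, ":") > 0) has6 = 1
    if (!has6) print
  }'
}

# Hypothetical helper: read "netstat -anp" output from stdin and succeed if a
# tcp6 or udp6 socket is bound to the port given in $1.
listens_on_ipv6() {
  awk -v port="$1" '
    ($1 == "tcp6" || $1 == "udp6") && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}
```

Pipe either kubectl jsonpath command above into missing_ipv6. For the listener check, run netstat -anp | listens_on_ipv6 <port>.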

Insufficient IP addresses in a Terway vSwitch

Problem description

You are unable to create pods. When you log on to the VPC console, select the target region, and view the information for the vSwitch used by the cluster, you will find that its number of available IPs is 0. To confirm the issue, see More information.


Cause

The vSwitch used by Terway on the node has no available IP addresses. This prevents pods from obtaining IP resources, causing them to remain in the ContainerCreating state.

Solution

Follow these steps to add a new vSwitch and expand your cluster's IP resources:

  1. Log on to the VPC console, select the target region, and create a new vSwitch.

    Note

    The new vSwitch must be in the same region and availability zone as the vSwitch that has run out of IP addresses. To support high pod density, we recommend a pod vSwitch CIDR block with a prefix length of /19 or shorter (for example, /18). A /19 block provides 8,192 IP addresses.

  2. Log on to the ACK console. In the left navigation pane, click Clusters. On the Clusters page, click the name of the target cluster, and then in the left navigation pane, click Add-ons.

    Find the terway-eniip card and click Configuration. In the PodVswitchId field, add the ID of the new vSwitch.

  3. Run the following command to delete all Terway pods. The pods are recreated automatically.

    Note

    If you selected "Assign exclusive elastic network interfaces to pods for optimal performance" when you created the cluster, your cluster uses exclusive ENI mode. Otherwise, it uses shared ENI mode. For more information, see Terway.

    • For shared ENI mode: kubectl delete -n kube-system pod -l app=terway-eniip

    • For exclusive ENI mode: kubectl delete -n kube-system pod -l app=terway-eni

  4. Run the kubectl get pod command to confirm that all Terway pods have been recreated successfully.

  5. Create a new pod and verify that it starts successfully and obtains an IP address from the new vSwitch.
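For step 5, you can check whether the new pod's IP address falls inside the new vSwitch's CIDR block. This POSIX shell sketch handles IPv4 only. ip_to_int and ip_in_cidr are hypothetical helper names.

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  a=${1%%.*}; rest=${1#*.}
  b=${rest%%.*}; rest=${rest#*.}
  c=${rest%%.*}; d=${rest#*.}
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Hypothetical helper: succeed if the address in $1 lies inside the CIDR block in $2.
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  size=$(( 1 << (32 - ${2#*/}) ))
  start=$(( $(ip_to_int "${2%/*}") / size * size ))  # align to the block boundary
  [ "$ip" -ge "$start" ] && [ "$ip" -lt $(( start + size )) ]
}
```

To use it, get the pod IP with kubectl get pod <pod-name> -o jsonpath='{.status.podIP}' and take the vSwitch CIDR block from the VPC console. Then run ip_in_cidr <pod-ip> <vswitch-cidr> && echo "in new vSwitch".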

More information

Connect to your Kubernetes cluster. For more information, see Connect to a Kubernetes cluster by using kubectl. If the kubectl get pod command shows that the pod is stuck in the ContainerCreating state, run the following commands to view the logs of the Terway container on the pod's node.

# Replace <node-name> with the name of the node where your pod runs. This lists the Terway pod on that node.
kubectl get pod -n kube-system -l app=terway-eniip -o wide --field-selector spec.nodeName=<node-name>
# Replace <terway-pod-name> with the name of the Terway pod returned by the previous command.
kubectl logs --tail=100 -f <terway-pod-name> -n kube-system -c terway

The output is similar to the following. An error message containing InvalidVSwitchId.IpNotEnough confirms that the vSwitch has insufficient IP addresses.

time="2020-03-17T07:03:40Z" level=warning msg="Assign private ip address failed: Aliyun API Error: RequestId: 2095E971-E473-4BA0-853F-0C41CF52651D Status Code: 403 Code: InvalidVSwitchId.IpNotEnough Message: The specified VSwitch \"vsw-AAA\" has not enough IpAddress., retrying"
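If the log is long, you can extract the affected vSwitch IDs instead of reading it line by line. This sketch assumes the message format shown above. The function name exhausted_vswitches is hypothetical.

```shell
# Hypothetical helper: read Terway logs from stdin and print each vSwitch ID
# reported with InvalidVSwitchId.IpNotEnough, once per ID.
exhausted_vswitches() {
  sed -n 's/.*InvalidVSwitchId\.IpNotEnough.*VSwitch \\"\(vsw-[A-Za-z0-9]*\)\\".*/\1/p' | sort -u
}
```

Example usage: kubectl logs -n kube-system <terway-pod-name> -c terway | exhausted_vswitches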

Pod IP outside vSwitch CIDR in Terway mode

Symptom

In a Terway network, Pods are assigned IP addresses outside the configured vSwitch CIDR block.

Cause

Terway allocates Pod IP addresses from the VPC and assigns them to containers by using elastic network interfaces (ENIs). You can specify a vSwitch only when creating a new ENI. If an ENI already exists, Pods continue to receive IP addresses from the vSwitch associated with that ENI.

This issue typically occurs in the following two scenarios:

  • You add a node that was previously part of a different cluster and was removed from that cluster without being drained.

  • You manually update the vSwitch configuration used by Terway. Because nodes may still have ENIs associated with the old configuration, new Pods may be assigned IP addresses from these ENIs.

Solution

To apply the new configuration, create new nodes or rotate existing ones.

To rotate an existing node, follow these steps:

  1. Drain and remove the node. For more information, see Remove a node.

  2. Detach the ENIs from the removed node. For more information, see Manage ENIs.

  3. After detaching the ENIs, re-add the node to the original ACK cluster. For more information, see Add existing nodes.
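To find out which pods are affected, you can check whether each pod IP falls inside the CIDR block of the vSwitch that Terway is now configured to use. The following functions are a minimal IPv4-only sketch; the names ip_to_int and ip_in_cidr are hypothetical helpers, not part of Terway or kubectl.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address (for example 10.2.1.124) to an integer.
ip_to_int() {
  local oldifs=$IFS
  IFS=.
  set -- $1            # split the address into its four octets
  IFS=$oldifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return 0 if IPv4 address $1 is inside CIDR block $2 (for example 10.2.1.0/24).
ip_in_cidr() {
  local net=${2%/*} prefix=${2#*/}
  # Network mask with the top $prefix bits set, limited to 32 bits.
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For example, run ip_in_cidr with each IP from the IP column of kubectl get pod -A -o wide and the new vSwitch CIDR block. Pods that use the host network report their node's IP address and are expected to fail this check.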

Pod IP assignment failure in Terway network mode

Symptoms

In the Terway network mode, pods cannot obtain IP addresses even after you add a new vSwitch.

Cause

Pod IPs come from VPC addresses and are assigned to containers through elastic network interfaces (ENIs). Terway applies the new vSwitch configuration only when it creates a new ENI. If the node has already reached its ENI quota, Terway cannot create a new ENI, so the new vSwitch configuration does not take effect. For more information about the ENI quota, see Elastic network interfaces overview.

Solution

Create new nodes or rotate existing nodes to apply the new configuration.

To rotate an old node, follow these steps:

  1. Drain and remove the old node. For more information, see Remove a node.

  2. Detach the ENIs from the removed node. For more information, see Manage ENIs.

  3. After the ENIs are detached, add the node back to the original ACK cluster. For more information, see Add an existing node.

Enable in-cluster load balancing in Terway IPvlan

Symptom

For newly created clusters that use Terway v1.2.0 or later in IPvlan mode, in-cluster load balancing is enabled by default. With this feature, when a pod accesses an ExternalIP or a LoadBalancer Service from within the cluster, the traffic is forwarded within the cluster to the Service's backend pods. This topic describes how to enable this feature for existing Terway IPvlan clusters.

Cause

Typically, kube-proxy short-circuits traffic from pods to ExternalIP and LoadBalancer Services. Instead of routing traffic externally, kube-proxy redirects requests directly to a backend Endpoint. However, in Terway IPvlan mode, this traffic is handled by Cilium, not kube-proxy. Versions of Terway before v1.2.0 did not support this short-circuiting mechanism. Although the feature is enabled by default in new clusters that use Terway v1.2.0 or later, you must manually enable it for existing clusters.

Solution

Note
  • This feature requires Terway v1.2.0 or later and IPvlan mode.

  • This configuration has no effect and is not required for clusters that do not use IPvlan mode.

  • This feature is enabled by default on new clusters and requires no configuration.

  1. Run the following command to edit the Terway ConfigMap.

    kubectl edit cm eni-config -n kube-system
  2. Add the following line to the eni_conf section.

    in_cluster_loadbalance: "true"
    Note

    Ensure that in_cluster_loadbalance is at the same indentation level as other fields in eni_conf.

  3. Run the following command to recreate the Terway pods and apply the configuration.

    kubectl delete pod -n kube-system -l app=terway-eniip

    Verify the configuration

    Run the following command to check the logs of the policy container in a terway-eniip pod. The configuration is successful if the output contains enable-in-cluster-loadbalance=true.

    kubectl logs -n kube-system <terway-pod-name> policy | grep enable-in-cluster-loadbalance

Assign pods to a CIDR block for allowlisting

Symptom

To secure access to services such as databases, you must add client IP addresses to an allowlist. In a container network, this process is challenging because pod IP addresses are dynamic.

Cause

ACK provides two container network plugins: Flannel and Terway.

  • In a Flannel network, pods use the IP address of their host node for outbound traffic. You can configure an allowlist by scheduling client pods to a specific set of nodes and then adding the IP addresses of those nodes to your database's allowlist.

  • In a Terway network, each pod gets its own IP address from an elastic network interface (ENI). When a pod communicates with an external service, its source IP address is its own, not the node's IP address. Even if you use node affinity to pin a pod to a specific node, its outbound IP is still the one assigned to its ENI. By default, these pod IP addresses are randomly allocated from the vSwitches configured for Terway. This makes it difficult to manage allowlists, especially in autoscaling scenarios. To solve this, assign a dedicated CIDR block to these client pods and then add the entire CIDR block to the database's allowlist.

Solution

You can use node labels to specify the vSwitch that pods should use. When a pod is scheduled to a node with a specific label, it receives an IP address from the custom vSwitch defined for that label.

  1. Create a ConfigMap named eni-config-fixed in the kube-system namespace to specify the dedicated vSwitch.

    This example uses vsw-2zem796p76viir02c****, which corresponds to the 10.2.1.0/24 CIDR block.

    apiVersion: v1
    data:
      eni_conf: |
        {
           "vswitches": {"cn-beijing-h":["vsw-2zem796p76viir02c****"]},
           "security_group": "sg-bp19k3sj8dk3dcd7****",
           "security_groups": ["sg-bp1b39sjf3v49c33****","sg-bp1bpdfg35tg****"]
        }
    kind: ConfigMap
    metadata:
      name: eni-config-fixed
      namespace: kube-system
    
                            
  2. Create a node pool and apply the label terway-config:eni-config-fixed to its nodes. For more information, see Create a node pool.

    To ensure that other pods are not scheduled to nodes in this node pool, you can also apply a taint, such as fixed=true:NoSchedule.

  3. Scale out the node pool. For more information, see Scale a node pool manually.

    New nodes in this node pool automatically inherit the label and taint configured in the previous step.

  4. Create a deployment that schedules pods to nodes with the terway-config:eni-config-fixed label. The pod specification must include a toleration for the taint.

    apiVersion: apps/v1 # For Kubernetes versions earlier than 1.8.0, use apps/v1beta1.
    kind: Deployment
    metadata:
      name: nginx-fixed
      labels:
        app: nginx-fixed
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx-fixed
      template:
        metadata:
          labels:
            app: nginx-fixed
        spec:
          tolerations:        # Add a toleration to match the taint on the node.
          - key: "fixed"
            operator: "Equal"
            value: "true"
            effect: "NoSchedule"
          nodeSelector:
            terway-config: eni-config-fixed
          containers:
          - name: nginx
            image: nginx:1.9.0 # Replace with your actual image: <image_name:tags>.
            ports:
            - containerPort: 80

    Verification

    1. Run the following command to check the pod IP addresses.

      kubectl get po -o wide | grep fixed

      Expected output:

      nginx-fixed-57d4c9bd97-l****                   1/1     Running             0          39s    10.2.1.124    bj-tw.062149.aliyun.com   <none>           <none>
      nginx-fixed-57d4c9bd97-t****                   1/1     Running             0          39s    10.2.1.125    bj-tw.062148.aliyun.com   <none>           <none>

      The output shows that the pod IP addresses are assigned from the specified vSwitch.

    2. Run the following command to scale out the deployment to 30 pods, and then run kubectl get po -o wide | grep fixed again to check the pod IP addresses.

      kubectl scale deployment nginx-fixed --replicas=30

      Expected output:

      nginx-fixed-57d4c9bd97-2****                   1/1     Running     0          60s     10.2.1.132    bj-tw.062148.aliyun.com   <none>           <none>
      nginx-fixed-57d4c9bd97-4****                   1/1     Running     0          60s     10.2.1.144    bj-tw.062149.aliyun.com   <none>           <none>
      nginx-fixed-57d4c9bd97-5****                   1/1     Running     0          60s     10.2.1.143    bj-tw.062148.aliyun.com   <none>           <none>
      ...

      The output shows that all newly created pods are assigned IP addresses from the specified vSwitch. You can now add the CIDR block of this vSwitch to your database's allowlist to grant access to these pods.

Note
  • Use newly created nodes. If you use existing nodes, you must detach their ENIs before adding them to the cluster. When adding the nodes, select the option to automatically add existing nodes and replace their system disks. For more information, see Manage ENIs and Add nodes automatically.

  • Apply labels and taints to the specific node pool to ensure that workloads that do not require allowlisting are not scheduled to these nodes.

  • This allowlisting method uses configuration override. ACK uses the configuration in the specified ConfigMap to override the default eni-config configuration. For more information about the configuration parameters, see Terway dynamic node configuration.

  • Allocate at least twice as many IP addresses in the vSwitch as the expected number of pods. This provides a buffer for future scaling and helps prevent IP address exhaustion if pod IP addresses are not reclaimed immediately after termination.
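As a rough sizing aid, the following sketch checks whether a dedicated vSwitch leaves the recommended 2x headroom. It assumes that the vSwitch reserves 4 addresses (the first address and the last three), which you should verify against your VPC documentation, and it counts only pod IPs. The function names usable_ips and has_headroom are hypothetical.

```shell
#!/bin/bash
# Assumption: the vSwitch reserves 4 addresses (the first one and the last
# three). Adjust RESERVED if your VPC documentation states otherwise.
RESERVED=4

# Print the number of usable addresses in a CIDR block such as 10.2.1.0/24.
usable_ips() {
  local prefix=${1#*/}
  echo $(( (1 << (32 - prefix)) - RESERVED ))
}

# Succeed if the vSwitch has at least twice as many usable addresses as the
# expected number of pods.
has_headroom() {
  local cidr=$1 expected_pods=$2
  [ "$(usable_ips "$cidr")" -ge $(( 2 * expected_pods )) ]
}
```

With the 10.2.1.0/24 vSwitch from the example above, has_headroom 10.2.1.0/24 30 succeeds for the 30 replicas used in the verification step.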

Pods cannot ping some ECS instances

Symptom

In Flannel network mode, VPN routes appear normal. However, Pods cannot ping some ECS instances.

Cause

  • Cause 1: The ECS instance is in the same VPC as the cluster but in a different security group.

  • Cause 2: The ECS instance is in a different VPC than the cluster.

Solution

  • For Cause 1, add the ECS instance to the security group of the cluster. For more information, see Configure security groups.

  • For Cause 2, Pods must access the ECS instance through its public IP address. Add an inbound rule to the security group of the ECS instance that allows traffic from the cluster's public egress IP address.

NodeNetworkUnavailable taint on cluster nodes

Symptom

In a cluster that uses the Flannel network plugin, newly added nodes have the NodeNetworkUnavailable taint, which prevents Pods from being scheduled on them.

Cause

The Cloud Controller Manager fails to promptly remove the taint from the node. This can occur if the route table is full or the VPC contains multiple route tables.

Solution

Run the kubectl describe node command to inspect the node's events and use the resulting error messages to resolve the issue. If your VPC uses multiple route tables, you must manually configure the Cloud Controller Manager to support them. For more details, see Use multiple route tables in a VPC.

Pod startup failure: No IP addresses available

Symptom

In a cluster that uses the Flannel network plugin, Pods fail to start. Inspecting Pod events reveals an error message similar to failed to allocate for range 0: no IP addresses available in range set: 172.30.34.129-172.30.34.190.

Cause

In a cluster that uses the Flannel network plugin, each node is allocated a specific Pod CIDR block. When a Pod is scheduled to a node, Flannel assigns an available IP address from the node's CIDR block. The error message failed to allocate for range 0: no IP addresses available in range set: 172.30.34.129-172.30.34.190 indicates that the node has run out of allocatable IP addresses. This issue is typically caused by an IP address leak, which can occur for the following reasons:

  • For ACK clusters that run a Kubernetes version earlier than 1.20, an IP address leak can occur due to rapid Pod restarts or the termination of short-lived CronJob Pods. For more information, see Issues 75665 and Issues 92614.

  • For clusters that use a Flannel version earlier than v0.15.1.11-7e95fe23-aliyun, an IP address leak can occur if a node reboots or shuts down unexpectedly. This causes Pods to terminate without proper IP address reclamation. For more information, see Issues 332.
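To tell a genuine leak from a node that is simply full, you can compare the number of IP reservation files in the host-local data directory with the size of the range in the error message. The sketch below assumes the directory layout used elsewhere in this topic (/var/lib/cni/networks/cb0, or /var/run/cni/networks/cb0 on newer Flannel versions); the function names are hypothetical.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local oldifs=$IFS
  IFS=.
  set -- $1
  IFS=$oldifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Number of addresses in an inclusive range, such as the one in the error message.
range_size() {
  echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

# Count the per-IP reservation files in a host-local data directory.
# Bookkeeping files such as "lock" and "last_reserved_ip.0" are not counted.
count_allocated() {
  ls -1 "$1" | grep -cE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' || true
}
```

On a node, run count_allocated /var/lib/cni/networks/cb0 and range_size 172.30.34.129 172.30.34.190. If the count is close to the range size while the node runs far fewer pods, IP addresses have most likely leaked.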

Solution

  • To resolve IP address leaks in an ACK cluster running a Kubernetes version earlier than 1.20, upgrade the cluster to version 1.20 or later. For more information, see Manually upgrade a cluster.

  • To resolve IP address leaks caused by an older Flannel version, upgrade the Flannel component to v0.15.1.11-7e95fe23-aliyun or later. Follow these steps:

    In Flannel v0.15.1.11-7e95fe23-aliyun and later, ACK moves the IP allocation database to a temporary filesystem (/var/run). This filesystem is automatically cleared on node reboot, which prevents IP address leaks.

    1. Upgrade the Flannel component to v0.15.1.11-7e95fe23-aliyun or later. For more information, see Manage components.

    2. Run the following command to edit the kube-flannel-cfg ConfigMap, and then add the dataDir and ipam parameters to the flannel plugin entry of its CNI configuration.

      kubectl -n kube-system edit cm kube-flannel-cfg

      The following code block shows an example of the CNI configuration before and after the modification. The portmap entry may not exist in earlier versions; if your configuration does not contain it, ignore it. In the modified version, note the comma after the closing brace of the delegate object, which is required because dataDir and ipam follow it.

      # Before modification
          {
            "name": "cb0",
            "cniVersion":"0.3.1",
            "plugins": [
              {
                "type": "flannel",
                "delegate": {
                  "isDefaultGateway": true,
                  "hairpinMode": true
                 }
              },
              {
                "type": "portmap",
                "capabilities": {
                  "portMappings": true
                },
                "externalSetMarkChain": "KUBE-MARK-MASQ"
              }
            ]
          }

      # After modification
          {
            "name": "cb0",
            "cniVersion":"0.3.1",
            "plugins": [
              {
                "type": "flannel",
                "delegate": {
                  "isDefaultGateway": true,
                  "hairpinMode": true
                 },
                "dataDir": "/var/run/cni/flannel",
                "ipam": {
                  "type": "host-local",
                  "dataDir": "/var/run/cni/networks"
                 }
              },
              {
                "type": "portmap",
                "capabilities": {
                  "portMappings": true
                },
                "externalSetMarkChain": "KUBE-MARK-MASQ"
              }
            ]
          }
    3. Run the following command to restart the Flannel Pods.

      Restarting the Flannel Pods does not affect running workloads.

      kubectl -n kube-system delete pod -l app=flannel
    4. Delete the IP allocation directories on each node and restart the node.

      1. Drain the node to evict all existing Pods. For more information, see Drain a node and manage its scheduling state.

      2. Log on to the node and run the following commands to delete the IP allocation directories.

        rm -rf /etc/cni/
        rm -rf /var/lib/cni/
      3. Restart the instance. For more information, see Restart an instance.

      4. Repeat these steps on all affected nodes.

    5. Run the following commands on a node to verify that it is using the temporary filesystem.

      if [ -d /var/lib/cni/networks/cb0 ]; then echo "not using tmpfs"; fi
      if [ -d /var/run/cni/networks/cb0 ]; then echo "using tmpfs"; fi
      cat /etc/cni/net.d/10-flannel.conf*

      If the output includes using tmpfs and the Flannel CNI configuration printed by the cat command contains the /var/run/cni paths, the node is using the /var/run temporary filesystem for the IP allocation database and the change was successful.

  • If you cannot immediately upgrade ACK or Flannel, you can use the following temporary workaround. This workaround applies to leaks from both causes described above.

    This procedure only cleans up existing leaked IP addresses. The underlying issue may recur until you upgrade the Flannel component or the cluster.

    Note
    • These commands do not apply to nodes that run Flannel v0.15.1.11-7e95fe23-aliyun or later and have already switched to using /var/run for IP address allocation.

    • The following scripts are for reference only. They may not work correctly on nodes with custom configurations.

    1. Cordon the affected node to prevent scheduling new Pods on it. For more information, see Drain a node and manage its scheduling state.

    2. Use one of the following scripts to clean up leaked IP addresses on the node, depending on your container runtime.

      • If you use the Docker runtime, use the following script to clean up the node.

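        #!/bin/bash
        # Work in the host-local data directory where Flannel records allocated IPs.
        cd /var/lib/cni/networks/cb0 || exit 1
        # Save the IDs of all running containers (12-character short form).
        docker ps -q > /tmp/running_container_ids
        # Each file whose name is an IPv4 address represents one allocated IP.
        find /var/lib/cni/networks/cb0 -regex ".*/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" -printf '%f\n' > /tmp/allocated_ips
        for ip in $(cat /tmp/allocated_ips); do
          # The first line of the file is the ID of the owning container; strip carriage returns.
          cid=$(head -1 "$ip" | sed 's/\r//g' | cut -c-12)
          # If that container is no longer running, the IP has leaked: remove its record.
          grep -q "$cid" /tmp/running_container_ids || (echo "removing leaked ip $ip" && rm "$ip")
        done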
      • If you use the containerd runtime, use the following script to clean up the node.

        #!/bin/bash
        # install jq
        yum install -y jq
        
        # export all running pod's configs
        crictl -r /run/containerd/containerd.sock pods -s ready -q | xargs -n1 crictl -r /run/containerd/containerd.sock inspectp > /tmp/flannel_ip_gc_all_pods
        
        # export and sort pod ip
        cat /tmp/flannel_ip_gc_all_pods | jq -r '.info.cniResult.Interfaces.eth0.IPConfigs[0].IP' | sort > /tmp/flannel_ip_gc_all_pods_ips
        
        # export flannel's all allocated pod ip
        ls -alh /var/lib/cni/networks/cb0/1* | cut -f7 -d"/" | sort > /tmp/flannel_ip_gc_all_allocated_pod_ips
        
        # print leaked pod ip
        comm -13 /tmp/flannel_ip_gc_all_pods_ips /tmp/flannel_ip_gc_all_allocated_pod_ips > /tmp/flannel_ip_gc_leaked_pod_ip
        
        # clean leaked pod ip
        echo "Found $(cat /tmp/flannel_ip_gc_leaked_pod_ip | wc -l) leaked Pod IP, press <Enter> to clean."
        read sure
        
        # delete leaked pod ip
        for pod_ip in $(cat /tmp/flannel_ip_gc_leaked_pod_ip); do
            rm /var/lib/cni/networks/cb0/${pod_ip}
        done
        
        echo "Leaked Pod IP cleaned, removing temp file."
        rm /tmp/flannel_ip_gc_all_pods_ips /tmp/flannel_ip_gc_all_pods /tmp/flannel_ip_gc_leaked_pod_ip /tmp/flannel_ip_gc_all_allocated_pod_ips
    3. Uncordon the node to make it schedulable again. For more information, see Drain a node and manage its scheduling state.

Immutable network settings

IPs per node, Pod CIDR, and Service CIDR are immutable after cluster creation. Plan your subnets carefully before you create the cluster.
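Because these settings cannot be changed later, it helps to work out the node limit before you create the cluster. The sketch below assumes that the Pod CIDR is divided evenly into per-node blocks of a fixed number of addresses (a power of two), as with Flannel; the function name max_nodes is hypothetical.

```shell
#!/bin/bash
# Hypothetical planning helper: how many nodes a Pod CIDR with prefix length
# $1 can serve when each node receives a block of $2 addresses.
max_nodes() {
  local pod_cidr_prefix=$1 ips_per_node=$2
  echo $(( (1 << (32 - pod_cidr_prefix)) / ips_per_node ))
}
```

For example, a /16 Pod CIDR with 256 IPs per node supports at most 256 nodes, and a /14 Pod CIDR with 64 IPs per node supports at most 4,096 nodes.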

When to configure multiple route tables

In Flannel network mode, you must configure the cloud-controller-manager to support multiple route tables in certain scenarios. For configuration instructions, see Use multiple route tables in a VPC.

Scenarios

  • Scenario 1

    System diagnostics reports that a node's Pod CIDR is not in the VPC's route table. To fix this, add a route entry for the Pod CIDR to a custom route table, with the node as the next hop.

    Cause: If you create a custom route table in the cluster's VPC, the CCM must be configured to support multiple route tables.

  • Scenario 2

    The cloud-controller-manager component reports the multiple route tables found error.

    Cause: If multiple route tables exist in the cluster, the CCM must be configured to support them.

  • Scenario 3

    In Flannel network mode, newly added nodes have the NodeNetworkUnavailable taint, and the cloud-controller-manager component fails to remove this taint promptly. This prevents Pods from being scheduled to the nodes. For more information, see Why does a cluster node have a NodeNetworkUnavailable taint?.

Third-party network plugins

ACK clusters do not support third-party network plugins. Installing one may disrupt the cluster network.

Pod CIDR exhaustion: the no IP addresses available in range set error

This error occurs in ACK clusters that use the Flannel network plugin. Flannel assigns each node a fixed range of IP addresses from the Pod CIDR. Once a node exhausts its assigned IP addresses, new Pods cannot be created on it. To resolve this, free up some IP addresses or recreate the cluster. For more information on planning your cluster network, see Plan the network for an ACK managed cluster.

Pod capacity in Terway network mode

The pod capacity of a cluster using the Terway network mode depends on the number of IP addresses its underlying ECS instances can support. For more information, see Use the Terway network plugin.
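As an illustration, the following sketch estimates the per-node pod limit from the instance type's ENI quota and the number of private IP addresses per ENI. It assumes the commonly documented Terway formulas, in which the primary ENI is reserved for the node itself; confirm the formulas and the quota values for your instance type in Use the Terway network plugin. The function names are hypothetical.

```shell
#!/bin/bash
# Assumption: the node's primary ENI is not used for pods.

# Shared ENI mode: each secondary ENI carries several pod IP addresses.
max_pods_shared_eni() {
  local eni_quota=$1 ips_per_eni=$2
  echo $(( (eni_quota - 1) * ips_per_eni ))
}

# Exclusive ENI mode: each pod gets a whole secondary ENI.
max_pods_exclusive_eni() {
  local eni_quota=$1
  echo $(( eni_quota - 1 ))
}
```

For example, an instance type that supports 3 ENIs with 10 private IP addresses each gives 20 pods in shared ENI mode and 2 pods in exclusive ENI mode under these assumptions.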

Terway DataPath V2 data plane mode

  • Beginning with Terway v1.8.0, new clusters created with the IPvlan option use DataPath V2 mode by default. Existing clusters that use IPvlan continue to use the legacy IPvlan mode.

  • DataPath V2 is a next-generation data plane that offers improved compatibility compared to the legacy IPvlan mode. For more information, see Use the Terway network plugin.

Terway network Pod lifecycle

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Workloads > DaemonSets.

  3. At the top of the DaemonSets page, select the kube-system namespace.

  4. On the DaemonSets page, search for terway-eniip and click terway-eniip in the Name column.

    The following list describes the Pod statuses.

      • Ready: All containers in the Pod are running.

      • Pending: The Pod is waiting for Terway to configure its network resources, or the Pod has not been scheduled to a node due to insufficient node resources. For more information, see Troubleshoot Pod exceptions.

      • ContainerCreating: The Pod has been scheduled to a node and is waiting for network initialization to complete.

    For more information, see Pod Lifecycle.

Terway upgrade issues

Symptom

The error code eip pool is not supported is returned during the upgrade.

Solution

The Terway component no longer supports the EIP feature. To continue using this feature, see Migrate EIPs from Terway to ack-extend-network-controller.

Terway: Pod creation fails with 'MAC address not found' error

Symptom

Pod creation fails with a "MAC address not found" error.

 failed to do add; error parse config, can't found dev by mac 00:16:3e:xx:xx:xx: not found

Solution

  1. This error can occur because an elastic network interface loads asynchronously, and the CNI plugin may attempt network configuration before the interface is ready. However, since the CNI plugin automatically retries the operation, the issue is typically transient. Verify that the Pod was created successfully by checking its final status.

  2. If Pod creation continues to fail and this error persists, the driver likely failed to load due to insufficient high-order memory when the elastic network interface was attached. To resolve this, restart the instance.
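To check on the node whether the ENI has finished loading, you can look up the MAC address from the error message in sysfs. The sketch below is a hypothetical helper, not Terway's own code; it assumes the standard Linux /sys/class/net/<device>/address layout and retries once per second, matching the transient case described in step 1.

```shell
#!/bin/bash
# Print the name of the network device whose MAC address matches $1.
# $2 optionally overrides the sysfs directory (default /sys/class/net).
find_dev_by_mac() {
  local mac root f
  mac=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  root=${2:-/sys/class/net}
  for f in "$root"/*/address; do
    [ -r "$f" ] || continue
    if [ "$(tr '[:upper:]' '[:lower:]' < "$f")" = "$mac" ]; then
      basename "$(dirname "$f")"
      return 0
    fi
  done
  return 1
}

# Retry up to $2 times (default 10), one second apart, in case the ENI is
# still loading. $3 is passed through as the sysfs directory.
wait_for_mac() {
  local tries=${2:-10} i=0
  while [ "$i" -lt "$tries" ]; do
    find_dev_by_mac "$1" "$3" && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for_mac 00:16:3e:xx:xx:xx 30 (with the MAC address from the error message) prints the device name once the interface appears. If it never appears, step 2 applies.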

Configure a cluster domain

The default cluster domain for an ACK cluster is cluster.local. You can also specify a custom cluster domain when you create a cluster. Note the following:

  • The cluster domain can be configured only during cluster creation and cannot be modified afterward.

  • The cluster domain is the top-level domain for in-cluster Service names, creating an isolated DNS zone for internal services. To prevent DNS resolution conflicts, the cluster domain must not overlap with external private or public DNS zones.

    How it works

    ACK uses CoreDNS as its default DNS server. If a custom cluster domain is specified, the default CoreDNS Corefile is configured as follows:

      Corefile: |
        .:53 {
            errors
            log
            health {
               lameduck 15s
            }
            ready
            kubernetes {{.ClusterDomain}} in-addr.arpa ip6.arpa {
              pods insecure
              fallthrough in-addr.arpa ip6.arpa
              ttl 30
            }
            ...
            forward . /etc/resolv.conf {
              prefer_udp
            }
            ...
          }

    If you do not specify a custom cluster domain, the Corefile configuration is similar to the following:

            kubernetes cluster.local in-addr.arpa ip6.arpa {
              pods insecure
              fallthrough in-addr.arpa ip6.arpa
              ttl 30
            }
            ...
            forward . /etc/resolv.conf {
              prefer_udp
            }

    CoreDNS does not forward DNS queries for the cluster domain to upstream DNS servers. In the Corefile, the fallthrough option of the kubernetes plugin applies only to the in-addr.arpa and ip6.arpa reverse zones, so a query in the cluster domain that matches no Service or Pod is answered directly (typically with NXDOMAIN) instead of falling through to the forward plugin. This keeps in-cluster DNS queries efficient, secure, and isolated from external network issues. If an in-cluster DNS request were forwarded to an upstream DNS server, it could trigger a recursive query chain long enough for the request to exceed its timeout threshold and fail.

    If a custom cluster domain overlaps with an external public top-level domain or a domain defined in Alibaba Cloud DNS PrivateZone, and all Pods in the cluster use CoreDNS as their DNS server, CoreDNS will not forward resolution requests for that domain to an upstream DNS server. Consequently, external domain names within that zone cannot be resolved.

    Furthermore, if the cluster domain overlaps with an external public top-level domain or a domain in an Alibaba Cloud DNS PrivateZone, and a Service name within the cluster also matches a subdomain in that external zone, CoreDNS prioritizes resolving the name to the internal Service IP address instead of the external domain's address. This causes incorrect routing and prevents access to the external service.
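The routing decision described above can be modeled as a suffix match on the query name. The following sketch is a simplified, hypothetical model of the Corefile shown earlier, not CoreDNS code: names inside the cluster domain go to the kubernetes plugin and are never forwarded, while all other names go to the forward plugin. It ignores the reverse zones and all other plugins.

```shell
#!/bin/bash
# Print "kubernetes" if query name $1 falls inside cluster domain $2,
# otherwise "forward". Trailing dots and letter case are ignored.
dns_route() {
  local name zone
  name=$(printf '%s' "${1%.}" | tr '[:upper:]' '[:lower:]')
  zone=$(printf '%s' "${2%.}" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    "$zone" | *."$zone") echo kubernetes ;;
    *) echo forward ;;
  esac
}
```

With a hypothetical custom cluster domain of example.com, dns_route www.example.com example.com prints kubernetes, which is why an external www.example.com record could not be resolved from Pods that use CoreDNS.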