DNS is a critical service in Kubernetes clusters. Under certain conditions, such as improper client configurations or in large-scale clusters, DNS can experience resolution timeouts and failures. This topic provides best practices for DNS in Kubernetes clusters to help you avoid these issues.
Notes
This topic does not apply to the managed edition of CoreDNS or ACK clusters that have Auto Mode enabled. The managed edition of CoreDNS automatically scales based on load, requiring no manual adjustment.
In this topic
DNS best practices cover both client-side and server-side optimizations:
- On the client side, you can optimize domain name resolution requests to reduce latency and minimize resolution failures by using appropriate container images, node operating systems, and NodeLocal DNSCache.
- On the CoreDNS server side, you can identify DNS exceptions and quickly locate their root causes by monitoring the CoreDNS operational status. You can also adjust the CoreDNS deployment to improve high availability and queries per second (QPS) throughput.
For more information about CoreDNS, see the official CoreDNS documentation.
Optimize domain name resolution requests
DNS resolution is one of the most frequent network operations in a Kubernetes cluster. Many of these requests can be optimized or avoided to reduce latency and load on the DNS infrastructure. You can optimize domain name resolution requests in the following ways:
- (Recommended) Use connection pools: When a containerized application frequently requests another service, use a connection pool to keep active connections to upstream services in memory. This eliminates the overhead of DNS resolution and TCP handshakes for each request.
- Use an asynchronous or long-polling mode to obtain the IP addresses for a domain name.
- Use DNS caching:
  - (Recommended) If your application cannot be modified to use a connection pool, cache DNS resolution results on the client side, for example, by using NodeLocal DNSCache. For more information, see Use NodeLocal DNSCache.
  - If you cannot use NodeLocal DNSCache, you can run the Name Service Cache Daemon (NSCD) in your containers to cache lookups. For more information, see Use NSCD in Kubernetes clusters.
- Optimize the resolv.conf file: The ndots and search parameters in the resolv.conf file control how names are expanded, so the way you write domain names in a container determines how efficiently they are resolved. For more information about the ndots and search parameters, see DNS Policy Configuration and Domain Name Resolution.
- Optimize domain name configurations: When an application in a container accesses a domain name, write the name as follows to minimize resolution attempts and reduce resolution latency:
  - To access a Service in the same namespace from a pod, use <service-name>, where service-name is the name of the Service.
  - To access a Service in a different namespace from a pod, use <service-name>.<namespace-name>, where namespace-name is the namespace where the Service resides.
  - When a pod accesses an external domain name, use a Fully Qualified Domain Name (FQDN), which ends with a trailing dot (.). This prevents the invalid DNS lookups that occur when domains from the search list are appended. For example, to access www.aliyun.com, use its FQDN, www.aliyun.com. (including the trailing dot).
  - In clusters that run Kubernetes 1.22 or later, you can configure the search domain as a single period (.) to achieve a similar effect (see Issue 125883):

    dnsPolicy: None
    dnsConfig:
      nameservers: ["192.168.0.10"]   # Replace with the actual ClusterIP of your CoreDNS Service.
      searches:
      - .
      - default.svc.cluster.local     # Replace default with the actual namespace.
      - svc.cluster.local
      - cluster.local

    After you apply the preceding configuration, the /etc/resolv.conf file in the pod contains the following content:

    search . default.svc.cluster.local svc.cluster.local cluster.local
    nameserver 192.168.0.10

    Because the first search domain is ".", the resolver first tries the domain name as-is, as if it were an FQDN. This skips unnecessary search domain expansions.

    Important: You must set dnsPolicy to None for the preceding configuration to take effect.
- Understand DNS configurations in containers:
  - Different DNS resolvers may behave differently because their implementations differ. For example, dig <domain> may succeed while ping <domain> fails. This happens because dig sends DNS queries itself, whereas ping resolves names through the C library of the container image.
  - Avoid using Alpine as the base image. Use other base images, such as Debian or CentOS, instead. The musl libc library built into Alpine container images differs from the standard glibc in several ways. These differences can lead to issues that include but are not limited to the following:
    - TCP fallback: Alpine versions earlier than 3.18 do not fall back to TCP when a response has the truncated (TC) flag set.
    - Search domains: Alpine 3.3 and earlier do not support the search parameter, which breaks service discovery.
    - Optimization conflicts: Alpine concurrently queries all DNS servers that are configured in /etc/resolv.conf. This can bypass NodeLocal DNSCache and negate its optimizations.
    - Conntrack race conditions: Concurrent A and AAAA record requests that use the same socket can trigger conntrack source port conflicts in older Linux kernels, which results in packet loss.
    For more information about these issues, see musl libc.
- If you use a Go application, be aware of the differences between the DNS resolvers in the CGO and pure Go implementations.
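The following Python sketch shows why FQDNs and short Service names matter. It is a simplified model of the order in which a glibc-style stub resolver tries candidate names, given the search list and the ndots option. It is written for illustration and is not the actual resolver code. It ignores /etc/hosts and timeouts, and it ignores the separate A and AAAA queries that each candidate usually triggers. The function name candidate_queries is hypothetical.

```python
from typing import List


def candidate_queries(name: str, search: List[str], ndots: int = 5) -> List[str]:
    """Return the FQDNs a glibc-style resolver tries for `name`, in order.

    Simplified model: the resolver stops at the first successful answer, so
    this list is the worst case (every earlier candidate returns NXDOMAIN).
    """
    if name.endswith("."):
        # Already an FQDN: the search list is never applied.
        return [name]

    as_is = name + "."
    # Append each search domain. A search domain of "." maps to the name itself,
    # which is how a "." entry makes the resolver try the name as-is.
    expanded = [f"{name}.{d.strip('.')}." if d.strip(".") else as_is for d in search]

    queries: List[str] = []
    if name.count(".") >= ndots:
        # Enough dots: the name as-is is tried before the search list.
        queries.append(as_is)
    for q in expanded:
        if q not in queries:
            queries.append(q)
    if as_is not in queries:
        # Otherwise the name as-is is tried last.
        queries.append(as_is)
    return queries


if __name__ == "__main__":
    k8s_search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for q in candidate_queries("www.aliyun.com", k8s_search):
        print(q)
```

Take the default ndots:5 and the usual pod search list. Because www.aliyun.com has only two dots, it is expanded with every search domain before it is tried as-is. Up to four names are sent, each typically as both an A and an AAAA query. If you write www.aliyun.com. or put "." first in the search list, the real name becomes the first candidate.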
Avoid DNS timeouts caused by IPVS defects
When a cluster uses IPVS as the kube-proxy load balancing mode, DNS resolution may intermittently time out when CoreDNS is scaled in or restarted. This issue is caused by a defect in the upstream Linux kernel. For more information, see IPVS.
You can use one of the following methods to mitigate the impact of the IPVS defect:
- Use NodeLocal DNSCache. For more information, see Use NodeLocal DNSCache.
- Modify the timeout period for IPVS UDP session persistence in kube-proxy. For more information, see How do I modify the timeout period for IPVS UDP session persistence in kube-proxy?
Use NodeLocal DNSCache
CoreDNS may experience the following issues:
- In rare cases, concurrent A and AAAA queries can cause packet loss, which leads to DNS resolution failures.
- A full conntrack table on a node can cause packet loss, which leads to DNS resolution failures.
To improve DNS stability and performance in your cluster, install the NodeLocal DNSCache component. It enhances cluster DNS performance by running a DNS cache on each cluster node. For more information about NodeLocal DNSCache and how to deploy it in an ACK cluster, see Use the NodeLocal DNSCache component.
After you install NodeLocal DNSCache, you must inject the DNS cache configuration into your pods. You can run the following command to add a label to a specific namespace. New pods created in this namespace will automatically have the DNS cache configuration injected. For more information about other injection methods, see the documentation referenced in the previous paragraph.
kubectl label namespace default node-local-dns-injection=enabled
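To confirm that injection took effect for a new pod, you can print its resolver configuration with kubectl exec <pod> -- cat /etc/resolv.conf. The following Python sketch parses that output and checks whether the first nameserver, which glibc queries first, is the node-local cache. It assumes the cache listens on 169.254.20.10, the address used in upstream NodeLocal DNSCache examples. Confirm the actual address in your cluster. The helper names are hypothetical.

```python
# Assumption: the address NodeLocal DNSCache listens on (the upstream example value).
NODE_LOCAL_DNS_IP = "169.254.20.10"


def parse_resolv_conf(text: str) -> dict:
    """Parse the nameserver, search/domain, and options lines of a resolv.conf file."""
    conf = {"nameservers": [], "search": [], "options": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        key, *values = line.split()
        if key == "nameserver" and values:
            conf["nameservers"].append(values[0])
        elif key in ("search", "domain"):
            conf["search"] = values  # the last search or domain line wins
        elif key == "options":
            for opt in values:
                name, _, value = opt.partition(":")
                conf["options"][name] = value or True  # for example, {"ndots": "5"}
    return conf


def uses_node_local_cache(conf: dict, cache_ip: str = NODE_LOCAL_DNS_IP) -> bool:
    """True if the first configured nameserver is the node-local cache."""
    return conf["nameservers"][:1] == [cache_ip]
```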
Use a suitable CoreDNS version
CoreDNS offers good backward compatibility with Kubernetes versions. Keep CoreDNS updated to the latest stable version. The Add-ons page in the ACK console allows you to install, upgrade, and configure CoreDNS. Check the status of the CoreDNS component on the Add-ons page. If an upgrade is available, schedule the upgrade during off-peak hours.
- For more information about how to upgrade CoreDNS, see Automatic upgrade for unmanaged CoreDNS.
- For the release notes of CoreDNS, see CoreDNS.
CoreDNS versions earlier than v1.7.0 have several potential risks, including:
- When connectivity between CoreDNS and the API server is abnormal, for example, due to API server restarts, migrations, or network jitter, CoreDNS may restart because it fails to write error logs. For more information, see Set klog's logtostderr flag.
- CoreDNS consumes extra memory at startup. The default memory limit may trigger out-of-memory (OOM) issues in large-scale clusters. In severe cases, this can cause CoreDNS pods to enter a restart loop and fail to recover. For more information, see CoreDNS uses a lot memory during initialization phase.
- CoreDNS has several issues that can affect the resolution of headless Service domain names and domain names outside the cluster. For more information, see plugin/kubernetes: handle tombstones in default processor and Data is not synced when CoreDNS reconnects to kubernetes api server after protracted disconnection.
- If a cluster node becomes abnormal, the default toleration policy in some earlier CoreDNS versions may cause CoreDNS pods to be scheduled onto the abnormal node. These pods cannot be automatically evicted, leading to DNS resolution failures.
The recommended minimum CoreDNS version varies depending on the Kubernetes version of the cluster.
| Cluster version | Minimum CoreDNS version |
| --- | --- |
| Earlier than 1.14.8 | v1.6.2 (end of life) |
| 1.14.8 or later, but earlier than 1.20.4 | v1.7.0.0-f59c03d-aliyun |
| 1.20.4 or later, but earlier than 1.21.0 | v1.8.4.1-3a376cc-aliyun |
| 1.21.0 and later | v1.11.3.2-f57ea7ed6-aliyun |
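If you check many clusters, you can encode the preceding table as a small lookup. The following Python sketch compares versions numerically after it drops the leading v and any suffix that starts with a hyphen, such as -aliyun. This parsing rule is an assumption about the version format, not an official API.

```python
from typing import Tuple

# (lowest cluster version in the range, minimum CoreDNS version), from the preceding table.
MINIMUMS = [
    ("0.0.0", "v1.6.2"),
    ("1.14.8", "v1.7.0.0-f59c03d-aliyun"),
    ("1.20.4", "v1.8.4.1-3a376cc-aliyun"),
    ("1.21.0", "v1.11.3.2-f57ea7ed6-aliyun"),
]


def parse_version(version: str, width: int = 4) -> Tuple[int, ...]:
    """'v1.8.4.1-3a376cc-aliyun' -> (1, 8, 4, 1); '1.28.3-aliyun.1' -> (1, 28, 3, 0)."""
    core = version.lstrip("v").split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def minimum_coredns(k8s_version: str) -> str:
    """Return the recommended minimum CoreDNS version for a cluster version."""
    target = parse_version(k8s_version)
    result = MINIMUMS[0][1]
    for lowest, coredns in MINIMUMS:
        if target >= parse_version(lowest):
            result = coredns
    return result


def meets_minimum(coredns_version: str, k8s_version: str) -> bool:
    """True if the running CoreDNS version is at least the recommended minimum."""
    return parse_version(coredns_version) >= parse_version(minimum_coredns(k8s_version))
```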
Monitor the operational status of CoreDNS
Metrics
CoreDNS exposes health metrics, including resolution results, through a standard Prometheus interface. These metrics help you detect anomalies in CoreDNS itself and in upstream DNS servers.
Managed Service for Prometheus provides built-in metrics monitoring dashboards and alerting rules for CoreDNS. You can enable Prometheus and its dashboard features in the ACK console. For more information, see Monitor the CoreDNS component.
If you use a self-managed Prometheus instance to monitor your Kubernetes cluster, you can observe the relevant metrics in Prometheus and set up alerts for key indicators. For more information, see the official CoreDNS documentation for Prometheus.
Logs
In the event of a DNS anomaly, CoreDNS logs can help you quickly diagnose the root cause. We recommend that you enable CoreDNS domain name resolution logging and collect its logs with Log Service. For more information, see Analyze and monitor CoreDNS logs.
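When resolution logging is enabled through the log plugin, CoreDNS writes one line per query. The following Python sketch parses lines in the log plugin's default format and flags slow or failed queries. The fields are client address, query ID, the quoted question (type, class, name, protocol, size, DO bit, buffer size), then rcode, response flags, response size, and duration in seconds. The sketch assumes that default format, so adjust the regular expression if your log format is customized. The 100 ms threshold is an arbitrary example value.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed default format:
# {remote}:{port} - {>id} "{type} {class} {name} {proto} {size} {>do} {>bufsize}" {rcode} {>rflags} {rsize} {duration}
LOG_LINE = re.compile(
    r'\[(?P<level>\w+)\] (?P<client>\S+) - (?P<id>\d+) '
    r'"(?P<qtype>\S+) (?P<qclass>\S+) (?P<name>\S+) (?P<proto>\S+) \d+ \S+ \d+" '
    r'(?P<rcode>\S+) (?P<flags>\S+) \d+ (?P<duration>[\d.]+)s'
)


class Query(NamedTuple):
    name: str
    qtype: str
    rcode: str
    seconds: float


def suspicious_queries(lines: Iterable[str], slow_after: float = 0.1) -> List[Query]:
    """Return queries that failed on the server side or took longer than `slow_after` seconds."""
    hits = []
    for line in lines:
        m = LOG_LINE.search(line)
        if not m:
            continue  # not a query log line (for example, a plugin message)
        q = Query(m["name"], m["qtype"], m["rcode"], float(m["duration"]))
        # NXDOMAIN is normal during search-list expansion, so it is not flagged here.
        if q.rcode in ("SERVFAIL", "REFUSED") or q.seconds > slow_after:
            hits.append(q)
    return hits
```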
Kubernetes event delivery
In CoreDNS v1.9.3.6-32932850-aliyun and later, you can enable the k8s_event plugin to deliver critical CoreDNS logs as Kubernetes events to the Event Center. For more information about the k8s_event plugin, see k8s_event.
This feature is enabled by default in new CoreDNS deployments. If you upgrade from an earlier version to CoreDNS v1.9.3.6-32932850-aliyun or later, you need to manually modify the configuration file to enable it.
- Run the following command to open the CoreDNS configuration file.
  kubectl -n kube-system edit configmap/coredns
- Add the kubeapi and k8s_event plugins.
  apiVersion: v1
  data:
    Corefile: |
      .:53 {
          errors
          health {
              lameduck 15s
          }
          # Start of addition (ignore other differences).
          kubeapi
          k8s_event {
              # Deliver critical logs with the info, error, and warning levels.
              level info error warning
          }
          # End of addition.
          kubernetes cluster.local in-addr.arpa ip6.arpa {
              pods verified
              fallthrough in-addr.arpa ip6.arpa
          }
          # ... (remaining content omitted)
      }
- Check the operational status and logs of the CoreDNS pods. If the logs contain the word reload, the modification is successful.
Ensure CoreDNS high availability
CoreDNS is the authoritative DNS for the cluster. A failure in CoreDNS can cause Service access within the cluster to fail, potentially leading to widespread service unavailability. You can take the following measures to ensure the high availability of CoreDNS:
Assess CoreDNS component pressure
You can perform a DNS stress test in the cluster to assess component pressure. Many open-source tools, including DNSPerf, can help you with this. If you cannot accurately assess the DNS pressure in your cluster, follow these recommendations.
- Always run at least 2 CoreDNS pods, and set the resource limit of each pod to at least 1 core and 1 GiB of memory.
- CoreDNS's domain name resolution QPS is positively correlated with its CPU consumption. With NodeLocal DNSCache enabled, each CPU core can support over 10,000 QPS. The QPS demand for domain name requests varies significantly across different types of services. You can observe the peak CPU usage of each CoreDNS pod. If a pod uses more than one CPU core during peak hours, we recommend that you scale out the CoreDNS replicas. If you cannot determine the peak CPU usage, you can conservatively use a 1:8 ratio of pods to cluster nodes. That is, for every 8 cluster nodes that you add, add one CoreDNS pod.
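The following Python sketch expresses the sizing guidance above as a short calculation. When measured peak CPU is available, the sketch uses it. It assumes load is spread evenly across pods, so each pod should stay at or below one core at peak. Otherwise it falls back to the conservative 1:8 pod-to-node ratio. In both cases it keeps at least two pods. The function name is hypothetical.

```python
import math
from typing import Optional

MIN_REPLICAS = 2             # always run at least two CoreDNS pods
NODES_PER_REPLICA = 8        # conservative 1:8 pod-to-node ratio
CORES_PER_POD_AT_PEAK = 1.0  # scale out when a pod exceeds one core at peak


def recommended_coredns_replicas(nodes: int, total_peak_cores: Optional[float] = None) -> int:
    """Estimate the CoreDNS replica count.

    total_peak_cores: measured peak CPU usage (in cores) summed over all CoreDNS
    pods, or None if it has not been measured.
    """
    if total_peak_cores is not None:
        wanted = math.ceil(total_peak_cores / CORES_PER_POD_AT_PEAK)
    else:
        wanted = math.ceil(nodes / NODES_PER_REPLICA)
    return max(MIN_REPLICAS, wanted)


print(recommended_coredns_replicas(nodes=50))                        # 7 (50 / 8, rounded up)
print(recommended_coredns_replicas(nodes=50, total_peak_cores=3.2))  # 4
```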
Adjust CoreDNS pod count
The number of CoreDNS pods directly determines the computing resources that CoreDNS can use. You can adjust the number of CoreDNS pods based on your assessment.
UDP has no retransmission mechanism. If cluster nodes are at risk of packet loss caused by the IPVS UDP defect, scaling in or restarting CoreDNS pods can cause cluster-wide DNS resolution timeouts or exceptions for up to five minutes. For solutions to resolution exceptions that are caused by the IPVS defect, see Troubleshoot DNS resolution issues.
- Automatically adjust based on the recommended policy
  You can deploy dns-autoscaler, which adjusts the number of CoreDNS pods in real time based on the recommended policy (a 1:8 ratio of pods to cluster nodes). The number of pods is calculated by using the formula replicas = max(ceil(cores × 1/coresPerReplica), ceil(nodes × 1/nodesPerReplica)) and is limited by the min and max parameters.
- Manually adjust
  You can run the following command to manually adjust the number of CoreDNS pods.
  kubectl scale --replicas={target} deployment/coredns -n kube-system   # Replace {target} with the desired number of pods.
- Do not use workload auto-scaling
  Workload auto-scaling features such as Horizontal Pod Autoscaler (HPA) and CronHPA can also adjust the number of pods automatically, but they scale frequently. Scaling in pods causes resolution exceptions, so do not use workload auto-scaling to control the number of CoreDNS pods.
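You can check the dns-autoscaler formula by hand before you choose parameters. The following Python sketch reproduces the linear formula described above, including the limits set by min and max. The example parameter values (coresPerReplica = 256, nodesPerReplica = 8, min = 2, max = 100) are illustrative assumptions, not the values shipped with the component.

```python
import math
from typing import Optional


def linear_replicas(cores: int, nodes: int, cores_per_replica: float, nodes_per_replica: float,
                    min_replicas: int = 1, max_replicas: Optional[int] = None) -> int:
    """replicas = max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica)), clamped to [min, max]."""
    replicas = max(math.ceil(cores / cores_per_replica), math.ceil(nodes / nodes_per_replica))
    replicas = max(replicas, min_replicas)
    if max_replicas is not None:
        replicas = min(replicas, max_replicas)
    return replicas


# 40 nodes with 16 cores each: the node term (40 / 8 = 5) beats the core term (640 / 256 = 2.5, so 3).
print(linear_replicas(cores=640, nodes=40, cores_per_replica=256, nodes_per_replica=8,
                      min_replicas=2, max_replicas=100))  # 5
```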
Adjust CoreDNS pod specifications
Another way to adjust CoreDNS resources is to modify pod specifications. In an ACK managed Pro cluster, the default memory limit for CoreDNS pods is 2Gi, and no CPU limit is set. We recommend that you set the CPU limit to 4096m and no lower than 1024m. You can adjust the CoreDNS pod configuration in the console.
Schedule CoreDNS pods
An incorrect scheduling configuration may prevent CoreDNS pods from being deployed, leading to CoreDNS failure. Before you perform this operation, make sure that you are familiar with scheduling.
We recommend that you deploy CoreDNS pods across different availability zones and cluster nodes to avoid single-node or single-availability-zone failures. CoreDNS component versions earlier than v1.8.4.3 have a default soft anti-affinity policy at the node level, which may cause some or all pods to be deployed on the same node if resources are insufficient. If this occurs, delete the pods to trigger rescheduling, or upgrade the component to the latest version. CoreDNS component versions earlier than v1.8 are no longer maintained and should be upgraded as soon as possible.
The cluster nodes where CoreDNS runs should not have their CPU or memory fully utilized, because this affects the QPS and response latency of domain name resolution. When cluster node conditions permit, consider using custom parameters to schedule CoreDNS to dedicated cluster nodes to provide a stable domain name resolution service.
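To check how CoreDNS pods are currently spread, list them with kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide. Then look up the topology.kubernetes.io/zone label of each node. The following Python sketch takes that information as (pod, node, zone) tuples. It reports nodes that host more than one CoreDNS pod and whether all pods sit in a single zone. It assumes the CoreDNS pods carry the k8s-app=kube-dns label. The input structure is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def spread_report(pods: Iterable[Tuple[str, str, str]]) -> Dict[str, object]:
    """pods: (pod_name, node_name, zone) for each CoreDNS pod."""
    pods = list(pods)
    per_node = Counter(node for _, node, _ in pods)
    zones = {zone for _, _, zone in pods}
    return {
        "shared_nodes": sorted(n for n, count in per_node.items() if count > 1),
        "zone_count": len(zones),
        "single_zone": len(zones) < 2,
    }
```

A non-empty shared_nodes list or single_zone = True is a signal to trigger rescheduling or review the anti-affinity settings, as described above.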
Optimize CoreDNS configurations
ACK provides a default configuration for CoreDNS. You should review and optimize these parameters to ensure that CoreDNS can provide proper DNS services for your business containers. CoreDNS configuration is highly flexible. For more information, see Configure DNS policies and resolve domain names and the official CoreDNS documentation.
The default CoreDNS configurations deployed with earlier Kubernetes cluster versions may have some risks. Check and optimize them as follows:
You can also use the scheduled inspection and fault diagnosis features of Container Intelligence Service to check CoreDNS configuration files. If the inspection result from Container Intelligence Service indicates a CoreDNS ConfigMap configuration exception, check each of the preceding items.
CoreDNS may consume extra memory when it reloads its configuration. After you modify the CoreDNS ConfigMap, observe the pod status. If a pod runs out of memory, promptly increase the container memory limit in the CoreDNS Deployment, for example, to 2Gi.
Disable session affinity for kube-dns
Session affinity can lead to significant load imbalances between CoreDNS replicas. Disable it by following these steps:
Console
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, click .
- In the kube-system namespace, click Edit YAML to the right of the kube-dns Service.
  - If the sessionAffinity field is set to None, no further action is needed.
  - If the sessionAffinity field is set to ClientIP, proceed with the following steps.
- Delete the sessionAffinity and sessionAffinityConfig fields and all their sub-keys, and then click Update.
  # Delete all of the following content.
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
- Click Edit YAML to the right of the kube-dns Service again and verify that the sessionAffinity field is set to None. A value of None indicates that the kube-dns Service is successfully modified.
CLI
- Run the following command to view the configuration of the kube-dns Service.
  kubectl -n kube-system get svc kube-dns -o yaml
  - If the sessionAffinity field is set to None, no further action is needed.
  - If the sessionAffinity field is set to ClientIP, proceed with the following steps.
- Run the following command to open the kube-dns Service for editing.
  kubectl -n kube-system edit service kube-dns
- Delete the session affinity settings (sessionAffinity, sessionAffinityConfig, and all their sub-keys), and then save and exit.
  # Delete all of the following content.
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
- After the modification is complete, run the following command again and check whether the sessionAffinity field is set to None. If the value is None, the kube-dns Service is successfully modified.
  kubectl -n kube-system get svc kube-dns -o yaml
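You may manage the Service manifest as data, for example, the JSON output of kubectl -n kube-system get svc kube-dns -o json. In that case, the edit described above amounts to the following transformation. This Python sketch is illustrative. It removes sessionAffinityConfig and sets sessionAffinity to None, which is the value the API server applies by default when the field is omitted.

```python
import copy
from typing import Any, Dict, Tuple


def disable_session_affinity(service: Dict[str, Any]) -> Tuple[Dict[str, Any], bool]:
    """Return a copy of the Service with session affinity disabled, and whether anything changed."""
    svc = copy.deepcopy(service)  # leave the caller's object untouched
    spec = svc.setdefault("spec", {})
    changed = spec.get("sessionAffinity", "None") != "None" or "sessionAffinityConfig" in spec
    spec.pop("sessionAffinityConfig", None)  # drops clientIP.timeoutSeconds as well
    spec["sessionAffinity"] = "None"         # same as the default when the field is omitted
    return svc, changed
```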
Disable the autopath plugin
Some earlier versions of CoreDNS enabled the autopath plugin, which can cause resolution errors in some edge cases. Check if it is enabled and edit the configuration file to disable it. For more information, see Autopath.
After you disable the autopath plugin, the number of DNS queries sent by clients can increase by up to three times. The time required to resolve a single domain name can also increase by up to three times. Monitor the CoreDNS load and the impact on your business.
- Run the kubectl -n kube-system edit configmap coredns command to open the CoreDNS configuration file.
- Delete the autopath @kubernetes line and save the file.
- Check the operational status and logs of the CoreDNS pods. If the logs contain the word reload, the modification is successful.
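If you keep the Corefile in version control, you can script the same edit. The following Python sketch removes every line whose first token is the autopath directive and reports how many lines it removed. Commented-out lines are left alone. It is a plain text edit that does not parse the Corefile, so review the result before you apply it.

```python
from typing import Tuple


def remove_autopath(corefile: str) -> Tuple[str, int]:
    """Drop 'autopath ...' directive lines from a Corefile; return (new_text, removed_count)."""
    kept, removed = [], 0
    for line in corefile.splitlines(keepends=True):
        tokens = line.split()
        if tokens and tokens[0] == "autopath":
            removed += 1
            continue
        kept.append(line)
    return "".join(kept), removed
```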
Configure graceful shutdown
lameduck is the CoreDNS graceful shutdown mechanism and is configured as an option of the health plugin. It ensures that when CoreDNS needs to stop or restart, in-flight requests are completed instead of being abruptly interrupted. lameduck works as follows:
- When a CoreDNS process is about to terminate, it enters lameduck mode.
- In lameduck mode, CoreDNS delays its shutdown by the lameduck duration. During this period, the ready plugin stops reporting the pod as ready, so the pod is removed from the Service endpoints. CoreDNS keeps answering the requests that still reach it and exits after the duration elapses.
Console
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, click .
- In the kube-system namespace, click Edit YAML to the right of the coredns ConfigMap.
- In the CoreDNS configuration file, ensure that the health plugin is enabled and set the lameduck timeout to 15s. Then, click OK.
.:53 {
errors
# The health plugin may have different settings in different CoreDNS versions.
# Scenario 1: The health plugin is not enabled by default.
# Scenario 2: The health plugin is enabled, but no lameduck duration is set.
# health
# Scenario 3: The health plugin is enabled, and the lameduck duration is set to 5s.
# health {
# lameduck 5s
# }
# For all three scenarios, modify the configuration as follows to set the lameduck parameter to 15s.
health {
lameduck 15s
}
# Other plugins do not need to be modified and are omitted here.
}
If the CoreDNS pods run normally, the change was successful. If a CoreDNS pod becomes abnormal, you can identify the cause by viewing its events and logs.
CLI
- Run the following command to open the CoreDNS configuration file.
  kubectl -n kube-system edit configmap/coredns
- In the Corefile, ensure that the health plugin is enabled and set the lameduck parameter to 15s.
  .:53 {
      errors
      # The health plugin may have different settings in different CoreDNS versions.
      # Scenario 1: The health plugin is not enabled by default.
      # Scenario 2: The health plugin is enabled, but no lameduck duration is set.
      # health
      # Scenario 3: The health plugin is enabled, and the lameduck duration is set to 5s.
      # health {
      #     lameduck 5s
      # }
      # For all three scenarios, modify the configuration as follows to set the lameduck parameter to 15s.
      health {
          lameduck 15s
      }
      # Other plugins do not need to be modified and are omitted here.
  }
- Save and exit after you modify the CoreDNS configuration file.
- If CoreDNS runs normally, the change was successful. If a CoreDNS pod becomes abnormal, you can identify the cause by viewing its events and logs.
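A script can also handle the three scenarios in the Corefile comments. The following Python sketch makes sure a health { lameduck 15s } block is present, whether health was missing, bare, or set with another duration. It is a line-based text edit. It assumes at most one health directive, and when health is missing it adds the block to the first block whose opening line ends in {. Where the health block sits inside the server block does not matter, because CoreDNS fixes plugin order at build time, not from Corefile order.

```python
from typing import List


def set_lameduck(corefile: str, duration: str = "15s") -> str:
    """Ensure the Corefile contains `health { lameduck <duration> }`."""
    lines = corefile.splitlines()
    out: List[str] = []
    found = False
    i = 0
    while i < len(lines):
        line = lines[i]
        stripped = line.strip()
        indent = line[: len(line) - len(line.lstrip())]
        if stripped in ("health", "health {"):
            # Scenarios 2 and 3: replace the bare directive or the whole old block.
            out += [f"{indent}health {{", f"{indent}    lameduck {duration}", f"{indent}}}"]
            if stripped == "health {":
                while i + 1 < len(lines) and lines[i].strip() != "}":
                    i += 1  # skip the old block body up to its closing brace
            found = True
        else:
            out.append(line)
        i += 1
    if not found:
        # Scenario 1: add a health block right after the first server block opens.
        for j, line in enumerate(out):
            if line.rstrip().endswith("{"):
                indent = line[: len(line) - len(line.lstrip())] + "    "
                out[j + 1:j + 1] = [f"{indent}health {{", f"{indent}    lameduck {duration}", f"{indent}}}"]
                break
    return "\n".join(out) + "\n"
```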
Configure default protocol for the forward plugin
NodeLocal DNSCache uses TCP to communicate with CoreDNS. CoreDNS then communicates with upstream DNS servers by using the same protocol as the incoming request. Therefore, by default, requests from business containers to resolve domain names outside the cluster pass through NodeLocal DNSCache and CoreDNS, and finally reach the VPC DNS servers (by default, 100.100.2.136 and 100.100.2.138 on ECS instances) over TCP.
VPC DNS servers have limited support for TCP. If you use NodeLocal DNSCache, you need to modify the CoreDNS configuration to prioritize UDP for communication with upstream DNS servers to avoid resolution exceptions. We recommend that you modify the CoreDNS configuration file, which is the ConfigMap named coredns in the kube-system namespace. For more information, see Manage ConfigMaps. In the forward plugin, specify the protocol for upstream requests as prefer_udp. After this modification, CoreDNS prioritizes UDP to communicate with upstream servers. The modification is as follows:
# Before modification
forward . /etc/resolv.conf
# After modification
forward . /etc/resolv.conf {
prefer_udp
}
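You can also apply the same change programmatically. The following Python sketch converts a single-line forward directive into a block with prefer_udp, or adds prefer_udp to an existing forward block that lacks it. It is a line-based text edit, not a Corefile parser, and assumes that the closing brace of a forward block sits on its own line.

```python
from typing import List


def ensure_prefer_udp(corefile: str) -> str:
    """Make every forward directive in the Corefile prefer UDP for upstream queries."""
    lines = corefile.splitlines()
    out: List[str] = []
    i = 0
    while i < len(lines):
        line = lines[i]
        stripped = line.strip()
        indent = line[: len(line) - len(line.lstrip())]
        if stripped.startswith("forward ") and not stripped.endswith("{"):
            # Single-line form: "forward . /etc/resolv.conf" becomes a block with prefer_udp.
            out += [f"{indent}{stripped} {{", f"{indent}    prefer_udp", f"{indent}}}"]
        elif stripped.startswith("forward ") and stripped.endswith("{"):
            # Block form: copy the block body and add prefer_udp if it is missing.
            out.append(line)
            body: List[str] = []
            i += 1
            while i < len(lines) and lines[i].strip() != "}":
                body.append(lines[i])
                i += 1
            if "prefer_udp" not in (b.strip() for b in body):
                body.append(f"{indent}    prefer_udp")
            out += body
            if i < len(lines):
                out.append(lines[i])  # the block's closing brace
        else:
            out.append(line)
        i += 1
    return "\n".join(out) + "\n"
```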
Configure the ready plugin
CoreDNS versions later than 1.5.0 must have the ready plugin configured to enable readiness probes.
- Run the following command to open the CoreDNS configuration file.
  kubectl -n kube-system edit configmap/coredns
- Check whether the file contains the ready line. If it does not, add the ready line, press Esc, enter :wq!, and then press Enter to save the modified configuration file and exit edit mode.
  apiVersion: v1
  data:
    Corefile: |
      .:53 {
          errors
          health {
              lameduck 15s
          }
          # If the following line does not exist, add it. Keep its indentation consistent with the surrounding lines.
          ready
          kubernetes cluster.local in-addr.arpa ip6.arpa {
              pods verified
              fallthrough in-addr.arpa ip6.arpa
          }
          prometheus :9153
          forward . /etc/resolv.conf {
              max_concurrent 1000
              prefer_udp
          }
          cache 30
          loop
          log
          reload
          loadbalance
      }
- Check the operational status and logs of the CoreDNS pods. If the logs contain the word reload, the modification is successful.
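After the ready plugin is enabled, CoreDNS serves a readiness endpoint over HTTP, by default on port 8181 at the path /ready. The endpoint returns 200 OK once all plugins that report readiness are ready. The following Python sketch performs the same check a readiness probe would. The host argument, for example a CoreDNS pod IP that is reachable from where you run the sketch, is an assumption about your environment.

```python
import urllib.error
import urllib.request


def coredns_ready(host: str, port: int = 8181, timeout: float = 2.0) -> bool:
    """Return True if http://host:port/ready answers 200 OK, as the ready plugin does when ready."""
    url = f"http://{host}:{port}/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Non-2xx responses (HTTPError), refused connections, and timeouts all mean "not ready".
        return False
```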
Enhance performance with the multisocket plugin
CoreDNS v1.12.1 introduced the multisocket plugin. Enabling this plugin allows CoreDNS to use multiple sockets to listen on the same port, enhancing CoreDNS performance in high-CPU scenarios. For a detailed description of the plugin, see the community documentation.
You need to enable multisocket by using the coredns ConfigMap:
.:53 {
...
prometheus :9153
multisocket [NUM_SOCKETS]
forward . /etc/resolv.conf
...
}
NUM_SOCKETS specifies the number of sockets that listen on the same port.
Recommended configuration: Align NUM_SOCKETS with the estimated CPU utilization, CPU resource limits, and available cluster resources. For example:
- If CoreDNS consumes 4 cores at peak and 8 cores are available, set NUM_SOCKETS to 2.
- If CoreDNS consumes 8 cores at peak and 64 cores are available, set NUM_SOCKETS to 8.
To determine the optimal configuration, we recommend that you test the QPS and load with different settings.
If you do not specify NUM_SOCKETS, the default value is GOMAXPROCS, which is equal to the CPU limit of the CoreDNS pod. If the pod's CPU limit is not set, the value is equal to the number of CPU cores on the node where the pod is running.
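The Linux SO_REUSEPORT socket option illustrates how several sockets can listen on one port. It lets several sockets bind the same address and port, and the kernel spreads incoming packets across them. We assume this is the kernel feature the multisocket plugin builds on. The following Python sketch only demonstrates that kernel behavior on Linux and is not CoreDNS code.

```python
import socket
from typing import List


def open_reuseport_udp(count: int, host: str = "127.0.0.1", port: int = 0) -> List[socket.socket]:
    """Bind `count` UDP sockets to one address and port by using SO_REUSEPORT (Linux)."""
    sockets: List[socket.socket] = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            sockets.append(s)
            # Every socket must enable SO_REUSEPORT before bind() to share the port.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            s.bind((host, port))
            port = s.getsockname()[1]  # port 0 picks a free port; later sockets reuse it
    except OSError:
        for s in sockets:
            s.close()
        raise
    return sockets


if __name__ == "__main__":
    socks = open_reuseport_udp(4)
    print({s.getsockname() for s in socks})  # a single shared (address, port) pair
    for s in socks:
        s.close()
```

For UDP, the kernel typically picks a socket by hashing each packet's source and destination addresses and ports. A single client flow therefore tends to stay on one socket, while traffic from many clients is spread across all of them.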