Why Your Kubernetes DNS Is Slow Even When CoreDNS Is Healthy
Troubleshoot slow Kubernetes DNS beyond CoreDNS. Learn how ndots, NodeLocal DNSCache, conntrack, and upstream DNS impact performance.
DNS problems have a frustrating way of appearing out of nowhere. One day, everything is running smoothly. The next, applications begin timing out, internal service calls feel sluggish, and external API requests suddenly take far longer than they should. Before long, developers start reporting that "Kubernetes networking is slow."
Like most engineers, you probably open Grafana or Prometheus and head straight for CoreDNS. The pods are healthy, CPU usage looks normal, memory isn't under pressure, and the logs show no obvious errors. From every metric available, CoreDNS appears to be doing exactly what it should.
So why is every application still waiting on DNS?
The answer is that, in many Kubernetes clusters, CoreDNS isn't actually the bottleneck. DNS resolution involves far more than a single DNS server, and latency can be introduced long before a request reaches CoreDNS or long after it leaves. Focusing only on CoreDNS often causes engineers to overlook the real source of the problem.

Everybody Blames CoreDNS
It's easy to see why CoreDNS gets blamed first. It's the DNS server running inside your cluster, and every pod depends on it to resolve service names.
The problem is that CoreDNS is only one stop in a much longer journey.
A typical Kubernetes DNS lookup follows this path:
<pre><code>Application
│
▼
Pod Resolver (glibc)
│
▼
Search Domains
│
▼
ndots Processing
│
▼
NodeLocal DNSCache (optional)
│
▼
CoreDNS
│
▼
Upstream DNS Resolver
│
▼
Authoritative DNS Server</code></pre>
Every layer in this chain has the potential to introduce latency. Some generate additional DNS queries, others add network hops, and some depend on infrastructure completely outside your Kubernetes cluster. By the time CoreDNS receives a request, the application may already have spent valuable milliseconds elsewhere.
Understanding this request path is the foundation for troubleshooting Kubernetes DNS properly.
Understanding the Kubernetes DNS Request Path
Before diving into individual bottlenecks, it's worth understanding what actually happens when an application performs a DNS lookup.
The process starts inside the application. Whether it's a Go service, a Java application, or a Python API, the program asks for a hostname to be resolved. The request is usually handled by the system's resolver library (glibc on most Linux images, musl on Alpine), although some runtimes, such as Go's pure-Go resolver, implement the lookup themselves. Either way, the resolver reads the DNS configuration in the pod's /etc/resolv.conf.
The resolver uses the configured search domains and the ndots setting to decide which names to try and in what order: a hostname with fewer dots than ndots is tried with each search domain appended before it is tried as written. If NodeLocal DNSCache is enabled, the queries are sent there first. Otherwise, they travel to CoreDNS through its cluster Service.
CoreDNS then determines whether it already knows the answer. Internal Kubernetes services are resolved locally, while external domains are forwarded to an upstream recursive resolver, which may be a cloud DNS service, an enterprise DNS server, or another recursive resolver on your network. Unless a cached answer is available along the way, the reply only starts travelling back to the application after that resolver has contacted the authoritative DNS server.
Every step is usually fast on its own.
The problem is that small delays at multiple stages quickly add up, turning what should be a 5-millisecond lookup into something that takes hundreds of milliseconds.
Problem 1: Why One DNS Lookup Quietly Becomes Several
One of Kubernetes' least understood defaults is ndots.
If you inspect the DNS configuration inside a pod, you'll usually see something like this:
<pre><code>search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5</code></pre>
At first glance, this looks harmless. In reality, it's responsible for a surprising amount of unnecessary DNS traffic in many clusters.
Imagine your application needs to call GitHub's API.
It requests:
<pre><code>api.github.com</code></pre>
Most people assume Kubernetes immediately queries that hostname.
Instead, because api.github.com contains only two dots, fewer than the five that ndots:5 asks for, the resolver treats it as a possibly cluster-local name and tries every search domain before it tries the name as written:
<pre><code>api.github.com.default.svc.cluster.local
api.github.com.svc.cluster.local
api.github.com.cluster.local
api.github.com</code></pre>
What looked like one DNS lookup has now become four.
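The ordering rule is easy to model. The sketch below is a simplified approximation of glibc-style search-list handling, not the resolver's actual code: `candidate_names` and `SEARCH` are names invented for this example, and a real resolver stops at the first name that resolves rather than trying the whole list.

```python
def candidate_names(name, search, ndots):
    """Return the names a resolv.conf-style resolver would try, in order."""
    # A trailing dot marks the name as fully qualified: no search list at all.
    if name.endswith("."):
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search]
    # Enough dots: try the name as written first, then the search list.
    if name.count(".") >= ndots:
        return [name] + expanded
    # Too few dots: search list first, the name as written last.
    return expanded + [name]


# Search list taken from the pod configuration shown in this section.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for candidate in candidate_names("api.github.com", SEARCH, ndots=5):
    print(candidate)
```

Run as written, it prints the same four names in the same order as the list above. Calling it with `ndots=2`, or with `api.github.com.` (note the trailing dot), moves the real hostname to the front.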
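For workloads that communicate heavily with external APIs, the numbers grow quickly. Imagine a payment service making 5,000 outbound requests every minute, each one triggering a fresh lookup. Instead of generating 5,000 DNS queries, it generates around 20,000, of which 15,000 are search-domain attempts that can only fail. Resolvers that ask for A and AAAA records separately, as glibc does for dual-stack lookups, double that figure again.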
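The arithmetic for the payment-service example can be sketched in a few lines. This assumes the worst case, where every outbound request triggers a fresh lookup of an external name and all three search domains are tried first. The function name is hypothetical.

```python
def dns_queries_per_minute(lookups_per_minute, search_domains=3, record_types=1):
    """Worst-case query count for external names that go through the search list."""
    # Every search domain is tried before the real name, once per record type.
    names_tried = search_domains + 1
    return lookups_per_minute * names_tried * record_types


total = dns_queries_per_minute(5000)
wasted = total - 5000
print(total, wasted)                                  # 20000 15000
print(dns_queries_per_minute(5000, record_types=2))   # 40000 (A and AAAA)
```

Connection reuse and any caching in the application lower the real number, so treat this as an upper bound rather than a prediction.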
This behaviour isn't a Kubernetes bug.
It's an intentional design choice.
Kubernetes optimises DNS for internal service discovery because most workloads communicate with other services inside the cluster. With ndots:5, partially qualified names such as payments, payments.default, or payments.default.svc are tried against the search list first, so they resolve inside the cluster without requiring fully qualified domain names and without a wasted trip to an upstream resolver.
The downside appears when applications spend most of their time communicating with systems outside Kubernetes. Every external hostname pays the cost of those additional lookup attempts, increasing DNS traffic and adding unnecessary latency.
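Lowering the ndots value can improve performance for workloads dominated by external traffic, but it's not something that should be changed blindly. Reduce it too far and partially qualified names such as payments.default.svc are tried as absolute names first, which adds a failed lookup to internal calls and can surprise applications that rely on Kubernetes' default DNS resolution. For a handful of known external hostnames, a trailing dot (api.github.com.) is a narrower fix: it marks the name as fully qualified, so the resolver skips the search list without any change to ndots.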
Like most Kubernetes tuning decisions, the correct value depends on the workload, not a universal best practice.
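If you do decide to lower ndots for a specific workload, the override lives in the pod's `spec.dnsConfig`. Here is a minimal sketch that patches a manifest held as a Python dictionary. The pod name, image, and the value 2 are placeholders chosen for illustration, not recommendations, and `with_ndots` is a hypothetical helper.

```python
import copy
import json


def with_ndots(pod_manifest, ndots):
    """Return a copy of a pod manifest with an ndots override in spec.dnsConfig."""
    manifest = copy.deepcopy(pod_manifest)
    dns_config = manifest.setdefault("spec", {}).setdefault("dnsConfig", {})
    # Drop any existing ndots option so the override is not duplicated.
    options = [o for o in dns_config.get("options", []) if o.get("name") != "ndots"]
    # Kubernetes expects option values as strings.
    options.append({"name": "ndots", "value": str(ndots)})
    dns_config["options"] = options
    return manifest


# Hypothetical pod used only for illustration.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "payments-worker"},
    "spec": {"containers": [{"name": "app", "image": "example.com/payments:1.0"}]},
}

print(json.dumps(with_ndots(pod, 2), indent=2))
```

Because kubectl accepts JSON manifests as well as YAML, the printed output can be applied directly. In practice you would set the same field in the pod template of a Deployment so the change is scoped to the one workload that needs it.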
Problem 2: Search Domains Quietly Multiply Every Request
Search domains work hand in hand with ndots.
A typical pod receives entries similar to:
<pre><code>search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5</code></pre>
These search domains make internal service discovery simple. Instead of connecting to:
<pre><code>payments.default.svc.cluster.local</code></pre>
An application can simply request:
<pre><code>payments</code></pre>
The resolver automatically appends each search domain in turn behind the scenes until one of the expanded names resolves.
That's extremely convenient for internal communication, but it comes at a cost when resolving external domains.
When an application requests api.github.com, the resolver first combines that hostname with every configured search domain before finally attempting the original hostname. Those failed lookups still consume time, network bandwidth, and processing resources inside CoreDNS.
In smaller clusters, the impact is usually negligible.
In large microservice environments generating millions of DNS requests every day, those extra lookups become a measurable source of latency and unnecessary load.
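To see how many extra attempts a given pod's configuration implies, you can parse its resolv.conf and count them. The sketch below handles only the `search` and `options ndots:` directives and ignores everything else. Both function names are made up for this example, and the sample text mirrors the configuration shown earlier.

```python
def parse_resolv_conf(text):
    """Extract the search list and ndots value from resolv.conf content."""
    search, ndots = [], 1  # ndots defaults to 1 when no option sets it
    for line in text.splitlines():
        if line.startswith(("#", ";")):
            continue  # comment line
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "search":
            search = fields[1:]  # a later search line replaces an earlier one
        elif fields[0] == "options":
            for option in fields[1:]:
                if option.startswith("ndots:"):
                    ndots = int(option.split(":", 1)[1])
    return search, ndots


def extra_queries(name, search, ndots):
    """Count search-domain attempts made before the name is tried as written."""
    if name.endswith(".") or name.count(".") >= ndots:
        return 0
    return len(search)


SAMPLE = """search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

search, ndots = parse_resolv_conf(SAMPLE)
print(extra_queries("api.github.com", search, ndots))   # 3
print(extra_queries("api.github.com.", search, ndots))  # 0
```

Inside a pod, passing the contents of /etc/resolv.conf instead of `SAMPLE` shows what that specific workload is actually configured to do.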
Problem 3: Linux conntrack Can Become the Real Bottleneck
At this point, it's tempting to keep tweaking CoreDNS, but sometimes DNS isn't the problem at all.
The real bottleneck lives inside the Linux kernel.
Every worker node maintains a connection tracking table, commonly known as conntrack. Its job is to keep track of network connections so the kernel knows where packets belong and how they should be handled. Although DNS usually uses UDP, those requests are still tracked by conntrack.
In a busy Kubernetes cluster, thousands of pods can generate an enormous number of short-lived DNS requests every second. Individually, those requests are tiny. Together, they can quickly consume the available conntrack entries.
Once the table fills up, the symptoms become difficult to diagnose. The kernel has no room to track new flows, so it drops their packets, and the application's resolver sits waiting until its timeout expires (5 seconds by default in glibc) before retrying. From the application's perspective, DNS appears randomly slow. From CoreDNS's perspective, everything looks perfectly normal because many of those packets never reached it in the first place.
The flow looks something like this:
<pre><code>Pods
│
Thousands of DNS Requests
│
▼
Linux conntrack Table
│
▼
Table Full
│
▼
New Packets Dropped
│
▼
Resolver Times Out and Retries
│
▼
Applications Experience Slow DNS</code></pre>
This is why DNS issues often appear inconsistent. One request resolves instantly, while the next takes hundreds of milliseconds or even seconds, depending on the resolver's retry timeout. Nothing changed in CoreDNS. The network stack underneath Kubernetes became the limiting factor.
If DNS latency appears random rather than constant, conntrack should always be part of your investigation.
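Checking conntrack pressure on a node doesn't need special tooling. The sketch below reads the kernel's own counters. The paths are the standard procfs locations, which exist only when the nf_conntrack module is loaded. The function names are hypothetical, and the column names in /proc/net/stat/nf_conntrack vary between kernel versions, which is why the column is passed in by name.

```python
from pathlib import Path

PROC = Path("/proc/sys/net/netfilter")


def conntrack_usage(count, maximum):
    """Fraction of the conntrack table currently in use (0.0 to 1.0)."""
    return count / maximum if maximum else 0.0


def read_conntrack_usage(proc_dir=PROC):
    """Read the live entry count and the table limit from procfs."""
    count = int((proc_dir / "nf_conntrack_count").read_text())
    maximum = int((proc_dir / "nf_conntrack_max").read_text())
    return conntrack_usage(count, maximum)


def sum_stat_column(stat_text, column):
    """Sum one column of /proc/net/stat/nf_conntrack across all CPUs."""
    header, *rows = stat_text.strip().splitlines()
    index = header.split().index(column)
    # Each row holds one CPU's counters, written in hexadecimal.
    return sum(int(row.split()[index], 16) for row in rows)
```

On a worker node, `read_conntrack_usage()` close to 1.0, or a `drop` or `insert_failed` total from `sum_stat_column(Path("/proc/net/stat/nf_conntrack").read_text(), "drop")` that keeps climbing between two readings, is a strong hint that packets are being lost before they ever reach CoreDNS.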
Problem 4: NodeLocal DNSCache Removes Unnecessary Network Hops
Many Kubernetes clusters still send every DNS request across the network to the CoreDNS Service. Even if the same hostname was resolved a few milliseconds earlier, another pod has to repeat the entire journey.
That journey looks like this:
<pre><code>Pod
│
▼
CoreDNS Service
│
▼
iptables
│
▼
Network Hop
│
▼
CoreDNS</code></pre>
For a handful of applications, this overhead is barely noticeable. At scale, it becomes another source of unnecessary latency.
This is where NodeLocal DNSCache makes a significant difference.
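Instead of forcing every pod to contact CoreDNS directly, NodeLocal DNSCache runs a lightweight DNS caching agent on each worker node as a DaemonSet. Pods send their DNS requests to the local cache first. If the answer already exists, it's returned immediately. Only cache misses are forwarded to CoreDNS. Because pods reach the local agent without going through the iptables DNAT rules of the CoreDNS Service, those queries also avoid creating conntrack entries.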
The request path becomes much shorter.
<pre><code>Pod
│
▼
NodeLocal DNSCache
│
├── Cache Hit → Immediate Response
│
└── Cache Miss
      │
      ▼
    CoreDNS</code></pre>
This seemingly small architectural change delivers several benefits:
- Lower DNS latency for repeated lookups
- Fewer network hops
- Reduced CoreDNS CPU utilisation
- Less DNS traffic across the cluster
- Lower risk of UDP packet loss during traffic spikes
It's one of those optimisations that many teams overlook because the cluster works without it. However, once workloads begin scaling, NodeLocal DNSCache often becomes one of the simplest ways to improve DNS performance without adding more CoreDNS replicas.
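The hit-or-miss flow described above is just a cache with expiry. The toy model below illustrates the idea only: it is not how NodeLocal DNSCache is implemented, the fixed 30-second TTL is an assumption (a real cache honours each record's own TTL), and `fake_coredns` stands in for the real upstream.

```python
import time


class TtlCache:
    """Toy DNS cache: answers repeated lookups locally until an entry expires."""

    def __init__(self, upstream, ttl_seconds=30.0, clock=time.monotonic):
        self._upstream = upstream   # callable: name -> answer (stands in for CoreDNS)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}          # name -> (expires_at, answer)
        self.hits = 0
        self.misses = 0

    def resolve(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Cache miss or expired entry: forward the query and remember the answer.
        self.misses += 1
        answer = self._upstream(name)
        self._entries[name] = (now + self._ttl, answer)
        return answer


forwarded = []


def fake_coredns(name):
    forwarded.append(name)
    return "203.0.113.10"  # documentation-range address, not a real answer


cache = TtlCache(fake_coredns)
for _ in range(100):
    cache.resolve("api.github.com")
print(cache.hits, cache.misses, len(forwarded))  # 99 1 1
```

One hundred lookups from pods on the same node turn into a single query that actually leaves the node, which is where the reduction in CoreDNS load comes from.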
Problem 5: The Slowest DNS Server Might Be Outside Your Cluster
Even when everything inside Kubernetes is configured correctly, DNS can still be slow.
The reason is simple.
CoreDNS doesn't answer every query itself.
Requests for external domains are forwarded to an upstream recursive resolver, which may be:
- AWS Route 53 Resolver
- Azure DNS
- Google Cloud DNS
- Active Directory
- Corporate DNS
- On-premises DNS infrastructure
- Your ISP's recursive resolver
If that upstream resolver responds slowly, CoreDNS has no option except to wait.
From your monitoring dashboards, CoreDNS appears healthy because it isn't overloaded. It's simply waiting for another DNS server to return an answer.
This is why engineers sometimes restart CoreDNS, see no improvement, and spend hours debugging Kubernetes when the real delay is occurring entirely outside the cluster.
Whenever external DNS lookups become slow, it's important to determine whether the latency is coming from CoreDNS itself or from the resolver it's forwarding requests to.
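One way to separate the two is to send the same query to each server directly and compare round-trip times. The sketch below builds a minimal DNS query by hand with the standard library, so it works in images without dig. It is deliberately bare: IPv4 only, UDP only, and it doesn't check that the reply matches the query. The function names are hypothetical, and any server addresses you pass in are your own.

```python
import socket
import struct
import time


def build_query(name, txid=0x1234, qtype=1):
    """Encode a minimal DNS query (RFC 1035) for `name`; qtype 1 is an A record."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # Question: encoded name, terminating zero byte, QTYPE, QCLASS (1 = IN).
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def time_query(server, name, timeout=2.0, port=53):
    """Return the round-trip time in milliseconds for one UDP query to `server`."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (server, port))
        sock.recvfrom(512)  # raises a timeout error if no reply arrives in time
        return (time.perf_counter() - start) * 1000.0
```

Calling `time_query` once with the CoreDNS Service IP and once with the upstream address from your CoreDNS forward configuration, for a name that isn't already cached, shows which leg of the journey is slow. If the upstream alone accounts for most of the latency, no amount of CoreDNS tuning will help.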
How to Measure Where DNS Is Slow
Rather than assuming CoreDNS is responsible, work through the DNS resolution path one step at a time. The goal is to identify where the delay begins, not simply confirm that DNS is slow.
Start by testing DNS resolution from inside a running pod.
<pre><code>kubectl exec -it dns-test -- nslookup kubernetes.default</code></pre>
This confirms whether the application can resolve Kubernetes services successfully.
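If internal lookups work but external domains are slow, query CoreDNS directly, replacing COREDNS_IP with the ClusterIP of the cluster DNS Service (usually named kube-dns in the kube-system namespace).
<pre><code>dig @COREDNS_IP api.github.com</code></pre>
Check the Query time line in the output. If this responds quickly, CoreDNS is probably not your bottleneck.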
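To check the authoritative side of the chain, trace the delegation path.
<pre><code>dig +trace api.github.com</code></pre>
With +trace, dig performs the iterative resolution itself, starting from the root servers and following each delegation until it reaches the authoritative nameserver. That means it largely bypasses CoreDNS and the upstream recursive resolver, so use it to rule the authoritative servers in or out rather than to measure the path your pods actually take.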
Next, inspect the CoreDNS configuration.
<pre><code>kubectl describe configmap coredns -n kube-system</code></pre>
This helps verify forwarding rules, caching behaviour, and any custom plugins that could influence DNS performance.
Review the CoreDNS logs as well.
<pre><code>kubectl logs -n kube-system deployment/coredns</code></pre>
Timeouts, forwarding failures, or repeated upstream errors often appear here before they become visible elsewhere.
Finally, confirm that CoreDNS isn't actually resource-constrained.
<pre><code>kubectl top pods -n kube-system</code></pre>
If CPU and memory usage remain healthy while DNS is still slow, your investigation should shift toward resolver behaviour, conntrack, NodeLocal DNSCache, or upstream DNS servers rather than CoreDNS itself.
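Single commands show whether a lookup works, but intermittent problems only show up over repeated measurements. The sketch below times lookups through the pod's own resolver, so the search list and ndots are included in the result. The function names are hypothetical, and the port passed to the resolver is arbitrary.

```python
import socket
import statistics
import time


def measure_lookups(hostname, attempts=20, resolve=socket.getaddrinfo):
    """Time repeated lookups through the pod's own resolver; returns milliseconds."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            resolve(hostname, 443)
        except OSError:
            pass  # failed lookups still cost time, so they stay in the sample
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples):
    """Median and worst-case latency; a large gap hints at drops or retries."""
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}
```

Run inside a pod, comparing `summarize(measure_lookups("api.github.com"))` with the same call for `"api.github.com."` (trailing dot) shows how much of the latency comes from search-domain expansion. A low median with an occasional multi-second maximum points toward packet loss, such as conntrack pressure, rather than a uniformly slow server.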
Common Kubernetes DNS Bottlenecks
Not every DNS issue has the same root cause. Two clusters may show identical symptoms while requiring completely different fixes. Understanding the relationship between symptoms and their most likely causes can save hours of unnecessary troubleshooting.
| Symptom | Most Likely Cause |
|---|---|
| External DNS lookups are consistently slow | ndots:5 generating unnecessary lookups |
| Random DNS latency across the cluster | Linux conntrack saturation |
| High CoreDNS CPU usage | Excessive DNS requests caused by search domains or ndots |
| Healthy CoreDNS but slow applications | Slow upstream DNS resolver |
| Only certain nodes experience DNS problems | NodeLocal DNSCache missing or misconfigured |
While these patterns won't cover every scenario, they're responsible for a large percentage of DNS performance issues seen in production Kubernetes clusters.
Best Practices for Faster Kubernetes DNS
DNS performance rarely improves because of a single configuration change. More often, it's the result of removing small inefficiencies throughout the resolution path.
A few best practices consistently make a noticeable difference:
- Enable NodeLocal DNSCache on production clusters with moderate to high traffic.
- Review whether the default ndots:5 setting matches your workload. Applications that mostly communicate with external services may benefit from a lower value.
- Keep search domains to the minimum required for service discovery.
- Monitor DNS latency independently instead of relying solely on CoreDNS health metrics.
- Watch Linux conntrack usage, especially on busy worker nodes.
- Benchmark upstream DNS resolvers regularly. A healthy CoreDNS deployment can't compensate for a slow recursive resolver.
- Cache frequently resolved hostnames at the application layer whenever possible.
None of these changes is particularly complicated, but together they can significantly reduce DNS latency and lower the load placed on CoreDNS.
Why AI-Generated Kubernetes YAML Doesn't Fix DNS Performance
AI has become incredibly good at generating Kubernetes manifests. Give it a prompt, and it can produce Deployments, Services, Ingress resources, Helm charts, or even complete application stacks within seconds.
What it usually doesn't do is ask questions about the environment those workloads will run in.
It won't ask whether your cluster uses NodeLocal DNSCache. It won't warn you that ndots:5 could multiply external DNS requests or that your worker nodes are approaching their conntrack limits. It won't tell you that CoreDNS is forwarding requests to an overloaded corporate DNS server halfway across the network.
Those aren't YAML problems.
They're platform engineering problems.
AI can help deploy applications faster, but understanding how those applications interact with the underlying infrastructure is still the responsibility of the engineer. Fast deployments mean very little if every service spends hundreds of milliseconds waiting for DNS before it can process a request.
As AI becomes a bigger part of cloud-native workflows, this distinction becomes even more important. The tools can automate deployments, but they can't replace a solid understanding of how Kubernetes behaves under production workloads.
Conclusion
When DNS performance starts degrading, CoreDNS is usually the first component engineers investigate. That's a logical starting point, but it shouldn't be the last.
A Kubernetes DNS request travels through the application's resolver, search domains, ndots processing, optional local caches, Linux networking, CoreDNS, upstream recursive resolvers, and finally the authoritative DNS server. Latency introduced at any stage eventually shows up as "slow Kubernetes DNS," even if CoreDNS itself is operating perfectly.
The next time your applications begin waiting on DNS, resist the urge to immediately restart CoreDNS or increase its replica count. Instead, trace the request from the application all the way through the resolution path. More often than not, you'll discover that CoreDNS was simply waiting on something else.
Understanding that distinction doesn't just help you fix DNS issues faster. It helps you troubleshoot Kubernetes the way experienced platform engineers do: by following the entire request path instead of assuming the most visible component is the one at fault.
"The fastest DNS server isn't always the healthiest one. It's the one your applications barely have to ask."