CoreDNS configured incorrectly in LKE beta?
I've been bashing my head against a number of issues while trying out the LKE beta. It seems DNS is once again the root cause… twice, in fact: the first time was my fault, but the second (the point of this question) looks like it may be a configuration issue in LKE.
As an aside, it would be great if you could publish a guide on setting up LKE with an ingress (e.g. nginx-ingress or traefik), external-dns (so that you can configure the external hostnames), and cert-manager with ACME via Let's Encrypt. This would seem to be the basic minimum that most people using LKE for hosting would want. (Also, is there anything you can do to speed up your hosted DNS updates? 30 minutes is a long time to wait to see a new hostname.)
Anyway, after much diagnosis (and finally trying helm upgrade --namespace nginx nginx stable/nginx-ingress --set controller.publishService.enabled=true) I was able to get the Ingress external IP to be that of the LoadBalancer assigned to the nginx-ingress, and that unblocked the biggest stumbling block I had with cert-manager… namely, that it was failing the self-check:
Status:
  Presented: true
  Processing: true
  Reason: Waiting for http-01 challenge propagation: failed to perform self check GET request 'http://...redacted.../.well-known/acme-challenge/bD...redacted...j0': Get http://...redacted.../.well-known/acme-challenge/bD...redacted...j0: dial tcp: lookup ...redacted... on 10.128.0.10:53: no such host
  State: pending
Given that the external-dns had been pointing the hostname at the Nodes rather than the LoadBalancer, I had believed the issue to be DNS… (plus it's always DNS)
When I configured the Nginx Ingress to publish the service endpoint, the DNS entries were changed to the correct IP… of course, then I had to wait for Linode's DNS servers to update (30 minutes, WAT!)
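For anyone following along, a quick way to confirm that the Ingress now reports the LoadBalancer address rather than the node IPs (the Service name here assumes the stable chart's <release>-nginx-ingress-controller naming with a release called nginx; adjust to your setup):

```shell
# The controller Service should have an EXTERNAL-IP from the LoadBalancer
kubectl get svc --namespace nginx nginx-nginx-ingress-controller

# With publishService enabled, each Ingress should show that same IP
kubectl get ingress --all-namespaces
```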
Same error!!!
So I said, let's go digging into DNS:
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
followed by a kubectl apply
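(Assuming the manifest above is saved as dnsutils.yaml — the filename is my own choice — that step is:)

```shell
# Create the debugging pod from the manifest above
kubectl apply -f dnsutils.yaml

# Wait for it to be Ready before exec'ing into it
kubectl wait --for=condition=Ready pod/dnsutils --timeout=60s
```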
and then:
$ kubectl exec -ti dnsutils -- nslookup kubernetes.default
Server: 10.128.0.10
Address: 10.128.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.128.0.1
That looks ok… This however…
$ kubectl exec -ti dnsutils -- nslookup ...redacted...
Server: 10.128.0.10
Address: 10.128.0.10#53
** server can't find ...redacted...: NXDOMAIN
command terminated with exit code 1
That's not great…
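At this point it's also worth peeking at what CoreDNS itself sees — a sketch using the standard upstream labels (CoreDNS deployments conventionally keep the legacy k8s-app=kube-dns label), not anything LKE-specific:

```shell
# Check the CoreDNS logs for errors on the failing lookups
kubectl logs --namespace kube-system -l k8s-app=kube-dns

# Inspect the resolv.conf the test pod was given by kubelet
kubectl exec -ti dnsutils -- cat /etc/resolv.conf
```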
So I tried to brute force it and changed the CoreDNS ConfigMap from
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
To
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . 8.8.8.8 8.8.4.4
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
I restarted the CoreDNS pods and, w00t, DNS was working… but this feels wrong.
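For reference, the restart itself can be done by deleting the pods and letting the Deployment recreate them with the edited Corefile (rollout restart is available on 1.17 as well):

```shell
# Recycle the CoreDNS pods so they pick up the new ConfigMap
kubectl delete pod --namespace kube-system -l k8s-app=kube-dns

# or, equivalently:
kubectl rollout restart deployment/coredns --namespace kube-system
```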
Why was DNS not working with forward . /etc/resolv.conf?
(I have the history of all the setup commands I have run on the cluster as I was recording them as notes… but I'm not comfortable making those notes public)
LKE using Kubernetes 1.17
4 Replies
Hello stephenc,
Thank you for reaching out to us about this and for providing the output. I think the best course of action here is to escalate this to members of our LKE team. I went ahead and brought your query to their attention. You'll receive a follow-up response once we've determined the root cause.
We'll be in touch asap.
Kind Regards,
Christopher
Hi @stephenc!
Can I confirm that you are using the External DNS controller (https://github.com/kubernetes-sigs/external-dns) configured to use Linode DNS?
Currently, I see no configuration issues with CoreDNS:
/ # dig duckduckgo.com
; <<>> DiG 9.14.8 <<>> duckduckgo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22397
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;duckduckgo.com. IN A
;; ANSWER SECTION:
duckduckgo.com. 30 IN A 54.241.2.241
;; Query time: 1 msec
;; SERVER: 10.128.0.10#53(10.128.0.10)
;; WHEN: Thu Mar 05 18:36:44 UTC 2020
;; MSG SIZE rcvd: 73
The above DNS resolution was carried out inside an Alpine container running on an LKE cluster.
You can reproduce this with:
$ kubectl run alpine-foo -ti --image=alpine --restart=Never /bin/sh
# apk update
# apk add bind-tools
# dig duckduckgo.com
Note that the DNS server used is CoreDNS, at 10.128.0.10.
I ran into a similar situation with cert-manager (the default configuration from the GitLab Helm docs): logs indicated the pod couldn't resolve the name I'd configured external-dns to create in Linode DNS. The record was indeed created, and I could resolve it from my connection, but the cluster couldn't (I waited 1 hour). After replacing the CoreDNS forward directive as the OP did, the certificates verified almost immediately.
Also k8s v1.17, coredns 1.6.5
This thread saved me a bunch of time. Thanks OP! o/
p.s. I've got a fully helm-ified Traefik 2 ingress using IngressRoute CRDs + Let's Encrypt configuration I'd be happy to share. A nice little scrappy way to manage everything through a minimum of load balancers. I figure this isn't the place to post it, so hit me up if you're interested.
If somebody else stumbles into this: CoreDNS is configured to use Linode nameservers in its resolv.conf.
The default TTL for those records is 24 hours.
You can specify a shorter TTL for the external-dns records by adding the external-dns.alpha.kubernetes.io/ttl annotation to your Services / Ingresses.
See https://github.com/kubernetes-sigs/external-dns/blob/master/docs/ttl.md
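For example, a sketch of the annotation on a LoadBalancer Service (the hostname, TTL value, and selector are placeholders; the annotation keys themselves are from the external-dns docs):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-controller
  annotations:
    external-dns.alpha.kubernetes.io/hostname: example.com
    # TTL in seconds for the DNS records external-dns creates
    external-dns.alpha.kubernetes.io/ttl: "120"
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: nginx-ingress
```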
That was enough for me.