[Solved-Partially] Kubernetes HTTP01 Challenge Separate Namespace
Hey folks, I am trying to get SSL working on my Kubernetes cluster.
* ingress controller is deployed in namespace default
* application is installed in namespace app01
* ingress object is deployed to namespace app01
* confirmed that plain HTTP traffic works without the TLS block and cert-manager annotations
relevant ingress portions
metadata:
  ...
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    cert-manager.io/acme-challenge-type: http01
spec:
  tls:
  - hosts:
    - redact.com
    - www.redact.com
    secretName: redact-tls
However, when I do a describe challenge in the app01 namespace, I see they are all failing HTTP-01 challenge propagation.
I'm wondering if it's similar to what's described here: https://www.digitalocean.com/community/questions/how-do-i-correct-a-connection-timed-out-error-during-http-01-challenge-propagation-with-cert-manager
TL;DR:
controller in the default namespace
application in the app01 namespace
ingress object with TLS info deployed to the app01 namespace
HTTP-01 challenge failing
3 Replies
Hey @serviceme,
I did some testing of this myself, but I was unable to recreate the connection timeout issues you're seeing. I configured my environment like this:
- My Nginx Ingress Controller was deployed in the default namespace
- I deployed the Nginx demo application included in the guide you mentioned to the app01 Namespace.
- I deployed the Ingress to the app01 Namespace.
I don't believe the Digital Ocean post is related, as there's no need to set a hostname on an LKE Cluster's NodeBalancer. If possible, it might be helpful to post some of the errors from your cert-manager logs to see if we can get a better idea of the root cause.
Regards,
Ryan L.
Linode Support Staff
EDIT
Solved, partially (I don't have a clear answer on how to fix it with the original approach).
So, the root cause here was that I was using NGINX's Ingress controller rather than the Kubernetes-maintained NGINX controller.
The Kubernetes-maintained version creates a default backend service that routes HTTP traffic appropriately to the challenge solvers that get deployed; NGINX's version does not.
If I deploy out another cluster at some point I'll look into this more, but I think what's happening is that with the NGINX controller the default backend is essentially HTTPS, so cert-manager's self-check GET fails the TLS handshake and the certs are never generated.
I think something similar to what's done with the ingress in this article would be needed to handle the port 80 call: https://medium.com/containerum/how-to-launch-nginx-ingress-and-cert-manager-in-kubernetes-55b182a80c8f
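For reference, this is roughly how the Kubernetes-maintained controller gets installed via Helm (a sketch only; the chart repo and name come from the ingress-nginx project, and the release name and flag just mirror what I used originally):

# Add the Kubernetes-maintained ingress-nginx chart repo
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

# Install the controller; publishService.enabled makes it publish the
# LoadBalancer address on Ingress resources (same flag as my original install)
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --set controller.publishService.enabled=true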
Thanks @rl0nergan for the reply. Hopefully I am not doing anything stupid here (which I wouldn't put past me ;)). Let me know if there are other logs I can provide that may be helpful.
As a note, my domains are managed via Namecheap and not imported to the DNS manager on Linode (I am assuming this isn't an issue here). My A records point at the NodeBalancer's external IP. Additional note: I am using the latest cert-manager referenced here https://cert-manager.io/docs/installation/kubernetes/ versus the 0.15 referenced in the Linode article.
From my understanding of the error, the challenge is served by temporary solver pods, and external traffic can't reach them, so the self check fails and we never proceed to Let's Encrypt's servers [https://cert-manager.io/docs/faq/acme/].
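One way to reproduce what cert-manager's self check does is a plain HTTP request to the challenge path from outside the cluster (a sketch; the hostname and token stand in for the redacted values):

# Should return the token body over plain HTTP on port 80.
# A redirect to HTTPS or a TLS error here would match the handshake
# failure cert-manager reports in its self check.
curl -v http://www.redact.com/.well-known/acme-challenge/<token>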
kubectl get pods -n app01
NAME                        READY   STATUS    RESTARTS   AGE
app01core                   1/1     Running   0          21h
cm-acme-http-solver-29sqz   1/1     Running   0          18h
cm-acme-http-solver-cgds7   1/1     Running   0          18h
cm-acme-http-solver-lt7q7   1/1     Running   0          18h
cm-acme-http-solver-zqd86   1/1     Running   0          18h
kubectl describe ingress -n app01
I noticed I have no nginx-ingress-default-backend among my default services. I installed the NGINX controller via https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-helm/ rather than the Kubernetes-maintained NGINX Ingress (the helm command was helm install nginx-ingress stable/nginx-ingress --set controller.publishService.enabled=true)… I am exploring this as I think it might be part of the root cause -- I saw another site mention handshake issues for a different problem that were caused by the default backend coming in over HTTPS, or something like that. A couple of checks to confirm which controller is actually running are sketched after the output below.
...
Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
...
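To confirm which controller image is deployed and whether a default backend Service exists at all, something like this should do it (a sketch; it assumes the controller was installed into default, as above):

# List deployments and their container images; NGINX's controller uses
# nginx/nginx-ingress, while the Kubernetes-maintained one uses
# registry.k8s.io/ingress-nginx/controller (formerly k8s.gcr.io)
kubectl get deploy -n default -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].image}{"\n"}{end}'

# Is there a default backend Service at all?
kubectl get svc -n default | grep -i backend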
kubectl describe challenges -n app01
...
Reason: Waiting for HTTP-01 challenge propagation: failed to perform self check GET request 'http://redact/.well-known/acme-challenge/redact': Get "https://redact:443/.well-known/acme-challenge/redact": remote error: tls: handshake failure
State: pending
...
kubectl logs cert-manager-5bc6c5cb94-22hfb -n cert-manager
E0909 01:12:54.634692 1 sync.go:183] cert-manager/controller/challenges "msg"="propagation check failed" "error"="failed to perform self check GET request 'http://www.redact.com/.well-known/acme-challenge/redact': Get \"https://www.redact.com:443/.well-known/acme-challenge/redact\": remote error: tls: handshake failure" "dnsName"="www.redact.com" "resource_kind"="Challenge" "resource_name"="app01-tls-7l84x-1528030512-2246324353" "resource_namespace"="app01" "resource_version"="v1" "type"="HTTP-01"
ingress object
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: app01
  namespace: app01
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    cert-manager.io/acme-challenge-type: http01
spec:
  tls:
  - hosts:
    - redact.video
    - www.redact.video
    - redact.com
    - www.redact.com
    secretName: redact-tls
  rules:
  - host: redact.video
    http:
      paths:
      - backend:
          serviceName: app01-core
          servicePort: 8000
  - host: www.redact.video
    http:
      paths:
      - backend:
          serviceName: app01-core
          servicePort: 8000
  - host: redact.com
    http:
      paths:
      - backend:
          serviceName: app01-core
          servicePort: 8000
  - host: www.redact.com
    http:
      paths:
      - backend:
          serviceName: app01-core
          servicePort: 8000
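As an aside, networking.k8s.io/v1beta1 is deprecated; on Kubernetes 1.19+ the first rule above would look roughly like this under the v1 Ingress API (a sketch only, reusing the same names and assuming an IngressClass named nginx exists):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app01
  namespace: app01
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - redact.video
    secretName: redact-tls
  rules:
  - host: redact.video
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app01-core
            port:
              number: 8000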
clusterissuer
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: redact@gmail.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-secret-prod
    solvers:
    - http01:
        ingress:
          class: nginx
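While debugging repeated challenge failures it can help to point a second issuer at Let's Encrypt's staging endpoint to avoid production rate limits; a sketch (the metadata and secret names here are just examples, only the server URL differs from the production issuer above):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: redact@gmail.com
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-secret-staging
    solvers:
    - http01:
        ingress:
          class: nginx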
and the actual pod being deployed to app01 (a containerized Flask app served with gunicorn)
apiVersion: v1
kind: Pod
metadata:
  name: app01core
  namespace: app01
  labels:
    app: app01core
spec:
  containers:
  - name: main-app-container
    image: redact.azurecr.io/redact/core_app:latest
    imagePullPolicy: IfNotPresent
    env:
    - name: SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: environment
          key: SECRET_KEY
    - name: RECAPTCHA_PUB
      valueFrom:
        secretKeyRef:
          name: environment
          key: RECAPTCHA_PUB
    - name: RECAPTCHA_PRV
      valueFrom:
        secretKeyRef:
          name: environment
          key: RECAPTCHA_PRV
    - name: SENDGRID_KEY
      valueFrom:
        secretKeyRef:
          name: environment
          key: SENDGRID_KEY
    - name: SENDGRID_SENDER
      valueFrom:
        secretKeyRef:
          name: environment
          key: SENDGRID_SENDER
    ports:
    - containerPort: 8000
  imagePullSecrets:
  - name: acr-secret
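The ingress backend references a Service named app01-core on port 8000 that isn't shown above; a sketch of what it presumably looks like, assuming the selector matches the pod's app: app01core label:

apiVersion: v1
kind: Service
metadata:
  name: app01-core
  namespace: app01
spec:
  selector:
    app: app01core
  ports:
  - port: 8000          # port the Ingress backend targets
    targetPort: 8000    # containerPort on the gunicorn pod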