None of my pods can communicate with anything over HTTPS. My cert-manager logs `cert-manager/controller/clusterissuers "msg"="error setting up issuer" "error"="Get https://acme-staging-v02.api.letsencrypt.org/directory: x509: certificate is valid for ingress.local, not acme-staging-v02.api.letsencrypt.org" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"=""`, and my other pods report similar errors, such as `https://github.com/...: x509: certificate is valid for ingress.local`.
Copying some of the backlog from the #juju channel on freenode. We need to figure out how to ask Juju to tell the k8s apiserver that its configuration settings should be changed.
[14:08:19]<kelvinliu_> it's like you need customise some options for k8s api-server,
[14:10:38]<atdprhs> kelvinliu_: I am already checking with them but no answer, but as far as I know, conjure-up is using juju
[14:10:57]<atdprhs> so my best guess on such issue, it needs juju involvement
[14:11:41]<kelvinliu_> it's more like you need config the deployment.
[14:15:47]<kelvinliu_> i m not sure if u can find the config option from here, https://jaas.ai/u/containers/kubernetes-master
[14:17:06]<kelvinliu_> juju config kubernetes-master apiserver-cert-extra-xxxx=xxxxx
[14:17:28]<kelvinliu_> u just need to set the config like this
[14:20:33]<atdprhs> Could this help with the DNS issue?
[14:22:53]<kelvinliu_> from the link u give me, they fix it by customising the api-server option.
[14:24:23]<atdprhs> yes, I see `kubeadm init --apiserver-cert-extra-sans="mydomainhere.com" --pod-network-cidr="10.244.0.0/16" --service-cidr="10.96.0.0/12" --apiserver-advertise-address="0.0.0.0"`
[14:24:45]<atdprhs> I don't know how or to what I configure `--pod-network-cidr="10.244.0.0/16" --service-cidr="10.96.0.0/12"`
[14:25:03]<kelvinliu_> so it's not an issue with juju at all,
[14:25:56]<atdprhs> On kubernetes chat, I have received a response from one of the guys there `I used conjure-up to deploy my k8s and use cert manager. What is wrong that you're trying to fix here? Do you have the same issue as the bug? Do you know what is actually happening to get an odd cert like that? It looks like the solution was just to change or define network
[14:25:56]<atdprhs> stuff and extra sans. You can do all that with juju, but shouldn't have to do it.`
[14:26:04]<atdprhs> This guy is currently offline
[14:26:09]<kelvinliu_> as i just said, u will need find the relevant options in the doc of kubernetes master then run the cmd above to config it
[14:26:18]<atdprhs> but based on him, it look like it's all juju
[14:27:21]<atdprhs> From the document you sent `DNS for the cluster` might help I guess as I know it's DNS issue, cuz all of my pods can't communicate with any HTTPs, my cert-manager gets `cert-manager/controller/clusterissuers "msg"="error setting up issuer" "error"="Get https://acme-staging-v02.api.letsencrypt.org/directory: x509: certificate is valid for ingress.lo
[14:27:22]<atdprhs> cal, not acme-staging-v02.api.letsencrypt.org" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"=""` and my other pods are also reporting similar issue like `https://github.com/...: x509: certificate is valid for ingress.local`
[14:33:10]<kelvinliu_> sorry, im not an expert of k8s api-server, it's better to wait him online or ask others in k8s channel.
Apologies for the delayed response; I was travelling.
From the error you’re getting, it looks like the only thing you should hopefully need is the extra_sans config option on the kubernetes-master charm. If you do end up needing to tweak the other options manually, there is the service-cidr option on kubernetes-master and, while there is no explicit option for the --pod-network-cidr param, you can provide that to kubernetes-worker via the kubelet-extra-args config option.
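For concreteness, here’s a rough sketch of setting that option with juju config; the hostname is just a placeholder for whatever name the apiserver certificate needs to cover:

```
# Add an extra SAN to the apiserver certificate (placeholder hostname).
juju config kubernetes-master extra_sans="k8s.mydomainhere.com"

# Check the current value afterwards.
juju config kubernetes-master extra_sans
```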
All of these options can be configured during deployment with conjure-up by clicking the Configure button next to the relevant charm on the Configure Applications screen. The extra_sans option can also be changed after deployment with juju config (which is what conjure-up drives in the background, providing an interactive walk-through on top of it), but the other two options must be set at deployment time, so they would need to be set either via conjure-up or, if deploying manually, via a bundle overlay passed to juju deploy.
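For the deploy-time route, a minimal overlay sketch might look like the following; the bundle name, file name, CIDR, and hostname are example values, so adjust them to your environment:

```
# Write an overlay with the deploy-time options (example values).
cat > overlay.yaml <<'EOF'
applications:
  kubernetes-master:
    options:
      service-cidr: 10.96.0.0/12
      extra_sans: "k8s.mydomainhere.com"
EOF

# Apply it when deploying the bundle (bundle name is an example).
juju deploy charmed-kubernetes --overlay overlay.yaml
```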
I would point out that the issue seems to be networking; specifically, DNS is resolving to intra-cluster addresses for that error. I wonder if there is something like a service with the name of the external address, since the “bug” you linked was fixed by removing the ndots/search entry from the host’s resolver config. I would start up an ubuntu pod, poke around with name resolution, and see what is going on.
```
kubectl run test -it --rm --image ubuntu -- bash
```
Then inside there, do things like `dig acme-v01.api.letsencrypt.org` and `curl acme-v01.api.letsencrypt.org` and see what you can learn.
```
$ kubectl run test -it --rm --image ubuntu -- bash
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
root@test-654cdfc5d5-4tw8k:/# dig acme-v01.api.letsencrypt.org and curl acme-v01.api.letsencrypt.org
bash: dig: command not found
root@test-654cdfc5d5-4tw8k:/# dig google.com @8.8.8.8
bash: dig: command not found
root@test-654cdfc5d5-4tw8k:/#
```
We spoke some on IRC about this, and I suggested you install the dig utility from the dnsutils apt package. This resulted in:
```
# apt install dnsutils
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package dnsutils
root@test-654cdfc5d5-4tw8k:/# apt update
Ign:1 http://security.ubuntu.com/ubuntu bionic-security InRelease
Ign:2 http://archive.ubuntu.com/ubuntu bionic InRelease
Ign:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
Err:4 http://security.ubuntu.com/ubuntu bionic-security Release
  404 Not Found [IP: <my.ip> 80]
Ign:5 http://archive.ubuntu.com/ubuntu bionic-backports InRelease
Err:6 http://archive.ubuntu.com/ubuntu bionic Release
  404 Not Found [IP: <my.ip> 80]
Err:7 http://archive.ubuntu.com/ubuntu bionic-updates Release
  404 Not Found [IP: <my.ip> 80]
Err:8 http://archive.ubuntu.com/ubuntu bionic-backports Release
  404 Not Found [IP: <my.ip> 80]
Reading package lists... Done
E: The repository 'http://security.ubuntu.com/ubuntu bionic-security Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: The repository 'http://archive.ubuntu.com/ubuntu bionic Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: The repository 'http://archive.ubuntu.com/ubuntu bionic-updates Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: The repository 'http://archive.ubuntu.com/ubuntu bionic-backports Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
```
This indicates a general DNS issue inside the pod, as you reported. DNS should hit the CoreDNS pod, which defers to your DNS server for anything outside the cluster. I find it interesting that the kubelet could download the ubuntu image, yet inside the pod you can’t resolve DNS.
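One quick check from inside that test pod is the resolver config the kubelet injected; the values shown in the comments are just the typical defaults and may differ on your cluster:

```
# Inside the test pod: the nameserver should be the cluster DNS service IP,
# and the search list plus ndots:5 is what makes external names get cluster
# (and host) suffixes appended before being tried as absolute names.
cat /etc/resolv.conf
# Typically something like:
#   nameserver <cluster-dns-service-ip>
#   search default.svc.cluster.local svc.cluster.local cluster.local
#   options ndots:5
```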
From here, I would be interested in looking at the running pod status and the logs of the coredns pods.
```
kubectl get po -A
```
and then, for each coredns pod: `kubectl logs -n kube-system po/coredns-<hash>`
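If the logs don’t make the problem obvious, it may also help to dump the CoreDNS configuration to see which upstream resolvers it forwards to; the label and configmap name below are the usual defaults and may be different in your cluster:

```
# List the CoreDNS pods and pull their logs via the usual label.
kubectl get po -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns

# Dump the Corefile to see the forward targets for out-of-cluster names.
kubectl get configmap coredns -n kube-system -o yaml
```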
Thanks a lot @knoppy, you really made my day!!! As it turned out, the search domain being set by MAAS was the root cause of the DNS failures… I’ll create another post about MAAS and mention you in it.
For anyone following along at home: there was a default search that included a personal domain name, and the DNS server was set to resolve *.domain.com to a single address. This domain was added to MAAS, and as a result resolv.conf had `search domain.com` in it. This in turn meant a `dig www.google.com` resulted in a resolve request for `www.google.com.domain.com`, which was happily resolved to that wildcard address.
To confuse matters some more, there were three upstream DNS servers, so depending on which one you asked, you got either the proper IP or the wildcard IP.
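For anyone who wants to reproduce the effect, something like the following (run inside a pod, with domain.com standing in for the real domain and <upstream-ip> for one of the upstream servers) shows the difference between letting the search list apply and forcing an absolute name:

```
# Pod resolv.conf has "search ... domain.com" and "options ndots:5";
# the upstream zone has a wildcard *.domain.com record.
dig +search www.google.com        # search suffix tried first because of ndots:5 -> wildcard IP
dig www.google.com.domain.com     # the suffixed name that actually got resolved
dig www.google.com.               # trailing dot forces the absolute name, bypassing the search list
dig @<upstream-ip> www.google.com.domain.com   # ask each upstream directly to see which ones serve the wildcard
```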