Hello,
I’m trying to set up Charmed Kubernetes for the first time and I’ve run into some problems which might be bugs, limitations of the charms, misunderstandings on my part, or possibly a bit of each.
Let me give a bit of context first: Presently I’m running a 2-node Kubernetes cluster using microk8s. It has 48 cores and 224 GB of RAM in total. I’m using MetalLB for accessing the services from the LAN, Emissary Ingress with Cert Manager for HTTP/S traffic dispatch, and Rook/Ceph on top of TopoLVM for distributed storage. On this cluster I’m running Jenkins, Sonatype Nexus, some databases and other tools, and a bunch of test environments for my small software development team. It works, but we’re starting to run out of resources. The other issue is that when one of the machines runs out of RAM and starts OOMKilling processes, the whole thing crashes. Running Ceph on k8s and mounting rbd devices in the kernel of the machine that is running k8s has the downside that when Ceph processes are stopped, everything else gets wedged and power cycling the servers is the only way forward. Too bad that one of the servers in this makeshift cluster doesn’t even have a BMC, so somebody in the office needs to walk to the broom closet and press the reset button.
I’ve decided to build a bigger and more robust cluster to replace the one we have. For this purpose, I’ve bought refurbished Dell PowerEdge servers: one R330 (4 vCores @ 3 GHz, 32 GB RAM, 500 GB SSD) and three R630s (64 vCores @ 2.6 GHz, 256 GB RAM, 2 TB NVMe). So far, I’ve successfully installed the MAAS and Juju controllers in LXD containers on the R330 and figured out how to configure the R630s via MAAS with the desired storage layout and network setup. Juju is able to claim/control/release the machines without problems.
The next step I intend to take is deploying Charmed Kubernetes and Ceph with Juju onto the 3-machine cluster provided by MAAS. I would like to make this cluster HA in the sense that it should tolerate taking down any one of the 3 machines for maintenance for a limited period of time.
I’ve started out with juju deploy charmed-kubernetes without any overlays. The first surprise was that the bundle requested 10 machines from MAAS. My plan is to run everything (control plane, storage and workloads) on the 3 large servers. I might add worker nodes in the future, but for now I need to make do with the hardware I have. Adding a physical server just to run an etcd or nginx seems wasteful, and besides, I can only turn so much electricity into heat inside a broom closet before the damn thing turns into a fire hazard. I quickly learned how to define my own bundle that declares a set of machines and places application units on them explicitly.
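For illustration, the placement part of such a bundle looks roughly like this (a simplified sketch, not my exact bundle; the real one still carries the applications, options and relations from the reference charmed-kubernetes bundle):

machines:
  "0": {}
  "1": {}
  "2": {}
applications:
  kubeapi-load-balancer:
    num_units: 1
    to: ["0"]
  kubernetes-control-plane:
    num_units: 2
    to: ["1", "2"]
  kubernetes-worker:
    num_units: 3
    to: ["0", "1", "2"]
  etcd:
    num_units: 3
    to: ["0", "1", "2"]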
At this point I’ve realized that it doesn’t seem to be possible to deploy kubeapi-load-balancer on the same machines as kubernetes-control-plane or kubernetes-worker units, because of TCP port conflicts. My understanding is that kubernetes-control-plane exposes the k8s API server on port 6443 and kubeapi-load-balancer also listens on that port, forwarding traffic to the control plane units. kubernetes-worker, on the other hand, exposes the default HTTP/S ingress on ports 80 and 443, and kubeapi-load-balancer also listens on port 443 and, as far as I understand, forwards that traffic to the kubernetes-worker nodes on port 443. It does not listen on port 80, though, which seems inconsistent.
Then I found out that I can set the ingress=false config option on kubernetes-worker to make port 443 available to kubeapi-load-balancer. I’m fine with this, because I intend to use MetalLB + Emissary Ingress as before for handling HTTP/S anyway.
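For reference, that option can either be set in the bundle under the worker application’s options, or flipped on a live model:

juju config kubernetes-worker ingress=false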
The next bundle configuration I tried was kubeapi-load-balancer on machine 0, kubernetes-control-plane on machines 0 and 1, and kubernetes-worker on machines 0, 1 and 2. Obviously, this is not the HA configuration I’m after. To make this HA I’d need more control over the kubeapi-load-balancer configuration: if I could configure it to listen on a different port, say 8443, and forward to kubernetes-control-plane on port 6443, I could deploy both control plane and load balancer units on all three machines. Then I could use keepalived, as described in “HA for kubeapi-load-balancer | Ubuntu”, to float a virtual IP of the load balancer among the three machines, which seems perfectly adequate for my setup.
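If I read that page correctly, the keepalived side would look roughly like the following; the virtual_ip/port options, the juju-info relation and the loadbalancer-ips settings are quoted from memory of that doc rather than verified against the 1.28 charms, so treat them as assumptions:

juju deploy keepalived --config virtual_ip=<vip> --config port=443
juju integrate keepalived:juju-info kubeapi-load-balancer:juju-info
juju config kubernetes-control-plane loadbalancer-ips="<vip>"
juju config kubernetes-worker loadbalancer-ips="<vip>"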
So this is the first batch of questions that I would like to ask: Is it possible to configure kubeapi-load-balancer to use different listen and downstream ports? Also, while we are at it, is it possible to disable forwarding port 443 to kubernetes-worker, which I don’t need? I understand I would also have to override the port in the public API server URL that all the components use. Can I do that?
In the meantime I’ve decided to proceed with a 1 load balancer unit + 2 control plane units + 3 worker units setup to experiment with Ceph storage and other cluster add-ons. However, the cluster doesn’t seem to start up correctly. Here’s what the state of the model has settled to:
rafal@stagnum0:~$ juju status
Model Controller Cloud/Region Version SLA Timestamp
k8s-on-maas juju-3-3 maas/default 3.3.0 unsupported 13:39:09+01:00
App Version Status Scale Charm Channel Rev Exposed Message
calico waiting 5 calico 1.28/stable 101 no Waiting for Kubernetes config.
containerd 1.6.8 active 5 containerd 1.28/stable 73 no Container runtime available
easyrsa 3.0.1 active 1 easyrsa 1.28/stable 48 no Certificate Authority connected.
etcd 3.4.22 active 3 etcd 1.28/stable 748 no Healthy with 3 known peers
kubeapi-load-balancer 1.18.0 active 1 kubeapi-load-balancer 1.28/stable 84 yes Loadbalancer ready.
kubernetes-control-plane 1.28.5 waiting 2 kubernetes-control-plane 1.28/stable 321 no Waiting for auth-webhook tokens
kubernetes-worker 1.28.5 waiting 3 kubernetes-worker 1.28/stable 134 yes Waiting for CNI plugins to become available
Unit Workload Agent Machine Public address Ports Message
easyrsa/0* active idle 0 192.168.3.30 Certificate Authority connected.
etcd/0 active idle 0 192.168.3.30 2379/tcp Healthy with 3 known peers
etcd/1* active idle 1 192.168.3.31 2379/tcp Healthy with 3 known peers
etcd/2 active idle 2 192.168.3.32 2379/tcp Healthy with 3 known peers
kubeapi-load-balancer/0* active idle 0 192.168.3.30 443,6443/tcp Loadbalancer ready.
kubernetes-control-plane/0* waiting idle 1 192.168.3.31 6443/tcp Waiting for auth-webhook tokens
calico/2 active idle 192.168.3.31 Ready
containerd/2 active idle 192.168.3.31 Container runtime available
kubernetes-control-plane/1 waiting idle 2 192.168.3.32 6443/tcp Waiting for 3 kube-system pods to start
calico/3 active idle 192.168.3.32 Ready
containerd/3 active idle 192.168.3.32 Container runtime available
kubernetes-worker/0 active idle 1 192.168.3.31 Kubernetes worker running.
calico/0* active idle 192.168.3.31 Ready
containerd/0* active idle 192.168.3.31 Container runtime available
kubernetes-worker/1 waiting idle 2 192.168.3.32 Waiting for CNI plugins to become available
calico/1 waiting idle 192.168.3.32 Waiting for Kubernetes config.
containerd/1 active idle 192.168.3.32 Container runtime available
kubernetes-worker/2* waiting idle 0 192.168.3.30 Waiting for CNI plugins to become available
calico/4 waiting idle 192.168.3.30 Waiting for Kubernetes config.
containerd/4 active idle 192.168.3.30 Container runtime available
Machine State Address Inst id Base AZ Message
0 started 192.168.3.30 stagnum1 ubuntu@22.04 default Deployed
1 started 192.168.3.31 stagnum2 ubuntu@22.04 default Deployed
2 started 192.168.3.32 stagnum3 ubuntu@22.04 default Deployed
I was able to connect to the k8s cluster and I can see two nodes:
rafal@stagnum0:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
stagnum2 Ready control-plane 15h v1.28.5
stagnum3 Ready control-plane 15h v1.28.5
However, the system workloads don’t look healthy:
rafal@stagnum0:~$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 15h
rafal@stagnum0:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-6f698d4f87-z79sz 0/1 CrashLoopBackOff 62 (3m46s ago) 15h
kube-system calico-node-d762p 0/1 Running 4 (165m ago) 15h
kube-system calico-node-tb9sk 0/1 Unknown 1 15h
kube-system coredns-59cfb5bf46-p7rz2 0/1 Pending 0 15h
kube-system kube-state-metrics-78c475f58b-pdrtt 0/1 Pending 0 15h
kube-system metrics-server-v0.6.3-69d7fbfdf8-dpcrp 0/2 Pending 0 15h
kubernetes-dashboard dashboard-metrics-scraper-5dd7cb5fc-wggdv 0/1 Pending 0 15h
kubernetes-dashboard kubernetes-dashboard-7b899cb9d9-sn5sn 0/1 Pending 0 15h
rafal@stagnum0:~$ kubectl get events -n kube-system
LAST SEEN TYPE REASON OBJECT MESSAGE
10m Warning BackOff pod/calico-kube-controllers-6f698d4f87-z79sz Back-off restarting failed container calico-kube-controllers in pod calico-kube-controllers-6f698d4f87-z79sz_kube-system(5339f618-0053-45a9-980d-ce9bd80a42b3)
47s Warning Unhealthy pod/calico-node-d762p (combined from similar events): Readiness probe failed: 2023-12-26 12:45:03.955 [INFO][55529] confd/health.go 180: Number of node(s) with BGP peering established = 0...
15s Warning BackOff pod/calico-node-tb9sk Back-off restarting failed container install-cni in pod calico-node-tb9sk_kube-system(dbe2ac2b-52b2-4c36-a82e-61f3f3ee9fd1)
19s Warning FailedScheduling pod/coredns-59cfb5bf46-p7rz2 0/2 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling..
19s Warning FailedScheduling pod/kube-state-metrics-78c475f58b-pdrtt 0/2 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling..
19s Warning FailedScheduling pod/metrics-server-v0.6.3-69d7fbfdf8-dpcrp 0/2 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling..
rafal@stagnum0:~$ kubectl logs calico-kube-controllers-6f698d4f87-z79sz -n kube-system --tail=5
2023-12-26 12:46:57.057 [INFO][1] hostendpoints.go 173: successfully synced all hostendpoints
2023-12-26 12:47:01.047 [ERROR][1] main.go 297: Received bad status code from apiserver error=Get "https://10.152.183.1:443/healthz": context deadline exceeded status=0
2023-12-26 12:47:01.047 [INFO][1] main.go 313: Health check is not ready, retrying in 2 seconds with new timeout: 8s
2023-12-26 12:47:11.050 [ERROR][1] main.go 297: Received bad status code from apiserver error=Get "https://10.152.183.1:443/healthz": context deadline exceeded status=0
2023-12-26 12:47:11.050 [INFO][1] main.go 313: Health check is not ready, retrying in 2 seconds with new timeout: 16s
I expected to see three Kubernetes nodes provided by the kubernetes-worker units; instead I see two unschedulable nodes that appear to represent the kubernetes-control-plane units, and they don’t appear to work right either. I was poking around the debug logs and found something that’s probably relevant:
rafal@stagnum0:~$ juju debug-log --include kubernetes-control-plane/0 --no-tail -n 3
unit-kubernetes-control-plane-0: 16:48:48 WARNING unit.unit-kubernetes-control-plane-0.collect-metrics E1226 15:48:48.191384 794230 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate is valid for 192.168.3.31, not 127.0.0.1
unit-kubernetes-control-plane-0: 16:48:48 WARNING unit.unit-kubernetes-control-plane-0.collect-metrics E1226 15:48:48.198407 794230 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate is valid for 192.168.3.31, not 127.0.0.1
unit-kubernetes-control-plane-0: 16:48:48 WARNING unit.unit-kubernetes-control-plane-0.collect-metrics Unable to connect to the server: tls: failed to verify certificate: x509: certificate is valid for 192.168.3.31, not 127.0.0.1
I’ve logged into machines 1 and 2 and found that several kubeconfig files in /root/cdk do indeed specify the API server location as https://127.0.0.1:6443. I think I might have run into a bug, because I tried to follow https://ubuntu.com/kubernetes/docs/install-manual closely and the only things I tweaked were the number and placement of units and disabling the default ingress, so I’d expect to end up with a functional cluster. If there’s a possible workaround, please let me know.
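For the record, I checked this with something along these lines (the exact file names under /root/cdk may vary):

juju ssh kubernetes-control-plane/0 -- sudo grep -rn "server:" /root/cdk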
If there’s anything I can do to help diagnose this issue, please tell me. I see the benefits of managing my k8s cluster with Juju and I’m willing to put in the work.
Cheers, Rafał