Charmed Kubernetes hanging at: Waiting for 3 kube-system pods to start

dsmxt · 7 January 2021 21:38

I followed this guide https://ubuntu.com/kubernetes/docs/install-local to install Charmed Kubernetes via Juju on my server.

After much trial and error I finally have an installation where everything is running and active, except that kubernetes-master is stuck in a waiting state: Waiting for 3 kube-system pods to start

I retrieved the logs from the Kubernetes-Master with: juju ssh kubernetes-master/0 and
sudo pastebinit /var/log/juju/unit-kubernetes-master-0.log
The result can be found here: https://paste.ubuntu.com/p/qgvQwzZTrP/

juju status output:

Model  Controller           Cloud/Region         Version  SLA          Timestamp
k8s    localhost-localhost  localhost/localhost  2.8.7    unsupported  22:35:04+01:00

App                    Version  Status   Scale  Charm                  Store       Rev  OS      Notes
containerd             1.3.3    active       5  containerd             jujucharms  100  ubuntu  
easyrsa                3.0.1    active       1  easyrsa                jujucharms  342  ubuntu  
etcd                   3.4.5    active       3  etcd                   jujucharms  546  ubuntu  
flannel                0.11.0   active       5  flannel                jujucharms  514  ubuntu  
kubeapi-load-balancer  1.18.0   active       1  kubeapi-load-balancer  jujucharms  754  ubuntu  exposed
kubernetes-master      1.20.1   waiting      2  kubernetes-master      jujucharms  926  ubuntu  
kubernetes-worker      1.20.1   active       3  kubernetes-worker      jujucharms  718  ubuntu  exposed

Unit                      Workload  Agent  Machine  Public address  Ports           Message
easyrsa/0*                active    idle   0        10.100.99.131                   Certificate Authority connected.
etcd/0*                   active    idle   1        10.100.99.120   2379/tcp        Healthy with 3 known peers
etcd/1                    active    idle   2        10.100.99.182   2379/tcp        Healthy with 3 known peers
etcd/2                    active    idle   3        10.100.99.121   2379/tcp        Healthy with 3 known peers
kubeapi-load-balancer/0*  active    idle   4        10.100.99.136   443/tcp         Loadbalancer ready.
kubernetes-master/0*      waiting   idle   5        10.100.99.199   6443/tcp        Waiting for 3 kube-system pods to start
  containerd/4            active    idle            10.100.99.199                   Container runtime available
  flannel/4               active    idle            10.100.99.199                   Flannel subnet 10.1.25.1/24
kubernetes-master/1       waiting   idle   6        10.100.99.45    6443/tcp        Waiting for 3 kube-system pods to start
  containerd/3            active    idle            10.100.99.45                    Container runtime available
  flannel/3               active    idle            10.100.99.45                    Flannel subnet 10.1.70.1/24
kubernetes-worker/0       active    idle   7        10.100.99.141   80/tcp,443/tcp  Kubernetes worker running.
  containerd/1            active    idle            10.100.99.141                   Container runtime available
  flannel/1               active    idle            10.100.99.141                   Flannel subnet 10.1.53.1/24
kubernetes-worker/1       active    idle   8        10.100.99.183   80/tcp,443/tcp  Kubernetes worker running.
  containerd/2            active    idle            10.100.99.183                   Container runtime available
  flannel/2               active    idle            10.100.99.183                   Flannel subnet 10.1.77.1/24
kubernetes-worker/2*      active    idle   9        10.100.99.6     80/tcp,443/tcp  Kubernetes worker running.
  containerd/0*           active    idle            10.100.99.6                     Container runtime available
  flannel/0*              active    idle            10.100.99.6                     Flannel subnet 10.1.12.1/24

Machine  State    DNS            Inst id        Series  AZ  Message
0        started  10.100.99.131  juju-ddaa0b-0  focal       Running
1        started  10.100.99.120  juju-ddaa0b-1  focal       Running
2        started  10.100.99.182  juju-ddaa0b-2  focal       Running
3        started  10.100.99.121  juju-ddaa0b-3  focal       Running
4        started  10.100.99.136  juju-ddaa0b-4  focal       Running
5        started  10.100.99.199  juju-ddaa0b-5  focal       Running
6        started  10.100.99.45   juju-ddaa0b-6  focal       Running
7        started  10.100.99.141  juju-ddaa0b-7  focal       Running
8        started  10.100.99.183  juju-ddaa0b-8  focal       Running
9        started  10.100.99.6    juju-ddaa0b-9  focal       Running

jameinel · 8 January 2021 14:57

I don’t know how kubernetes-master decides that it wants 3 masters, though from your status output it is clear that you only have “scale 2”. I’m guessing kubernetes-master is aware that it wants to be HA and is complaining that it cannot reliably go HA with only 2 nodes.
You probably want to do something like juju add-unit kubernetes-master. If you do not have another machine available, you could colocate it with something like juju add-unit kubernetes-master --to X, though I would avoid colocating it on a machine that is already running one of the master units.

dsmxt · 8 January 2021 17:35

I added a new master with juju add-unit kubernetes-master

Unfortunately this didn’t change the outcome. Now I have a third master in the same state as the other two masters. Here is the log from the new master: https://paste.ubuntu.com/p/yW2Q4HCKf2/ and juju status:

Model  Controller           Cloud/Region         Version  SLA          Timestamp
k8s    localhost-localhost  localhost/localhost  2.8.7    unsupported  18:34:42+01:00

App                    Version  Status   Scale  Charm                  Store       Rev  OS      Notes
containerd             1.3.3    active       6  containerd             jujucharms  100  ubuntu
easyrsa                3.0.1    active       1  easyrsa                jujucharms  342  ubuntu
etcd                   3.4.5    active       3  etcd                   jujucharms  546  ubuntu
flannel                0.11.0   active       6  flannel                jujucharms  514  ubuntu
kubeapi-load-balancer  1.18.0   active       1  kubeapi-load-balancer  jujucharms  754  ubuntu  exposed
kubernetes-master      1.20.1   waiting      3  kubernetes-master      jujucharms  926  ubuntu
kubernetes-worker      1.20.1   active       3  kubernetes-worker      jujucharms  718  ubuntu  exposed

Unit                      Workload  Agent  Machine  Public address  Ports           Message
easyrsa/0*                active    idle   0        10.100.99.131                   Certificate Authority connected.
etcd/0*                   active    idle   1        10.100.99.120   2379/tcp        Healthy with 3 known peers
etcd/1                    active    idle   2        10.100.99.182   2379/tcp        Healthy with 3 known peers
etcd/2                    active    idle   3        10.100.99.121   2379/tcp        Healthy with 3 known peers
kubeapi-load-balancer/0*  active    idle   4        10.100.99.136   443/tcp         Loadbalancer ready.
kubernetes-master/0*      waiting   idle   5        10.100.99.199   6443/tcp        Waiting for 3 kube-system pods to start
  containerd/4            active    idle            10.100.99.199                   Container runtime available
  flannel/4               active    idle            10.100.99.199                   Flannel subnet 10.1.25.1/24
kubernetes-master/1       waiting   idle   6        10.100.99.45    6443/tcp        Waiting for 3 kube-system pods to start
  containerd/3            active    idle            10.100.99.45                    Container runtime available
  flannel/3               active    idle            10.100.99.45                    Flannel subnet 10.1.70.1/24
kubernetes-master/2       waiting   idle   10       10.100.99.16    6443/tcp        Waiting for 3 kube-system pods to start
  containerd/5            active    idle            10.100.99.16                    Container runtime available
  flannel/5               active    idle            10.100.99.16                    Flannel subnet 10.1.14.1/24
kubernetes-worker/0       active    idle   7        10.100.99.141   80/tcp,443/tcp  Kubernetes worker running.
  containerd/1            active    idle            10.100.99.141                   Container runtime available
  flannel/1               active    idle            10.100.99.141                   Flannel subnet 10.1.53.1/24
kubernetes-worker/1       active    idle   8        10.100.99.183   80/tcp,443/tcp  Kubernetes worker running.
  containerd/2            active    idle            10.100.99.183                   Container runtime available
  flannel/2               active    idle            10.100.99.183                   Flannel subnet 10.1.77.1/24
kubernetes-worker/2*      active    idle   9        10.100.99.6     80/tcp,443/tcp  Kubernetes worker running.
  containerd/0*           active    idle            10.100.99.6                     Container runtime available
  flannel/0*              active    idle            10.100.99.6                     Flannel subnet 10.1.12.1/24

Machine  State    DNS            Inst id         Series  AZ  Message
0        started  10.100.99.131  juju-ddaa0b-0   focal       Running
1        started  10.100.99.120  juju-ddaa0b-1   focal       Running
2        started  10.100.99.182  juju-ddaa0b-2   focal       Running
3        started  10.100.99.121  juju-ddaa0b-3   focal       Running
4        started  10.100.99.136  juju-ddaa0b-4   focal       Running
5        started  10.100.99.199  juju-ddaa0b-5   focal       Running
6        started  10.100.99.45   juju-ddaa0b-6   focal       Running
7        started  10.100.99.141  juju-ddaa0b-7   focal       Running
8        started  10.100.99.183  juju-ddaa0b-8   focal       Running
9        started  10.100.99.6    juju-ddaa0b-9   focal       Running
10       started  10.100.99.16   juju-ddaa0b-10  focal       Running
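
The status message talks about pods in the kube-system namespace, so it may be worth checking which pods there are not running yet. The following is only a rough sketch, not how the charm itself counts pods: it assumes kubectl is installed and configured for this cluster, and the function names are made up for illustration. It reads the output of kubectl get pods -n kube-system -o json and lists every pod whose phase is neither Running nor Succeeded.

```python
import json
import subprocess


def pods_not_running(pod_list):
    """Return (name, phase) pairs for pods that are neither Running nor Succeeded.

    `pod_list` is the parsed JSON document printed by
    `kubectl get pods -n kube-system -o json`.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        # A pod without a reported phase is treated as "Unknown".
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            stuck.append((name, phase))
    return stuck


def fetch_kube_system_pods():
    """Call kubectl and parse its JSON output (assumes kubectl can reach the cluster)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", "kube-system", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

On a machine where kubectl works, pods_not_running(fetch_kube_system_pods()) would give the names to look at more closely, for example with kubectl describe pod -n kube-system followed by the pod name.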

Moula · 20 October 2021 09:11

I am trying to deploy it with the edge bundle, but the problem is still there! It is still stuck at “Waiting for 3 kube-system pods to start”. And if we add a third, it waits for a fourth!

mmrezaie · 2 November 2021 19:50

I do have the same problem. Is there no solution for this?

Moula · 3 November 2021 06:01

Sorry, I still have the same problem.

yanxiaomu · 16 November 2021 14:42

Wait. Maybe it needs some time to deploy the pods.

Moula · 16 November 2021 17:18

How long? An hour, two hours… No, really, there is a problem with the master nodes. The workers, even those with GPUs, deploy correctly. Thanks.

yanxiaomu · 17 November 2021 02:10

Charmed Kubernetes was deployed at 5:30.

The k8s-master node was in “Waiting for 7 kube-system pods to start” at about 9:00.

Just when I wanted to give up, it changed to “Waiting for 5 kube-system pods to start”. So I think it is doing something…

The k8s-master node reached “Kubernetes master running.” at about 11:30.
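
To tell a slow deployment from a stuck one, it can help to note the number in the message each time you look at juju status. This is a small sketch under my own assumptions: the helper names are hypothetical, and the pattern is taken only from the message text quoted in this thread (the optional singular "pod" is a guess). It pulls the count out of each message and says whether it has gone down.

```python
import re

# Pattern based on the message quoted in this thread; "pods?" also accepts
# a singular form, which is an assumption and not confirmed by the charm.
WAITING_RE = re.compile(r"Waiting for (\d+) kube-system pods? to start")


def pending_pod_count(message):
    """Return the pod count from a waiting message, or None if it does not match."""
    match = WAITING_RE.search(message)
    return int(match.group(1)) if match else None


def describe_progress(messages):
    """Summarise a chronological list of status messages for one master unit."""
    counts = [pending_pod_count(m) for m in messages]
    seen = [c for c in counts if c is not None]
    if not seen:
        return "no waiting messages seen"
    if counts[-1] is None:
        # The latest message is no longer a waiting message.
        return "no longer waiting"
    if seen[-1] < seen[0]:
        return f"progressing: {seen[0]} -> {seen[-1]} pods pending"
    return f"stalled at {seen[-1]} pods pending"
```

With the messages from my deployment (7, then 5, then “Kubernetes master running.”) this would report progress and finally "no longer waiting"; a count that stays at 3 for hours would show up as stalled.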

Moula · 8 December 2021 11:12

I still have the same error message despite having changed the version and increased the capacity of the cluster. Everything is working, even the workers with GPUs, but the masters are still waiting. But what I don’t know

yanxiaomu · 9 December 2021 02:40

I deployed k8s on two kinds of infrastructure: one is LXD in MAAS, the other is openstack-base-78 deployed with Juju & MAAS.

Another difference is that I used the flannel network.

Charmed Kubernetes #814 and Charmed Kubernetes #679 were tested.