Unit waiting to bootstrap

Hi, I’m getting this message on some containers that are running. I’ve rebooted the containers and they come back up fine, with MySQL (Percona) running on them. The “waiting to bootstrap” message is misleading; how can I fix this?

juju version: 2.8.8-bionic-amd64

cadmin@conductor:~$ juju status | ag mysql
mysql                  5.7.20-29.24   waiting      3  percona-cluster        jujucharms  269  ubuntu
mysql/0*                   active    idle   0/lxd/7   10.2.2.51       3306/tcp           Unit is ready
mysql/1                    waiting   idle   15/lxd/1  10.2.2.217      3306/tcp           Unit waiting to bootstrap
mysql/2                    waiting   idle   16/lxd/1  10.2.2.218      3306/tcp           Unit waiting to bootstrap

Hi @lemonstar. Thank you for the question!

If the containers report that they’re still “waiting to bootstrap”, it means that for some reason they haven’t checked in to tell the controller that they’re up and running.

Please take a look at the Juju agent logs on those containers (in /var/log/juju) and let us know if you see any interesting errors. I suspect that either the controller IP has changed and the agents can’t find it as a result, or some other issue on the machine is knocking the Juju agent over before it can tell the controller that it is up and running and has done the work the controller expected of it.
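If it helps, here is a quick sketch of how you might pull those logs and check connectivity. The unit and machine names are taken from the status output above, and 17070 is Juju’s default controller API port; substitute your own controller address.

```shell
# Tail the machine agent log inside one of the affected containers
juju ssh mysql/1 -- sudo tail -n 100 /var/log/juju/machine-15-lxd-1.log

# Grep for recent errors across all agent logs on that container
juju ssh mysql/1 -- sudo grep -i error /var/log/juju/*.log | tail -n 20

# Check that the container can still reach the controller's API port
# (find your controller's address with `juju show-controller`)
juju ssh mysql/1 -- nc -vz <controller-ip> 17070
```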


Thank you for the response. I just started another job and lost track of this. Here’s something from the logs:

2021-02-18 17:42:53 ERROR juju.worker.dependency engine.go:587 "storage-provisioner" manifold worker returned unexpected error: watching block devices: getting environ: Get http://10.2.2.1:5240/MAAS/api/2.0/version/: dial tcp 10.2.2.1:5240: connect: connection refused
2021-02-18 17:42:54 WARNING juju.cmd.jujud machine.go:831 determining kvm support: INFO: /dev/kvm does not exist
HINT:   sudo modprobe kvm_intel
modprobe: FATAL: Module msr not found in directory /lib/modules/4.15.0-132-generic
: exit status 1
no kvm containers possible
2021-02-18 17:42:54 ERROR juju.worker.dependency engine.go:587 "unconverted-api-workers" manifold worker returned unexpected error: setting up container support: cannot load machine machine-15-lxd-1 from state: Get http://10.2.2.1:5240/MAAS/api/2.0/version/: dial tcp 10.2.2.1:5240: connect: connection refused
2021-02-18 17:42:56 ERROR juju.worker.dependency engine.go:587 "storage-provisioner" manifold worker returned unexpected error: watching block devices: getting environ: Get http://10.2.2.1:5240/MAAS/api/2.0/version/: dial tcp 10.2.2.1:5240: connect: connection refused
2021-02-18 17:42:57 WARNING juju.cmd.jujud machine.go:831 determining kvm support: INFO: /dev/kvm does not exist
HINT:   sudo modprobe kvm_intel
modprobe: FATAL: Module msr not found in directory /lib/modules/4.15.0-132-generic
: exit status 1
no kvm containers possible
2021-02-18 17:42:57 ERROR juju.worker.dependency engine.go:587 "unconverted-api-workers" manifold worker returned unexpected error: setting up container support: cannot load machine machine-15-lxd-1 from state: Get http://10.2.2.1:5240/MAAS/api/2.0/version/: dial tcp 10.2.2.1:5240: connect: connection refused
2021-02-18 17:42:59 ERROR juju.worker.dependency engine.go:587 "storage-provisioner" manifold worker returned unexpected error: watching block devices: getting environ: Get http://10.2.2.1:5240/MAAS/api/2.0/version/: dial tcp 10.2.2.1:5240: connect: connection refused
2021-02-18 17:43:00 WARNING juju.cmd.jujud machine.go:831 determining kvm support: INFO: /dev/kvm does not exist
HINT:   sudo modprobe kvm_intel
modprobe: FATAL: Module msr not found in directory /lib/modules/4.15.0-132-generic
: exit status 1
no kvm containers possible

Any ideas?

Thank you for the logs, @lemonstar!

LXD recently added support for virtual machines, in addition to containers. It looks like it’s initially trying to spin up a KVM (kernel virtual machine) instance, and then failing. I’m not sure why it would work on restart.

There are two possibilities:

  1. We have a bug in Juju, where it is not correctly asking LXD for containers.
  2. You configured your container cloud to use KVM by default, and that is causing trouble because your host machine isn’t configured to support virtualization.

If option #2 doesn’t sound likely, let me know. I’ll look into option #1 in the meantime. (There were some AppArmor changes in focal that broke the way we were testing LXD; it’s possible that when we refactored our tests, we dropped in some non-default configuration for LXD and missed that this could be an issue.)
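One way to check option #2 yourself: look at what LXD actually launched and whether the host supports virtualization at all. A sketch, assuming LXD is on the host and the cpu-checker package (which provides `kvm-ok`) is installed:

```shell
# On the host machine: the TYPE column shows CONTAINER vs VIRTUAL-MACHINE
lxc list

# Check whether a profile is forcing VMs or adding KVM-related devices
lxc profile show default

# Check whether the host CPU/kernel support KVM at all
sudo kvm-ok        # from the cpu-checker package
ls -l /dev/kvm     # should exist if KVM is usable
```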

Thank you for looking into this. We found that a file called “seeded” was missing from the Percona directory on some of the machines. Restoring it seems to have fixed the status reporting. I hope this helps someone.
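For anyone hitting the same symptom, a rough sketch of how to check for that flag file. The path below is an assumption based on a typical Percona XtraDB Cluster layout; verify where your charm version actually keeps the “seeded” marker before touching anything, and only recreate it on a node that is genuinely up and in sync:

```shell
# Hypothetical path -- confirm against your charm's data directory first
SEEDED=/var/lib/percona-xtradb-cluster/seeded

# On each affected unit, check whether the flag file exists
juju ssh mysql/1 -- sudo ls -l "$SEEDED"

# If the node is healthy and in sync, recreate the flag
juju ssh mysql/1 -- sudo touch "$SEEDED"
```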