Dear community,
Two days ago I upgraded our production controller from 2.9.28 to 2.9.32.
Since then, charms have leadership issues or won’t start. An example here is mysql/0, a single-unit application, directly after start:
unit-mysql-0: 09:27:21 INFO juju Starting unit workers for "mysql/0"
unit-mysql-0: 09:27:21 INFO juju.agent.setup setting logging config to "<root>=INFO;unit=INFO;"
unit-mysql-0: 09:27:21 INFO juju.worker.apicaller [9fe214] "unit-mysql-0" successfully connected to "172.23.1.34:17070"
unit-mysql-0: 09:27:21 INFO juju.worker.migrationminion migration phase is now: NONE
unit-mysql-0: 09:27:21 INFO juju.worker.logger logger worker started
unit-mysql-0: 09:27:21 INFO juju.worker.upgrader no waiter, upgrader is done
machine-1-lxd-4: 09:27:21 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-mysql-0
unit-mysql-0: 09:27:21 INFO juju.worker.uniter unit "mysql/0" shutting down: failed to initialize uniter for "unit-mysql-0": cannot create relation state tracker: "mysql/0" is not leader of "mysql"
unit-mysql-0: 09:27:21 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: failed to initialize uniter for "unit-mysql-0": cannot create relation state tracker: "mysql/0" is not leader of "mysql"
machine-1-lxd-4: 09:27:25 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-mysql-0
unit-mysql-0: 09:27:25 INFO juju.worker.uniter unit "mysql/0" started
unit-mysql-0: 09:27:25 INFO juju.worker.uniter hooks are retried false
unit-mysql-0: 09:27:25 INFO juju.worker.uniter unit "mysql/0" shutting down: catacomb 0xc0006c5200 is dying
unit-mysql-0: 09:27:25 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: unknown object type "SecretsManager" (not implemented)
machine-1-lxd-4: 09:27:29 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-mysql-0
unit-mysql-0: 09:27:30 INFO juju.worker.uniter unit "mysql/0" started
unit-mysql-0: 09:27:30 INFO juju.worker.uniter hooks are retried false
unit-mysql-0: 09:27:30 INFO juju.worker.uniter unit "mysql/0" shutting down: catacomb 0xc00172a480 is dying
unit-mysql-0: 09:27:30 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: unknown object type "SecretsManager" (not implemented)
machine-1-lxd-4: 09:27:35 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-mysql-0
unit-mysql-0: 09:27:35 INFO juju.worker.uniter unit "mysql/0" started
unit-mysql-0: 09:27:35 INFO juju.worker.uniter hooks are retried false
unit-mysql-0: 09:27:35 INFO juju.worker.uniter unit "mysql/0" shutting down: catacomb 0xc0001a0d80 is dying
unit-mysql-0: 09:27:35 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: unknown object type "SecretsManager" (not implemented)
This “catacomb is dying” shutdown-and-restart cycle then repeats indefinitely, settling at roughly once a minute.
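For what it’s worth, the retry interval can be computed straight from the timestamps in the excerpt; here is a small awk sketch over abbreviated copies of the error lines above (on a live model I would feed juju debug-log output through the same pipeline):

```shell
# Compute the seconds between successive "SecretsManager" errors.
# The three sample lines are abbreviated copies of the log above; in
# practice you'd pipe "juju debug-log --replay --no-tail" through the
# same awk.
printf '%s\n' \
  'unit-mysql-0: 09:27:25 ERROR juju.worker.dependency ... SecretsManager' \
  'unit-mysql-0: 09:27:30 ERROR juju.worker.dependency ... SecretsManager' \
  'unit-mysql-0: 09:27:35 ERROR juju.worker.dependency ... SecretsManager' |
awk '{split($2, t, ":"); s = t[1]*3600 + t[2]*60 + t[3]
      if (NR > 1) print s - p   # interval to previous error, in seconds
      p = s}'
# prints 5 twice for this excerpt
```

So the first few retries come about five seconds apart before the cycle backs off.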
Another example, from another single-unit charm:
unit-glance-0: 10:01:34 INFO juju.worker.uniter unit "glance/0" started
unit-glance-0: 10:01:34 INFO juju.worker.uniter hooks are retried false
unit-glance-0: 10:01:34 WARNING juju.worker.uniter.operation we should run a leader-deposed hook here, but we can't yet
unit-glance-0: 10:01:34 INFO juju.worker.uniter found queued "leader-elected" hook
unit-glance-0: 10:01:34 INFO juju.worker.uniter.operation unit is no longer the leader; skipping "leader-elected" execution
unit-glance-0: 10:01:34 INFO juju.worker.uniter.operation skipped "leader-settings-changed" hook (missing)
unit-glance-0: 10:02:32 INFO juju.worker.uniter found queued "leader-elected" hook
unit-glance-0: 10:02:32 WARNING juju.worker.uniter.resolver executor lock acquisition cancelled
unit-glance-0: 10:02:32 INFO juju.worker.uniter unit "glance/0" shutting down: (re)starting watcher: catacomb 0xc000ce1200 is dying
unit-glance-0: 10:02:32 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: unknown object type "SecretsManager" (not implemented)
Some units (but far from all) seem to behave better after I shut down the agent for more than 60 seconds (a trick from an older post that has helped me before). On a problematic unit that means:
# systemctl stop jujud-machine-1-lxd-4.service ; sleep 70 ; systemctl start jujud-machine-1-lxd-4.service
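Since several units are affected, I first generate the same stop/wait/start sequence for a list of agents as a dry run (only jujud-machine-1-lxd-4 is a real name from my environment; the others would need to be filled in per machine):

```shell
# Dry run: print the stop/wait/start command for each affected agent.
# Only jujud-machine-1-lxd-4 is from my environment; add the other
# machine agents as needed, then pipe the output to "sh" to execute.
for svc in jujud-machine-1-lxd-4; do
  printf 'systemctl stop %s.service ; sleep 70 ; systemctl start %s.service\n' \
    "$svc" "$svc"
done
```

Printing first and piping to sh afterwards avoids stopping the wrong agent by accident.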
I have not yet upgraded our two OpenStack Juju models from 2.9.28 to 2.9.32, since the system is behaving so strangely.
What might be related: the controller upgrade didn’t go as smoothly as usual. On one of the machines the upgrade kept retrying for about 20 minutes, but in the end the model reports the new version, the controllers have restarted their processes, and the version numbers have been bumped. After that, everything suddenly looked fine:
Model Controller Cloud/Region Version SLA Timestamp
controller lxd-controller localhost/localhost 2.9.32 unsupported 07:36:59Z
Machine State DNS Inst id Series AZ Message
0 started 172.23.1.34 juju-e64711-0 bionic Running
1 started 172.23.2.56 juju-e64711-1 bionic Running
2 started 172.23.3.46 juju-e64711-2 bionic Running
We have a proper Juju backup (from juju create-backup), but I’m hoping there’s a way to fix this without such drastic measures.
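For completeness, the restore I’m hoping to avoid would, as far as I understand the 2.9 docs, look roughly like this (the filename is just a placeholder for the archive create-backup produced):

```shell
# Last resort, not yet attempted: restore the controller from the
# pre-upgrade backup. Syntax as I understand juju 2.9; the filename
# is a placeholder for the actual create-backup archive.
juju restore-backup --file juju-backup-2.9.28.tar.gz
```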
Thanks in advance!