In the development/POC environment the totally-unsecure-auto-unlock option was used for the Vault deployment.
An attempt was made (if I understand correctly) to migrate this environment to another VMware DC, and that became the source of the troubles. First we stumbled upon Bug #1871729 "handle cloud region rename in running controller" in the juju project on Launchpad. I manually fixed MongoDB; then it turned out Vault depends on a MySQL cluster, which was also in a failed state.
Now the Vault unit starts, but fails with a "Vault is sealed" error in the logs.
… Python traceback is skipped…
2020-04-15 13:34:06 DEBUG config-changed hvac.exceptions.VaultDown: Vault is sealed
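To double-check the seal state independently of the hook error, the Vault API on the affected unit can be queried directly. This is only a sketch, assuming the vault CLI is present on the unit and the API listens on port 8200 as the status output below suggests; <unit-ip> is a placeholder:
juju ssh vault/0
export VAULT_ADDR="http://<unit-ip>:8200"
vault status    # "Sealed  true" in the output would confirm the unit is sealed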
Juju reports the status like this. Sometimes the status of unit /0 changes to a failed leader election after a reboot.
Unit Workload Agent Machine Public address Ports Message
vault/0* error idle 19 ************* 8200/tcp hook failed: "config-changed"
nrpe/16 active idle ************* icmp,5666/tcp ready
ntp/17 active idle ************* 123/udp chrony: Ready
vault/1 blocked idle 26 ************* 8200/tcp Unit is sealed
nrpe/25 active idle ************* icmp,5666/tcp ready
ntp/26 active idle ************* 123/udp chrony: Ready
Digging around the internet, I found an OpenStack guide. It states: "It is important to remember that when the Vault process is started via the resume action its state will be sealed. This means that steps will be required to unseal the process."
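For reference, the manual unseal flow described there looks roughly like the following (a sketch; the unit address is a placeholder and the number of unseal keys required depends on how Vault was initialised):
export VAULT_ADDR="http://<vault-unit-ip>:8200"
vault status              # shows Sealed: true and the unseal threshold
vault operator unseal     # prompts for one unseal key; repeat until the threshold is reached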
So, given the following in the controller command history, did we just shoot ourselves in both feet with a 12 gauge shotgun? I did not type those commands; I am just doing a post mortem.
665 juju run-action vault/0 pause --wait
666 juju status vault
667 juju run-action vault/1 pause --wait
668 juju run-action vault/0 resume --wait
669 juju status vault
Any chance that the totally-unsecure-auto-unlock option saved the unseal key somewhere on the system or in MySQL?
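While waiting for suggestions, one thing I plan to verify as part of the post mortem is whether that option is still enabled on the application; this is just a standard juju config query, nothing Vault-specific assumed:
juju config vault totally-unsecure-auto-unlock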