In the development/POC environment the totally-unsecure-auto-unlock option was used for the Vault deployment.
An attempt was made (if I understand correctly) to migrate this environment to another VMware DC, and that became the source of the troubles. First we stumbled upon Bug #1871729 "handle cloud region rename in running controller" in the juju project on Launchpad. I manually fixed MongoDB; then it turned out Vault depends on a MySQL cluster, which was also in a failed state.
Now the Vault unit starts, but fails with a "Vault is sealed" error in the logs.
… Python traceback is skipped…
2020-04-15 13:34:06 DEBUG config-changed hvac.exceptions.VaultDown: Vault is sealed
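To double-check the seal state independently of the hook error, the Vault API on the affected unit can be queried directly. This is only a sketch, assuming the vault CLI is present on the unit and the API listens on port 8200 as the status output below suggests; <unit-ip> is a placeholder:
juju ssh vault/0
export VAULT_ADDR="http://<unit-ip>:8200"
vault status    # "Sealed  true" in the output would confirm the unit is sealed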
Juju reports the status like this. Sometimes the status of unit /0 changes to a failed leader election after a reboot.
Unit Workload Agent Machine Public address Ports Message
vault/0* error idle 19 ************* 8200/tcp hook failed: "config-changed"
nrpe/16 active idle ************* icmp,5666/tcp ready
ntp/17 active idle ************* 123/udp chrony: Ready
vault/1 blocked idle 26 ************* 8200/tcp Unit is sealed
nrpe/25 active idle ************* icmp,5666/tcp ready
ntp/26 active idle ************* 123/udp chrony: Ready
Digging around the internet, I found an OpenStack guide. It states: "It is important to remember that when the Vault process is started via the resume action its state will be sealed. This means that steps will be required to unseal the process."
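For reference, the manual unseal flow described there looks roughly like the following (a sketch; the unit address is a placeholder and the number of unseal keys required depends on how Vault was initialised):
export VAULT_ADDR="http://<vault-unit-ip>:8200"
vault status              # shows Sealed: true and the unseal threshold
vault operator unseal     # prompts for one unseal key; repeat until the threshold is reached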
So, given the following in the controller command history, did we just shoot ourselves in both feet with a 12 gauge shotgun? I did not type those commands; I am just doing a post mortem.
665 juju run-action vault/0 pause --wait
666 juju status vault
667 juju run-action vault/1 pause --wait
668 juju run-action vault/0 resume --wait
669 juju status vault
Any chance that the totally-unsecure-auto-unlock option saved the unseal key somewhere on the system or in MySQL?
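While waiting for suggestions, one thing I plan to verify as part of the post mortem is whether that option is still enabled on the application; this is just a standard juju config query, nothing Vault-specific assumed:
juju config vault totally-unsecure-auto-unlock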