I’m trying to enable HA on some components that usually supports it through the hacluster charm.
Everything seems to bo correct but, only 1 component out of 3 seem to work properly.
The 2 others have an error message saying that apache is not started.
SSHing to one of those unit, I see that the certificate correspondig to the vip has not been generated on the units that are not the leader unit but apache configuration is expecting it and so, since the certificate is not there, it fails.
Why so ?
Why this certificate is not generated on the non leader units ?
It happened with all the apps I tried to configure as HA such as Glance, Heat, keystone, …
They all fail for the same reason.
Just to clarify : I have no issue with Vault and this deployment work like a charm when my components are not in HA, only when a scale them up with hacluster, it fails.
Here is the ending result :
App Version Status Scale Charm Store Rev OS Notes
heat 14.0.0 blocked 3 heat jujucharms 277 ubuntu
heat-hacluster active 3 hacluster jujucharms 69 ubuntu
heat-mysql-router 8.0.21 active 3 mysql-router jujucharms 3 ubuntu
Unit Workload Agent Machine Public address Ports Message
heat/0 blocked idle 0/lxd/2 192.168.210.36 8000/tcp,8004/tcp Services not running that should be: apache2
heat-hacluster/2 active idle 192.168.210.36 Unit is ready and clustered
heat-mysql-router/2 active idle 192.168.210.36 Unit is ready
heat/1 blocked idle 1/lxd/2 192.168.210.23 8000/tcp,8004/tcp Services not running that should be: apache2
heat-hacluster/1 active idle 192.168.210.23 Unit is ready and clustered
heat-mysql-router/1 active idle 192.168.210.23 Unit is ready
heat/2* blocked idle 2/lxd/1 192.168.210.41 8000/tcp,8004/tcp Services not running that should be: apache2
heat-hacluster/0* active idle 192.168.210.41 Unit is ready and clustered
heat-mysql-router/0* active idle 192.168.210.41 Unit is ready
The workaround I used was to manually create the symlinks to the certs on those failing units. If you look at the status of the apache2 service, it will tell you what cert it is failing to find at service startup. From there you can go to the path and run sudo ln -s command to create the links.
After some investigation together with @Hybrid512 we found out that there is some easy-to-reproduce randomness involving a Vault with the auto-unlock feature (this seems less likely to happen without this feature). Here is a simple bundle:
In my bundle I deploy 1 vault unit and wait for mysql to settle down with all the cluster/db creations it’s doing and then scale up the Vault units by 2 to satisfy hacluster. I do this with openstack-dashboard and a few other charms as well. I took this route after noticing a patter within my CI/CD pipeline. If I deploy a bundle with all the units needed for HA, charms would end up in failed ha-relation changed hook states. I have a higher success rate if I just deployed 1 unit for each charm and then just wait for when everything is settled down to scale up the remaining hacluster units for each charm. To me it looks like a race condition issue, however I’m not quite sure.
I would say that too … I tried (and retried … and retried …) without those “hacky” options by unsealing vault manually … it works but didn’t really change the situation.
I still have some charms that are having issues with their symlinks to certificates and this happens randomly.
As a hint, my bundle is quite big and my machines are very heavily loaded during deployment (high cpu usage but also high IO usage) … to me this is probably a race condition, in any case, this is not reliable.