[BUG] openstack hacluster apache2 service not running, wrong ssl cert name

This looks like a bug, probably in a shared core framework, since it affects multiple components: openstack-dashboard, neutron-api, cinder and ceph-radosgw. The apache2 SSL template appears to be correctly defined to use the VIP. However, there may be a race condition: sometimes the symlink is created with the VIP address, and other times with the interface IP address. In the second case apache2 won’t start, because it looks for a cert_ file named after the VIP, which doesn’t exist. I would open a bug, but I don’t know against which component.

apache error:
ubuntu@juju-70b05d-4-lxd-7:~$ systemctl status apache2.service
● apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2020-07-21 18:24:19 UTC; 33min ago
Docs: https://httpd.apache.org/docs/2.4/
Process: 76340 ExecStart=/usr/sbin/apachectl start (code=exited, status=1/FAILURE)

Jul 21 18:24:19 juju-70b05d-4-lxd-7 systemd[1]: Starting The Apache HTTP Server...
Jul 21 18:24:19 juju-70b05d-4-lxd-7 apachectl[76343]: AH00526: Syntax error on line 7 of /etc/apache2/sites-enabled/openstack_https_frontend.conf:
Jul 21 18:24:19 juju-70b05d-4-lxd-7 apachectl[76343]: SSLCertificateFile: file '/etc/apache2/ssl/neutron/cert_10.10.20.204' does not exist or is empty
Jul 21 18:24:19 juju-70b05d-4-lxd-7 apachectl[76340]: Action 'start' failed.
Jul 21 18:24:19 juju-70b05d-4-lxd-7 apachectl[76340]: The Apache error log may have more information.
Jul 21 18:24:19 juju-70b05d-4-lxd-7 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
Jul 21 18:24:19 juju-70b05d-4-lxd-7 systemd[1]: apache2.service: Failed with result 'exit-code'.
Jul 21 18:24:19 juju-70b05d-4-lxd-7 systemd[1]: Failed to start The Apache HTTP Server.
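
Apache stops at the first missing file, so it can help to list every certificate and key path the vhost references that is missing or empty. This is only a rough sketch: the function name missing_ssl_files is made up for illustration, and it assumes the paths in the config are unquoted and contain no spaces.

```shell
#!/usr/bin/env bash
# Sketch: print each SSLCertificateFile / SSLCertificateKeyFile path in an
# Apache config that does not exist or is empty.
# missing_ssl_files is a hypothetical helper name, not part of any charm.
missing_ssl_files() {
    local conf="$1" path
    # The directive is the first field on the line, the path is the second.
    awk '$1 == "SSLCertificateFile" || $1 == "SSLCertificateKeyFile" { print $2 }' "$conf" |
    while read -r path; do
        # -s is true only for an existing, non-empty file (symlinks are followed),
        # so a dangling symlink is reported as well.
        [ -s "$path" ] || echo "$path"
    done
}

# Example:
# missing_ssl_files /etc/apache2/sites-enabled/openstack_https_frontend.conf
```

An empty result means every referenced cert and key resolves to a non-empty file.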

ls -la ssl dir
ls -la /etc/apache2/ssl/horizon/
total 24
dr-xr-xr-x 2 root root 4096 Jul 21 18:20 .
drwxr-xr-x 3 root root 4096 Jul 21 18:18 ..
lrwxrwxrwx 1 root root 60 Jul 21 18:20 cert_10.10.40.126 -> /etc/apache2/ssl/horizon/cert_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root 60 Jul 21 18:20 cert_172.16.1.247 -> /etc/apache2/ssl/horizon/cert_eth2.juju-70b05d-3-lxd-10.maas
-rw-r----- 1 root root 3175 Jul 21 18:20 cert_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root 59 Jul 21 18:20 key_10.10.40.126 -> /etc/apache2/ssl/horizon/key_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root 59 Jul 21 18:20 key_172.16.1.247 -> /etc/apache2/ssl/horizon/key_eth2.juju-70b05d-3-lxd-10.maas
-rw-r----- 1 root root 1678 Jul 21 18:20 key_eth2.juju-70b05d-3-lxd-10.maas

juju status
App                     Version  Status   Scale  Charm                Store       Rev  OS      Notes
dashboard-mysql-router  8.0.20   active       3  mysql-router         jujucharms    2  ubuntu
hacluster-horizon                active       3  hacluster            jujucharms   68  ubuntu
openstack-dashboard     18.3.2   blocked      3  openstack-dashboard  jujucharms  304  ubuntu

Unit                         Workload  Agent  Machine   Public address  Ports           Message
openstack-dashboard/0        blocked   idle   3/lxd/10  10.10.20.135    80/tcp,443/tcp  Services not running that should be: apache2
  dashboard-mysql-router/2   active    idle             10.10.20.135                    Unit is ready
  hacluster-horizon/2        active    idle             10.10.20.135                    Unit is ready and clustered
openstack-dashboard/1        blocked   idle   4/lxd/9   10.10.20.130    80/tcp,443/tcp  Services not running that should be: apache2
  dashboard-mysql-router/1   active    idle             10.10.20.130                    Unit is ready
  hacluster-horizon/1        active    idle             10.10.20.130                    Unit is ready and clustered
openstack-dashboard/2*       blocked   idle   5/lxd/9   10.10.20.124    80/tcp,443/tcp  Services not running that should be: apache2
  dashboard-mysql-router/0*  active    idle             10.10.20.124                    Unit is ready
  hacluster-horizon/0*       active    idle             10.10.20.124                    Unit is ready and clustered

Quick update: reissuing the certificates once the hacluster stage has completed re-creates all the symlinks. There is a bug for this related to the dashboard, but unfortunately it’s private.
juju run-action --wait vault/0 reissue-certificates

A workaround is to create symlinks manually for each VIP defined in hacluster and then restart the apache2 service.

sudo ln -s /etc/apache2/ssl/horizon/cert_eth2.juju-70b05d-3-lxd-10.maas cert_10.10.20.201
sudo: setrlimit(RLIMIT_CORE): Operation not permitted
ubuntu@juju-70b05d-3-lxd-10:/etc/apache2/ssl/horizon$ ls -la
total 28
dr-xr-xr-x 2 root root 4096 Jul 21 19:23 .
drwxr-xr-x 3 root root 4096 Jul 21 18:18 ..
lrwxrwxrwx 1 root root   60 Jul 21 19:23 cert_10.10.20.201 -> /etc/apache2/ssl/horizon/cert_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root   60 Jul 21 18:20 cert_10.10.40.126 -> /etc/apache2/ssl/horizon/cert_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root   60 Jul 21 18:20 cert_172.16.1.247 -> /etc/apache2/ssl/horizon/cert_eth2.juju-70b05d-3-lxd-10.maas
-rw-r----- 1 root root 3175 Jul 21 18:20 cert_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root   59 Jul 21 18:20 key_10.10.40.126 -> /etc/apache2/ssl/horizon/key_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root   59 Jul 21 18:20 key_172.16.1.247 -> /etc/apache2/ssl/horizon/key_eth2.juju-70b05d-3-lxd-10.maas
-rw-r----- 1 root root 1678 Jul 21 18:20 key_eth2.juju-70b05d-3-lxd-10.maas
ubuntu@juju-70b05d-3-lxd-10:/etc/apache2/ssl/horizon$ sudo ln -s /etc/apache2/ssl/horizon/cert_eth2.juju-70b05d-3-lxd-10.maas cert_10.10.40.201
sudo: setrlimit(RLIMIT_CORE): Operation not permitted
ubuntu@juju-70b05d-3-lxd-10:/etc/apache2/ssl/horizon$ sudo ln -s /etc/apache2/ssl/horizon/key_eth2.juju-70b05d-3-lxd-10.maas key_10.10.40.201
sudo: setrlimit(RLIMIT_CORE): Operation not permitted
ubuntu@juju-70b05d-3-lxd-10:/etc/apache2/ssl/horizon$ sudo ln -s /etc/apache2/ssl/horizon/key_eth2.juju-70b05d-3-lxd-10.maas key_10.10.20.201
sudo: setrlimit(RLIMIT_CORE): Operation not permitted
ubuntu@juju-70b05d-3-lxd-10:/etc/apache2/ssl/horizon$ ls -la
total 32
dr-xr-xr-x 2 root root 4096 Jul 21 19:24 .
drwxr-xr-x 3 root root 4096 Jul 21 18:18 ..
lrwxrwxrwx 1 root root   60 Jul 21 19:23 cert_10.10.20.201 -> /etc/apache2/ssl/horizon/cert_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root   60 Jul 21 18:20 cert_10.10.40.126 -> /etc/apache2/ssl/horizon/cert_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root   60 Jul 21 19:23 cert_10.10.40.201 -> /etc/apache2/ssl/horizon/cert_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root   60 Jul 21 18:20 cert_172.16.1.247 -> /etc/apache2/ssl/horizon/cert_eth2.juju-70b05d-3-lxd-10.maas
-rw-r----- 1 root root 3175 Jul 21 18:20 cert_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root   59 Jul 21 19:24 key_10.10.20.201 -> /etc/apache2/ssl/horizon/key_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root   59 Jul 21 18:20 key_10.10.40.126 -> /etc/apache2/ssl/horizon/key_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root   59 Jul 21 19:24 key_10.10.40.201 -> /etc/apache2/ssl/horizon/key_eth2.juju-70b05d-3-lxd-10.maas
lrwxrwxrwx 1 root root   59 Jul 21 18:20 key_172.16.1.247 -> /etc/apache2/ssl/horizon/key_eth2.juju-70b05d-3-lxd-10.maas
-rw-r----- 1 root root 1678 Jul 21 18:20 key_eth2.juju-70b05d-3-lxd-10.maas
ubuntu@juju-70b05d-3-lxd-10:/etc/apache2/ssl/horizon$ sudo systemctl start apache2
sudo: setrlimit(RLIMIT_CORE): Operation not permitted
ubuntu@juju-70b05d-3-lxd-10:/etc/apache2/ssl/horizon$ sudo systemctl status apache2
sudo: setrlimit(RLIMIT_CORE): Operation not permitted
● apache2.service - The Apache HTTP Server
     Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2020-07-21 19:24:27 UTC; 4s ago
       Docs: https://httpd.apache.org/docs/2.4/
    Process: 139830 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
   Main PID: 139834 (apache2)
      Tasks: 107 (limit: 38355)
     Memory: 32.8M
     CGroup: /system.slice/apache2.service
             ├─139834 /usr/sbin/apache2 -k start
             ├─139835 (wsgi:horizon)    -k start
             ├─139836 (wsgi:horizon)    -k start
             ├─139837 (wsgi:horizon)    -k start
             ├─139838 (wsgi:horizon)    -k start
             ├─139839 /usr/sbin/apache2 -k start
             └─139840 /usr/sbin/apache2 -k start

Jul 21 19:24:27 juju-70b05d-3-lxd-10 systemd[1]: Starting The Apache HTTP Server...
Jul 21 19:24:27 juju-70b05d-3-lxd-10 apachectl[139833]: AH00548: NameVirtualHost has no effect and will be removed in the next release /etc/apache2/sites-enabled/default-ssl.conf:3
Jul 21 19:24:27 juju-70b05d-3-lxd-10 systemd[1]: Started The Apache HTTP Server.
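
The manual steps above could be wrapped in a small helper so that the cert and key links are always created as a pair. This is a sketch under assumptions: link_vip_certs is a hypothetical name, and it assumes the unit's own cert and key are named cert_<hostname> and key_<hostname>, as in the listing above.

```shell
#!/usr/bin/env bash
# Sketch: create cert_<vip> and key_<vip> symlinks that point at the unit's
# hostname cert and key. link_vip_certs is a hypothetical helper name.
link_vip_certs() {
    local ssl_dir="$1" host="$2" vip
    shift 2
    # Refuse to create dangling links if the hostname cert or key is missing or empty.
    if ! [ -s "$ssl_dir/cert_$host" ] || ! [ -s "$ssl_dir/key_$host" ]; then
        echo "cert_$host or key_$host is missing in $ssl_dir" >&2
        return 1
    fi
    for vip in "$@"; do
        # -f replaces an existing link; -n treats an existing link as a file.
        ln -sfn "$ssl_dir/cert_$host" "$ssl_dir/cert_$vip"
        ln -sfn "$ssl_dir/key_$host"  "$ssl_dir/key_$vip"
    done
}

# Example for the horizon unit above (run as root, then start apache2):
# link_vip_certs /etc/apache2/ssl/horizon eth2.juju-70b05d-3-lxd-10.maas 10.10.20.201 10.10.40.201
```

The guard at the top means the helper does nothing useful on a unit whose ssl directory is empty; in that case there is no cert to link to.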

Thank you for posting this, I too encountered this bug. Reissuing the certs did not work for me. However, your workaround for manually re-creating the symlink did work.

Nice one!

Just one follow-up question: what do you do if the cert_* and key_* files are not in the directory where apache2 is looking for the cert? Is it safe to copy the cert/key files from a working unit? The “reissue_certificate” command doesn’t create the files.

I think reissue_certs recreates all certs and should restart all services as well; I’m 99% sure. The biggest issue at the moment is that you can’t restart a single unit. Doing things manually should work as well, but then you have the problem that a new deployment won’t preserve your changes.

How do you create the certificates manually, in particular for nova-cloud-controller? Can I copy the cert/key files in the working unit to the other two blocked units?

I was able to manually create the symlinks in my other services, except on 2 of the 3 nova-cloud-controllers I have. Two of them have an empty directory:

ubuntu@juju-e9be94-1-lxd-11:~$ sudo systemctl status apache2.service
● apache2.service - The Apache HTTP Server
     Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Tue 2020-09-01 16:33:18 UTC; 1min 19s ago
       Docs: https://httpd.apache.org/docs/2.4/

Sep 01 16:33:18 juju-e9be94-1-lxd-11 systemd[1]: Starting The Apache HTTP Server...
Sep 01 16:33:18 juju-e9be94-1-lxd-11 apachectl[3577]: AH00526: Syntax error on line 14 of /etc/apache2/sites-enabled/openstack_https_frontend.conf:
Sep 01 16:33:18 juju-e9be94-1-lxd-11 apachectl[3577]: SSLCertificateFile: file '/etc/apache2/ssl/nova/cert_10.80.20.205' does not exist or is empty
Sep 01 16:33:18 juju-e9be94-1-lxd-11 apachectl[3574]: Action 'start' failed.
Sep 01 16:33:18 juju-e9be94-1-lxd-11 apachectl[3574]: The Apache error log may have more information.
Sep 01 16:33:18 juju-e9be94-1-lxd-11 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
Sep 01 16:33:18 juju-e9be94-1-lxd-11 systemd[1]: apache2.service: Failed with result 'exit-code'.
Sep 01 16:33:18 juju-e9be94-1-lxd-11 systemd[1]: Failed to start The Apache HTTP Server.
ubuntu@juju-e9be94-1-lxd-11:~$ ls -lha /etc/apache2/ssl/nova
total 8.0K
dr-xr-xr-x 2 root root 4.0K Aug 31 22:33 .
drwxr-xr-x 3 root root 4.0K Aug 31 22:33 ..
ubuntu@juju-e9be94-1-lxd-11:~$
EDIT: I should mention that running juju run-action --wait vault/leader reissue-certificates did not work for me. I even removed a single unit and added it back; the new unit also has an empty /etc/apache2/ssl/nova certs directory.

It looks like a bug, or something went wrong during deployment. If you can reproduce it, open a bug for it. I’ve also found the openstack-charms IRC channel very useful: you’ll always find charm developers there, and they will be able to tell you whether it’s a bug or a misconfiguration. The last time I deployed nova-cloud-controller, it added the certificate correctly.


This is in fact a bug, the fix for which landed in master 11 days ago [0]. I was a bit surprised to find it was not in our 20.08 release, so I am back-porting it to 20.08 now [1]. That should take about a day to land and become available in cs:openstack-dashboard.

[0] https://review.opendev.org/#/c/747115/
[1] https://review.opendev.org/#/c/749393/


David Ames


Created LP Bug #1893847 for nova-cloud-controller.

[0] Bug #1893847 “Certificates are not created” : Bugs : OpenStack nova-cloud-controller charm


Any chance of implementing single-host certificate re-issue? Re-issuing everything works for an initial deployment, but if you run into trouble later it causes a bit of havoc across the whole OpenStack deployment. A single-host option would be much safer.

Thanks

I would like to move this discussion to the LP bug #1893847 [0]. I will be trying to re-create the failure mode today.

If anyone on the thread has a bundle they can share that would be helpful.

[0] Bug #1893847 “Certificates are not created” : Bugs : OpenStack nova-cloud-controller charm

Hells yeah! This is great stuff to know when reconfiguring your base to be HA.
Thank you!

This works really well for the openstack-dashboard units.
A key gets created for the hostname of the unit, and then symlinks for its IP addresses are pointed to the hostname key and cert. A separate key and cert are also generated for the os-public-hostname, if set.

The same does not hold true for neutron-api though.
Certs and keys are generated for the unit’s hostname, but the os-public-hostname key and cert are symlinked back to the hostname ones, resulting in a certificate mismatch.
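
One way to spot this on a unit is to list which cert_* entries are real files and which are only symlinks. A rough sketch follows; describe_certs is a hypothetical helper name, and the directory in the example is taken from the neutron error earlier in the thread.

```shell
#!/usr/bin/env bash
# Sketch: for every cert_* entry in an ssl directory, print whether it is its
# own certificate file or a symlink, and what the symlink points at.
# describe_certs is a hypothetical helper name.
describe_certs() {
    local ssl_dir="$1" f
    for f in "$ssl_dir"/cert_*; do
        # Skip the unexpanded glob pattern when the directory has no cert_* entries.
        [ -e "$f" ] || [ -L "$f" ] || continue
        if [ -L "$f" ]; then
            echo "$(basename "$f") -> $(basename "$(readlink "$f")")"
        else
            echo "$(basename "$f") (own certificate)"
        fi
    done
}

# Example:
# describe_certs /etc/apache2/ssl/neutron
```

If the entry for the public hostname shows up as a symlink to the unit hostname cert, that matches the mismatch described here. The subject of any one file can then be checked with openssl x509 -noout -subject -in <file>.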

*** updated - a workaround for this is to remove the relations to keystone and vault from neutron-api and add them back. It’s a bit sucky but seems to have worked.

Maybe we should deploy the units in HA first and unseal vault as the last step.

The units with this bug were all deployed before the vault unseal operation.

The HA units added after the vault unseal operation do not have this bug.

Note: expect similar behavior with Wallaby. I built three different clusters with the same YAML script, and one of the clusters exhibited this problem. The best workaround is to manually create symlinks to the cert and key files.