ceph-radosgw unit stuck in blocked state: "Services not running that should be: ceph-radosgw@rgw.juju-900ab0-3-lxd-2"

Hi, I ran into an issue with a ceph-radosgw unit while deploying OpenStack.

juju status shows this for the affected unit:

ceph-radosgw/0*    blocked   executing  3/lxd/2  80/tcp    Services not running that should be: ceph-radosgw@rgw.juju-900ab0-3-lxd-2

This only happens on the leader unit; the other units are fine.

$ juju status ceph-radosgw
Model      Controller        Cloud/Region        Version  SLA          Timestamp
openstack  foundations-maas  maas_cloud/default  2.9.21   unsupported  01:03:38Z

App                    Version  Status   Scale  Charm             Store       Channel  Rev  OS      Message
ceph-radosgw           15.2.14  blocked      3  ceph-radosgw      charmstore  stable   300  ubuntu  Services not running that should be: ceph-radosgw@rgw.juju-900ab0-3-lxd-2
hacluster-radosgw               active       3  hacluster         charmstore  stable    81  ubuntu  Unit is ready and clustered
logrotated                      active       3  logrotate-charm   charmstore  stable     5  ubuntu  Unit is ready.
public-policy-routing           active       3  advanced-routing  charmstore  stable     7  ubuntu  Unit is ready

Unit                        Workload  Agent      Machine  Public address  Ports   Message
ceph-radosgw/0*             blocked   executing  3/lxd/2  10.138.159.61   80/tcp  Services not running that should be: ceph-radosgw@rgw.juju-900ab0-3-lxd-2
  hacluster-radosgw/0*      active    idle                10.138.159.61           Unit is ready and clustered
  logrotated/1              active    idle                10.138.159.61           Unit is ready.
  public-policy-routing/1   active    idle                10.138.159.61           Unit is ready
ceph-radosgw/1              active    idle       5/lxd/2  10.138.159.172  80/tcp  Unit is ready
  hacluster-radosgw/1       active    idle                10.138.159.172          Unit is ready and clustered
  logrotated/24             active    idle                10.138.159.172          Unit is ready.
  public-policy-routing/17  active    idle                10.138.159.172          Unit is ready
ceph-radosgw/2              active    idle       7/lxd/2  10.138.159.178  80/tcp  Unit is ready
  hacluster-radosgw/2       active    idle                10.138.159.178          Unit is ready and clustered
  logrotated/43             active    idle                10.138.159.178          Unit is ready.
  public-policy-routing/31  active    idle                10.138.159.178          Unit is ready

I checked that service on the unit and it is running. However, I found that the apache2 service has failed to start.

ubuntu@juju-900ab0-3-lxd-2:/var/log/juju$ sudo systemctl status apache2.service
● apache2.service - The Apache HTTP Server
     Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Mon 2021-12-27 07:04:53 UTC; 1h 12min ago
       Docs: https://httpd.apache.org/docs/2.4/

Dec 27 07:04:53 juju-900ab0-3-lxd-2 systemd[1]: Starting The Apache HTTP Server...
Dec 27 07:04:53 juju-900ab0-3-lxd-2 apachectl[88605]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.20.0.32. >
Dec 27 07:04:53 juju-900ab0-3-lxd-2 apachectl[88605]: no listening sockets available, shutting down
Dec 27 07:04:53 juju-900ab0-3-lxd-2 apachectl[88605]: AH00015: Unable to open logs
Dec 27 07:04:53 juju-900ab0-3-lxd-2 apachectl[88602]: Action 'start' failed.
Dec 27 07:04:53 juju-900ab0-3-lxd-2 apachectl[88602]: The Apache error log may have more information.
Dec 27 07:04:53 juju-900ab0-3-lxd-2 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
Dec 27 07:04:53 juju-900ab0-3-lxd-2 systemd[1]: apache2.service: Failed with result 'exit-code'.
Dec 27 07:04:53 juju-900ab0-3-lxd-2 systemd[1]: Failed to start The Apache HTTP Server.

It looks like Apache's port (80) is already in use.

ubuntu@juju-900ab0-3-lxd-2:/var/log/juju$ ss -tulnp | grep 80
tcp    LISTEN  0       4096           0.0.0.0:80           0.0.0.0:*            
tcp    LISTEN  0       4096                 *:80                 *:*
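The ss output above has no process column, most likely because it was run without root, so it does not show which service is holding port 80. As a minimal sketch (the helper name `listeners_on_port` is made up for illustration, and it assumes the column layout of `ss -tlnp`, where the local address is the fourth field and the process is the sixth), something like this could narrow it down:

```shell
# Print the local address and owning process for every TCP listener
# bound to the given port. Reads `ss -tlnp` output on stdin.
# Hypothetical helper; assumes the `ss -tlnp` column layout:
#   State Recv-Q Send-Q Local-Address:Port Peer-Address:Port Process
listeners_on_port() {
    port="$1"
    awk -v port="$port" '
        $1 == "LISTEN" {
            # The port is whatever follows the last colon of the local address.
            n = split($4, parts, ":")
            if (parts[n] == port)
                print $4, ($6 == "" ? "(process hidden; rerun ss as root)" : $6)
        }'
}

# Usage on the affected unit (root is needed to see other users' processes):
#   sudo ss -tlnp | listeners_on_port 80
```

If the process column names something other than apache2, that service is the one Apache is colliding with.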

Part of the recent unit log:

2021-12-27 04:03:32 WARNING unit.ceph-radosgw/0.juju-log server.go:327 Inconsistent or absent auth returned by mon units. Setting auth_supported to 'none'
2021-12-27 04:03:33 WARNING unit.ceph-radosgw/0.juju-log server.go:327 Package python-keystonemiddleware has no installation candidate.
2021-12-27 04:03:33 INFO juju.worker.uniter.operation runhook.go:152 ran "update-status" hook (via explicit, bespoke hook script)
2021-12-27 04:09:27 WARNING unit.ceph-radosgw/0.juju-log server.go:327 Inconsistent or absent auth returned by mon units. Setting auth_supported to 'none'
2021-12-27 04:09:28 WARNING unit.ceph-radosgw/0.juju-log server.go:327 Package python-keystonemiddleware has no installation candidate.
2021-12-27 04:09:28 INFO juju.worker.uniter.operation runhook.go:152 ran "update-status" hook (via explicit, bespoke hook script)
2021-12-27 04:14:19 WARNING unit.ceph-radosgw/0.juju-log server.go:327 Inconsistent or absent auth returned by mon units. Setting auth_supported to 'none'
2021-12-27 04:14:19 WARNING unit.ceph-radosgw/0.juju-log server.go:327 Package python-keystonemiddleware has no installation candidate.
2021-12-27 04:14:20 INFO juju.worker.uniter.operation runhook.go:152 ran "update-status" hook (via explicit, bespoke hook script)
2021-12-27 04:18:57 WARNING unit.ceph-radosgw/0.juju-log server.go:327 mon:43: Package python-keystonemiddleware has no installation candidate.
2021-12-27 04:18:57 INFO juju.worker.uniter.operation runhook.go:152 ran "mon-relation-changed" hook (via explicit, bespoke hook script)
2021-12-27 04:19:00 WARNING unit.ceph-radosgw/0.juju-log server.go:327 mon:43: Package python-keystonemiddleware has no installation candidate.
2021-12-27 04:19:00 INFO juju.worker.uniter.operation runhook.go:152 ran "mon-relation-changed" hook (via explicit, bespoke hook script)
2021-12-27 04:19:27 WARNING unit.ceph-radosgw/0.juju-log server.go:327 Package python-keystonemiddleware has no installation candidate.
2021-12-27 04:19:27 INFO juju.worker.uniter.operation runhook.go:152 ran "update-status" hook (via explicit, bespoke hook script)
2021-12-27 04:20:10 WARNING unit.ceph-radosgw/0.juju-log server.go:327 mon:43: Package python-keystonemiddleware has no installation candidate.
2021-12-27 04:20:10 WARNING unit.ceph-radosgw/0.mon-relation-changed logger.go:60 Synchronizing state of radosgw.service with SysV service script with /lib/systemd/systemd-sysv-install.
2021-12-27 04:20:10 WARNING unit.ceph-radosgw/0.mon-relation-changed logger.go:60 Executing: /lib/systemd/systemd-sysv-install disable radosgw
2021-12-27 04:20:11 WARNING unit.ceph-radosgw/0.mon-relation-changed logger.go:60 Unit /etc/systemd/system/radosgw.service is masked, ignoring.
2021-12-27 04:20:12 WARNING unit.ceph-radosgw/0.mon-relation-changed logger.go:60 Created symlink /etc/systemd/system/ceph-radosgw.target.wants/ceph-radosgw@rgw.juju-900ab0-3-lxd-2.service → /lib/systemd/system/ceph-radosgw@.service.

There is a similar issue [1], but that one was fixed in LP:1868387.

Could you please help with this issue? Thank you.

[1] Unable to install OpenStack, radoswg stuck in blocked state

I am experiencing a similar problem, and I suspect haproxy and apache2 are conflicting over port 80 on the ceph-radosgw unit. It looks like it might be related or similar to https://bugs.launchpad.net/charm-ceph-radosgw/+bug/1904411

I am still investigating what's happening, but in my case haproxy is listening on port 80, which prevents apache2 from starting.

Another issue (which is really the source of the problem) is that /etc/ceph/ceph.conf is almost empty. I am investigating whether radosgw joined the mon relation in Juju.
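For anyone checking the same thing, here is a rough sketch of how the "almost empty ceph.conf" symptom could be tested. The helper name `ceph_conf_has_mons` is made up for illustration, and it assumes that a usable config lists the monitors under a `mon host` (or `mon_host`) key:

```shell
# Succeeds if the given ceph.conf defines a non-empty monitor address list.
# Hypothetical helper; Ceph accepts both "mon host" and "mon_host" as the key.
ceph_conf_has_mons() {
    conf="${1:-/etc/ceph/ceph.conf}"
    grep -Eq '^[[:space:]]*mon[ _]host[[:space:]]*=[[:space:]]*[^[:space:]]' "$conf"
}

# Usage on the radosgw unit:
#   ceph_conf_has_mons /etc/ceph/ceph.conf || echo "no monitors in ceph.conf"
# From the Juju client, the relation itself can be inspected with:
#   juju status --relations ceph-radosgw
```

If the check fails, the unit most likely never received monitor data over the mon relation, which would fit the "Inconsistent or absent auth returned by mon units" warnings in the unit log earlier in the thread.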

Solved by taking ceph-mon, ceph-osd and ceph-radosgw all from the same charm source and channel, specifically (shown here for ceph-mon):

charm: ch:ceph-mon
channel: quincy/stable