ceph-radosgw stuck waiting: "Incomplete relations: identity"

Hi there, my name is Felix. I am currently setting up a charmed OpenStack testing environment for future datacenter and application modernization projects. The goal of this project is to implement a hyperconverged testing facility.

To achieve this, I adapted the OpenStack Base charm bundles to our HA requirements and to my liking, and proceeded to test the result using a MAAS rack controller. The release is OpenStack Zed on Ubuntu Server 22.04 LTS (Jammy).

For Ceph I used the quincy/stable channels, in line with the Ubuntu Cloud Archive (UCA). I managed to get exactly one full deployment running (the very first one), but only that once. Since then, while trying to reproduce it (the environment was fully wiped beforehand, but the configuration retained), ceph-radosgw never becomes fully functional after deployment, even after waiting a long time.

The ceph-radosgw charm is stuck in the “waiting” state, reporting “Incomplete relations: identity”:

Model      Controller            Cloud/Region  Version  SLA          Timestamp
openstack  juju-maas-controller  maas/default  3.1.6    unsupported  12:25:10+01:00

App                     Version  Status   Scale  Charm         Channel        Rev  Exposed  Message
ceph-radosgw            17.2.6   waiting      3  ceph-radosgw  quincy/stable  564  no       Incomplete relations: identity
ceph-radosgw-hacluster  2.1.2    active       3  hacluster     2.4/stable     131  no       Unit is ready and clustered

Unit                         Workload  Agent      Machine  Public address  Ports   Message
ceph-radosgw/0               waiting   executing  0/lxd/1  192.168.80.10   80/tcp  Incomplete relations: identity
  ceph-radosgw-hacluster/1   active    idle                192.168.80.10           Unit is ready and clustered
ceph-radosgw/1               waiting   executing  1/lxd/1  192.168.80.92   80/tcp  Incomplete relations: identity
  ceph-radosgw-hacluster/2   active    idle                192.168.80.92           Unit is ready and clustered
ceph-radosgw/2*              waiting   executing  2/lxd/1  192.168.80.146  80/tcp  Incomplete relations: identity
  ceph-radosgw-hacluster/0*  active    idle                192.168.80.146          Unit is ready and clustered

Machine  State    Address         Inst id              Base          AZ          Message
0        started  192.168.80.212  one-muc-02           ubuntu@22.04  testcenter  Deployed
0/lxd/1  started  192.168.80.10   juju-e16cdc-0-lxd-1  ubuntu@22.04  testcenter  Container started
1        started  192.168.80.211  one-muc-01           ubuntu@22.04  testcenter  Deployed
1/lxd/1  started  192.168.80.92   juju-e16cdc-1-lxd-1  ubuntu@22.04  testcenter  Container started
2        started  192.168.80.213  one-muc-03           ubuntu@22.04  testcenter  Deployed
2/lxd/1  started  192.168.80.146  juju-e16cdc-2-lxd-1  ubuntu@22.04  testcenter  Container started

Removing and redeploying these components results in either the ceph-radosgw service not being recognized as running by juju status, or hacluster reporting “vip not running”. Removing and re-adding the relation does not seem to help either (see the warnings at 11:53 below).
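
For reference, this is roughly what I ran for the relation (I am quoting the endpoint names from memory; they may differ depending on the bundle):

juju remove-relation ceph-radosgw:identity-service keystone:identity-service
juju integrate ceph-radosgw:identity-service keystone:identity-service    # "juju add-relation" on older clients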

To diagnose this, I replayed the leader's logs.
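Roughly, I pulled them with something like this (the unit name is from my environment):

juju debug-log --replay --include unit-ceph-radosgw-2

That produced the following: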

unit-ceph-radosgw-2: 11:53:16 INFO juju Starting unit workers for "ceph-radosgw/2"
unit-ceph-radosgw-2: 11:53:16 INFO juju.worker.apicaller [48050a] "unit-ceph-radosgw-2" successfully connected to "192.168.80.217:17070"
unit-ceph-radosgw-2: 11:53:16 INFO juju.worker.migrationminion migration phase is now: NONE
unit-ceph-radosgw-2: 11:53:16 INFO juju.worker.logger logger worker started
unit-ceph-radosgw-2: 11:53:16 INFO juju.worker.upgrader no waiter, upgrader is done
unit-ceph-radosgw-2: 11:53:17 WARNING juju.worker.uniter.relation unit keystone/1 in relation 51 no longer exists
unit-ceph-radosgw-2: 11:53:17 WARNING juju.worker.uniter.relation unit keystone/2 in relation 51 no longer exists
unit-ceph-radosgw-2: 11:53:17 WARNING juju.worker.uniter.relation unit keystone/0 in relation 51 no longer exists
unit-ceph-radosgw-2: 11:53:18 INFO juju.worker.uniter unit "ceph-radosgw/2" started
unit-ceph-radosgw-2: 11:53:18 INFO juju.worker.uniter hooks are retried true
unit-ceph-radosgw-2: 11:53:18 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-ceph-radosgw-2: 11:53:23 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-ceph-radosgw-2: 11:53:24 INFO unit.ceph-radosgw/2.juju-log mon:50: Registered config file: /etc/haproxy/haproxy.cfg
unit-ceph-radosgw-2: 11:53:24 INFO unit.ceph-radosgw/2.juju-log mon:50: Registered config file: /etc/ceph/ceph.conf
unit-ceph-radosgw-2: 11:53:24 INFO unit.ceph-radosgw/2.juju-log mon:50: Registered config file: /etc/apache2/sites-available/openstack_https_frontend.conf
unit-ceph-radosgw-2: 11:53:24 INFO unit.ceph-radosgw/2.juju-log mon:50: Loaded template from /var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/charmhelpers/contrib/openstack/templates/haproxy.cfg
unit-ceph-radosgw-2: 11:53:24 INFO unit.ceph-radosgw/2.juju-log mon:50: Rendering from template: /etc/haproxy/haproxy.cfg
unit-ceph-radosgw-2: 11:53:24 INFO unit.ceph-radosgw/2.juju-log mon:50: Wrote template /etc/haproxy/haproxy.cfg.
unit-ceph-radosgw-2: 11:55:06 INFO juju Starting unit workers for "ceph-radosgw/2"
unit-ceph-radosgw-2: 11:55:06 INFO juju.worker.apicaller [48050a] "unit-ceph-radosgw-2" successfully connected to "192.168.80.217:17070"
unit-ceph-radosgw-2: 11:55:06 INFO juju.worker.apicaller [48050a] "unit-ceph-radosgw-2" successfully connected to "192.168.80.217:17070"
unit-ceph-radosgw-2: 11:55:06 INFO juju.worker.migrationminion migration phase is now: NONE
unit-ceph-radosgw-2: 11:55:06 INFO juju.worker.logger logger worker started
unit-ceph-radosgw-2: 11:55:06 INFO juju.worker.upgrader no waiter, upgrader is done
unit-ceph-radosgw-2: 11:55:07 INFO juju.worker.uniter unit "ceph-radosgw/2" started
unit-ceph-radosgw-2: 11:55:07 INFO juju.worker.uniter hooks are retried true
unit-ceph-radosgw-2: 11:55:08 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-ceph-radosgw-2: 11:55:13 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-ceph-radosgw-2: 11:55:13 INFO unit.ceph-radosgw/2.juju-log mon:50: Registered config file: /etc/haproxy/haproxy.cfg
unit-ceph-radosgw-2: 11:55:13 INFO unit.ceph-radosgw/2.juju-log mon:50: Registered config file: /etc/ceph/ceph.conf
unit-ceph-radosgw-2: 11:55:13 INFO unit.ceph-radosgw/2.juju-log mon:50: Registered config file: /etc/apache2/sites-available/openstack_https_frontend.conf
unit-ceph-radosgw-2: 11:55:14 INFO unit.ceph-radosgw/2.juju-log mon:50: Loaded template from /var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/charmhelpers/contrib/openstack/templates/haproxy.cfg
unit-ceph-radosgw-2: 11:55:14 INFO unit.ceph-radosgw/2.juju-log mon:50: Rendering from template: /etc/haproxy/haproxy.cfg
unit-ceph-radosgw-2: 11:55:14 INFO unit.ceph-radosgw/2.juju-log mon:50: Wrote template /etc/haproxy/haproxy.cfg.
unit-ceph-radosgw-2: 11:59:43 INFO juju Starting unit workers for "ceph-radosgw/2"
unit-ceph-radosgw-2: 11:59:43 INFO juju.worker.apicaller [48050a] "unit-ceph-radosgw-2" successfully connected to "192.168.80.217:17070"
unit-ceph-radosgw-2: 11:59:43 INFO juju.worker.apicaller [48050a] "unit-ceph-radosgw-2" successfully connected to "192.168.80.217:17070"
unit-ceph-radosgw-2: 11:59:43 INFO juju.worker.migrationminion migration phase is now: NONE
unit-ceph-radosgw-2: 11:59:43 INFO juju.worker.logger logger worker started
unit-ceph-radosgw-2: 11:59:43 INFO juju.worker.upgrader no waiter, upgrader is done
unit-ceph-radosgw-2: 11:59:44 INFO juju.worker.uniter unit "ceph-radosgw/2" started
unit-ceph-radosgw-2: 11:59:44 INFO juju.worker.uniter hooks are retried true
unit-ceph-radosgw-2: 11:59:44 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-ceph-radosgw-2: 11:59:49 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-ceph-radosgw-2: 11:59:50 INFO unit.ceph-radosgw/2.juju-log mon:50: Registered config file: /etc/haproxy/haproxy.cfg
unit-ceph-radosgw-2: 11:59:50 INFO unit.ceph-radosgw/2.juju-log mon:50: Registered config file: /etc/ceph/ceph.conf
unit-ceph-radosgw-2: 11:59:50 INFO unit.ceph-radosgw/2.juju-log mon:50: Registered config file: /etc/apache2/sites-available/openstack_https_frontend.conf
unit-ceph-radosgw-2: 11:59:50 INFO unit.ceph-radosgw/2.juju-log mon:50: Loaded template from /var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/charmhelpers/contrib/openstack/templates/haproxy.cfg
unit-ceph-radosgw-2: 11:59:50 INFO unit.ceph-radosgw/2.juju-log mon:50: Rendering from template: /etc/haproxy/haproxy.cfg
unit-ceph-radosgw-2: 11:59:50 INFO unit.ceph-radosgw/2.juju-log mon:50: Wrote template /etc/haproxy/haproxy.cfg.

I have absolutely no clue which hook's error resolution is being awaited here. The warnings at 11:53 are expected, since we tried removing and re-adding the relation. We also tried rebooting and restarting services, to no avail.
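
If it matters, I assume the failed hook could be surfaced and retried with something like the following, but it was not obvious to me which hook to resolve:

juju show-status-log ceph-radosgw/2
juju resolved ceph-radosgw/2             # re-run the failed hook
juju resolved --no-retry ceph-radosgw/2  # or mark it resolved without re-running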

To dig further, I SSH'd into the machine and looked at the unit leader's log file. There I observed something that repeated itself at deployment time but is no longer present in the logs at the moment. See the excerpt below, which ends where the logs abruptly cut off for this time period.
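Roughly how I got at the file (unit name and path are from my environment, from memory):

juju ssh ceph-radosgw/2
less /var/log/juju/unit-ceph-radosgw-2.log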

2023-12-12 15:33:13 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: identity relation's interface, identity-service, is related awaiting the following data from the relationship: service_port, service_host, auth_host, auth_port, internal_host, internal_port, admin_tenant_name, admin_user, admin_password.
2023-12-12 15:33:14 INFO juju.worker.uniter.operation runhook.go:186 ran "mon-relation-changed" hook (via explicit, bespoke hook script)
2023-12-12 15:34:04 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Registered config file: /etc/haproxy/haproxy.cfg
2023-12-12 15:34:04 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Registered config file: /etc/ceph/ceph.conf
2023-12-12 15:34:06 WARNING unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Package python-keystonemiddleware has no installation candidate.
2023-12-12 15:34:06 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Missing required data: service_port service_host auth_host auth_port internal_host internal_port admin_tenant_name admin_user admin_password
2023-12-12 15:34:06 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Missing required data: service_port service_host auth_host auth_port internal_host internal_port admin_tenant_name admin_user admin_password
2023-12-12 15:34:06 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Missing required data: service_port service_host auth_host auth_port internal_host internal_port admin_tenant_name admin_user admin_password
2023-12-12 15:34:06 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: identity relation's interface, identity-service, is related awaiting the following data from the relationship: service_port, service_host, auth_host, auth_port, internal_host, internal_port, admin_tenant_name, admin_user, admin_password.
2023-12-12 15:34:06 INFO juju.worker.uniter.operation runhook.go:186 ran "mon-relation-changed" hook (via explicit, bespoke hook script)
2023-12-12 15:34:29 INFO unit.ceph-radosgw/2.juju-log server.go:325 ha:92: Registered config file: /etc/haproxy/haproxy.cfg
2023-12-12 15:34:29 INFO unit.ceph-radosgw/2.juju-log server.go:325 ha:92: Registered config file: /etc/ceph/ceph.conf
2023-12-12 15:34:29 INFO unit.ceph-radosgw/2.juju-log server.go:325 ha:92: Cluster configured, notifying other services and updating keystone endpoint configuration
2023-12-12 15:34:30 WARNING unit.ceph-radosgw/2.juju-log server.go:325 ha:92: Package python-keystonemiddleware has no installation candidate.
2023-12-12 15:34:30 INFO unit.ceph-radosgw/2.juju-log server.go:325 ha:92: Missing required data: service_port service_host auth_host auth_port internal_host internal_port admin_tenant_name admin_user admin_password
2023-12-12 15:34:30 INFO unit.ceph-radosgw/2.juju-log server.go:325 ha:92: Missing required data: service_port service_host auth_host auth_port internal_host internal_port admin_tenant_name admin_user admin_password
2023-12-12 15:34:30 INFO unit.ceph-radosgw/2.juju-log server.go:325 ha:92: Missing required data: service_port service_host auth_host auth_port internal_host internal_port admin_tenant_name admin_user admin_password
2023-12-12 15:34:30 INFO unit.ceph-radosgw/2.juju-log server.go:325 ha:92: identity relation's interface, identity-service, is related awaiting the following data from the relationship: service_port, service_host, auth_host, auth_port, internal_host, internal_port, admin_tenant_name, admin_user, admin_password.
2023-12-12 15:34:31 INFO juju.worker.uniter.operation runhook.go:186 ran "ha-relation-changed" hook (via explicit, bespoke hook script)
2023-12-12 15:34:55 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Registered config file: /etc/haproxy/haproxy.cfg
2023-12-12 15:34:55 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Registered config file: /etc/ceph/ceph.conf
2023-12-12 15:34:55 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Loaded template from /var/lib/juju/agents/unit-ceph-radosgw-2/charm/hooks/charmhelpers/contrib/openstack/templates/haproxy.cfg
2023-12-12 15:34:55 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Rendering from template: /etc/haproxy/haproxy.cfg
2023-12-12 15:34:55 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Wrote template /etc/haproxy/haproxy.cfg.
2023-12-12 15:34:56 WARNING unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Package python-keystonemiddleware has no installation candidate.
2023-12-12 15:34:56 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Missing required data: service_port service_host auth_host auth_port internal_host internal_port admin_tenant_name admin_user admin_password
2023-12-12 15:34:56 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Missing required data: service_port service_host auth_host auth_port internal_host internal_port admin_tenant_name admin_user admin_password
2023-12-12 15:34:56 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Missing required data: service_port service_host auth_host auth_port internal_host internal_port admin_tenant_name admin_user admin_password
2023-12-12 15:34:56 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Loaded template from templates/ceph.conf
2023-12-12 15:34:56 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Rendering from template: /etc/ceph/ceph.conf
2023-12-12 15:34:56 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Wrote template /etc/ceph/ceph.conf.
2023-12-12 15:34:56 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Making dir /var/lib/ceph/radosgw/ceph-rgw.juju-e16cdc-2-lxd-1 ceph:ceph 750
2023-12-12 15:34:56 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Symlinking /var/lib/ceph/radosgw/ceph-rgw.juju-e16cdc-2-lxd-1/keyring as /etc/ceph/ceph.client.rgw.juju-e16cdc-2-lxd-1.keyring
2023-12-12 15:34:56 WARNING unit.ceph-radosgw/2.mon-relation-changed logger.go:60 Synchronizing state of radosgw.service with SysV service script with /lib/systemd/systemd-sysv-install.
2023-12-12 15:34:56 WARNING unit.ceph-radosgw/2.mon-relation-changed logger.go:60 Executing: /lib/systemd/systemd-sysv-install disable radosgw
2023-12-12 15:34:56 WARNING unit.ceph-radosgw/2.mon-relation-changed logger.go:60 Unit /etc/systemd/system/radosgw.service is masked, ignoring.
2023-12-12 15:34:56 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Installing python-dbus with options: ['--option=Dpkg::Options::=--force-confold']
2023-12-12 15:34:57 WARNING unit.ceph-radosgw/2.mon-relation-changed logger.go:60 E: Package 'python-dbus' has no installation candidate
2023-12-12 15:34:57 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Check command not found: check_disk
2023-12-12 15:34:57 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: removed check_haproxy. This service will be monitored by check_crm
2023-12-12 15:34:57 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Check command not found: check_systemd.py
2023-12-12 15:34:57 INFO unit.ceph-radosgw/2.juju-log server.go:325 mon:50: Nagios user not set up, nrpe checks not updated

These events had been present since the deployment and cut off abruptly at that point. I cannot reproduce them, but I suspect they stem from keystone becoming available later than ceph-radosgw.
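
In case it helps with the diagnosis, I assume the identity data published on the relation could be inspected with something like this (unit names are from my environment; per the log above, the endpoint is called identity-service):

juju show-unit ceph-radosgw/2                                     # dumps relation data, including what keystone published
juju exec --unit ceph-radosgw/2 -- relation-ids identity-service  # list the relation id(s) for that endpoint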

I have already searched various pages and forums, without result so far.

If someone has experience with this charm, could you please assist me with troubleshooting it?

Thank you very much.

I tried changing the OpenStack release to match a different Ceph release (specifically reef for ceph-radosgw). Same issue on the first deployment, and again after redeploying.

Now I have three units in three different states. I must be going crazy.

Unit                         Workload     Agent       Machine   Public address  Ports     Message
ceph-radosgw/3               active       executing   0/lxd/16  192.168.80.197  80/tcp    Unit is ready
  ceph-radosgw-hacluster/4   active       idle                  192.168.80.197            Unit is ready and clustered
ceph-radosgw/4*              blocked      executing   1/lxd/17  192.168.80.164  80/tcp    Services not running that should be: ceph-radosgw@rgw.juju-650ab0-1-lxd-17
  ceph-radosgw-hacluster/3*  active       idle                  192.168.80.164            Unit is ready and clustered
ceph-radosgw/5               waiting      executing   2/lxd/16  192.168.80.88   80/tcp    Incomplete relations: identity
  ceph-radosgw-hacluster/5   active       idle                  192.168.80.88             Unit is ready and clustered
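
For the blocked unit, I assume the rgw service could also be checked directly on the container, roughly like this (the service name is taken from the status message above):

juju ssh ceph-radosgw/4
sudo systemctl status ceph-radosgw@rgw.juju-650ab0-1-lxd-17
sudo journalctl -u ceph-radosgw@rgw.juju-650ab0-1-lxd-17 --no-pager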

Any ideas?

Backtracking my steps, I discovered that implementing network segmentation via space bindings is what broke ceph-radosgw. Once the space bindings of ceph-osd and ceph-mon had been confined to a single network, no more errors occurred.
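
For reference, the workaround amounts to binding those applications to a single space; the CLI equivalent would presumably look roughly like this (the space name is a placeholder):

juju bind ceph-mon internal-space
juju bind ceph-osd internal-space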

That is a pity, because network segmentation would be a basic requirement for us. I will search the documentation for the proper space bindings. I have to look further into this, but this topic can be closed.