Cinder Ceph-backed services down after cinder-backup deployment

I am having some trouble diagnosing an issue with Cinder, which seems to have occurred at the exact moment I attempted to deploy cinder-backup.

The timeline went something like this:

  • Fully working charmed OpenStack deployment using cinder-ceph. All services working OK.
  • I then attempted to deploy cinder-backup, but the new unit became stuck in an error state on the “ceph-relation-changed” hook. This is the cinder-backup error from that time (my best guess at the cause is further down):
2024-10-04 14:12:43 INFO unit.cinder-backup/3.juju-log server.go:325 ceph:150: Wrote template /var/lib/charm/cinder-backup/ceph.conf.
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60 Traceback (most recent call last):
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60   File "/var/lib/juju/agents/unit-cinder-backup-3/charm/hooks/charmhelpers/core/strutils.py", line 94, in __init__
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60     self.index = self._list.index(item)
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60 ValueError: tuple.index(x): x not in tuple
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60 During handling of the above exception, another exception occurred:
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60 Traceback (most recent call last):
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60   File "/var/lib/juju/agents/unit-cinder-backup-3/charm/hooks/ceph-relation-changed", line 190, in <module>
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60     hooks.execute(sys.argv)
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60   File "/var/lib/juju/agents/unit-cinder-backup-3/charm/hooks/charmhelpers/core/hookenv.py", line 963, in execute
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60     self._hooks[hook_name]()
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60   File "/var/lib/juju/agents/unit-cinder-backup-3/charm/hooks/charmhelpers/core/host.py", line 778, in wrapped_f
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60     return restart_on_change_helper(
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60   File "/var/lib/juju/agents/unit-cinder-backup-3/charm/hooks/charmhelpers/core/host.py", line 863, in restart_on_change_helper
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60     r = lambda_f()
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60   File "/var/lib/juju/agents/unit-cinder-backup-3/charm/hooks/charmhelpers/core/host.py", line 779, in <lambda>
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60     (lambda: f(*args, **kwargs)),
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60   File "/var/lib/juju/agents/unit-cinder-backup-3/charm/hooks/ceph-relation-changed", line 106, in ceph_changed
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60     backup_backend_joined(rid)
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60   File "/var/lib/juju/agents/unit-cinder-backup-3/charm/hooks/ceph-relation-changed", line 136, in backup_backend_joined
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60     ctxt = CephBackupSubordinateContext()()
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60   File "/var/lib/juju/agents/unit-cinder-backup-3/charm/hooks/cinder_backup_contexts.py", line 41, in __call__
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60     if CompareOpenStackReleases(release) < "icehouse":
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60   File "/var/lib/juju/agents/unit-cinder-backup-3/charm/hooks/charmhelpers/core/strutils.py", line 96, in __init__
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60     raise KeyError("Item '{}' is not in list '{}'"
2024-10-04 14:12:44 WARNING unit.cinder-backup/3.ceph-relation-changed logger.go:60 KeyError: "Item 'bobcat' is not in list '('diablo', 'essex', 'folsom', 'grizzly', 'havana', 'icehouse', 'juno', 'kilo', 'liberty', 'mitaka', 'newton', 'ocata', 'pike', 'queens', 'rocky', 'stein', 'train', 'ussuri', 'victoria', 'wallaby', 'xena', 'yoga')'"
2024-10-04 14:12:44 ERROR juju.worker.uniter.operation runhook.go:180 hook "ceph-relation-changed" (via explicit, bespoke hook script) failed: exit status 1
  • Rather than investigate this cinder-backup error further, I simply removed cinder-backup again, as it was not immediately needed.
  • Since then, however, my cinder-ceph services in “zo1” have been down:
+------------------+---------------------------------+------+----------+-------+----------------------------+
| Binary           | Host                            | Zone | Status   | State | Updated At                 |
+------------------+---------------------------------+------+----------+-------+----------------------------+
| cinder-volume    | cinder@cinder-ceph              | zo1  | enabled  | up    | 2024-11-08T09:09:38.000000 |
| cinder-scheduler | cinder                          | nova | enabled  | up    | 2024-11-08T09:09:41.000000 |
| cinder-backup    | cinder                          | nova | enabled  | up    | 2024-11-08T09:09:36.000000 |
| cinder-volume    | juju-ef54ec-1-lxd-2@cinder-ceph | zo1  | enabled  | down  | 2024-10-04T15:01:26.000000 |
| cinder-volume    | juju-ef54ec-2-lxd-2@cinder-ceph | zo1  | enabled  | down  | 2024-10-04T15:01:26.000000 |
| cinder-scheduler | juju-ef54ec-1-lxd-2             | nova | enabled  | down  | 2024-10-23T10:52:26.000000 |
| cinder-scheduler | juju-ef54ec-2-lxd-2             | nova | enabled  | down  | 2024-10-23T10:52:26.000000 |
| cinder-backup    | juju-ef54ec-2-lxd-2             | nova | enabled  | down  | 2024-10-04T15:01:31.000000 |
| cinder-backup    | juju-ef54ec-1-lxd-2             | nova | enabled  | down  | 2024-10-04T15:01:31.000000 |
| cinder-volume    | juju-ef54ec-0-lxd-2@cinder-ceph | zo1  | enabled  | down  | 2024-10-04T15:01:42.000000 |
| cinder-scheduler | juju-ef54ec-0-lxd-2             | nova | enabled  | down  | 2024-10-23T10:52:19.000000 |
| cinder-backup    | juju-ef54ec-0-lxd-2             | nova | enabled  | down  | 2024-10-04T15:01:47.000000 |
| cinder-volume    | juju-ef54ec-2-lxd-2@LVM         | nova | disabled | down  | 2024-11-05T09:45:28.000000 |
| cinder-volume    | juju-ef54ec-1-lxd-2@LVM         | nova | disabled | down  | 2024-11-05T09:45:13.000000 |
| cinder-volume    | juju-ef54ec-0-lxd-2@LVM         | nova | disabled | down  | 2024-11-05T09:45:00.000000 |
+------------------+---------------------------------+------+----------+-------+----------------------------+
  • I did attempt to remove the cinder-ceph relations and then re-relate them to see if that would help recovery, but removing them triggered the registration of the Cinder LVM volume services shown above.
  • I couldn’t remove those LVM services, so I just set them to disabled after I re-added the cinder-ceph relations (example command below).
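
For reference, the LVM services were disabled with commands along these lines, one per host:

openstack volume service set --disable juju-ef54ec-0-lxd-2@LVM cinder-volume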

I do not believe I did any other work on cinder/cinder-ceph itself; all I did was add and then remove the cinder-backup charm.
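
My best guess at the original hook failure, for what it is worth: the KeyError shows that the charmhelpers bundled with the cinder-backup charm only knows OpenStack releases up to yoga, while this cloud reports bobcat (cinder 23.0.0, per the status below). So I assume the cinder-backup charm came from a channel that is too old for this cloud, and that deploying it from the same channel as the rest of the deployment would avoid the hook error. Something like:

juju deploy cinder-backup --channel 2024.1/candidate

or, for an already-deployed application, juju refresh cinder-backup --channel 2024.1/candidate. I have not retried this yet, so treat it as a guess.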

Any help appreciated.
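
One specific question: the per-host rows that remain “down” look like stale registrations (the active cinder-volume service is now registered under the clustered host “cinder”), so is it safe to purge them? I was considering something along the lines of:

juju ssh cinder/leader -- sudo cinder-manage service remove cinder-volume juju-ef54ec-0-lxd-2@cinder-ceph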

juju status cinder

SAAS                             Status  Store        URL
alertmanager-karma-dashboard     active  juju-1       admin/cos.alertmanager-karma-dashboard
central-keystone-credentials     active  juju-1       admin/central-tibus-cloud.central-keystone-credentials
central-keystone-notifications   active  juju-1       admin/central-tibus-cloud.central-keystone-notifications
grafana-dashboards               active  juju-1       admin/cos.grafana-dashboards
keystone-admin-common            active  juju-1       admin/central-tibus-cloud.central-keystone-admin
keystone-common                  active  juju-1       admin/central-tibus-cloud.keystone
keystone-credentials-common      active  juju-1       admin/central-tibus-cloud.central-keystone-credentials
keystone-notifications-common    active  juju-1       admin/central-tibus-cloud.central-keystone-notifications
loki-logging                     active  juju-1       admin/cos.loki-logging
prometheus-metrics               active  juju-1       admin/cos.prometheus-metrics
prometheus-receive-remote-write  active  juju-1       admin/cos.prometheus-receive-remote-write
vault-common                     active  juju-1       admin/central-tibus-cloud.vault

App                  Version  Status  Scale  Charm         Channel           Rev  Exposed  Message
cinder               23.0.0   active      3  cinder        2024.1/candidate  690  no       Unit is ready
cinder-ceph          23.0.0   active      3  cinder-ceph   2024.1/candidate  533  no       Unit is ready
cinder-hacluster     2.1.2    active      3  hacluster     2.4/stable        131  no       Unit is ready and clustered
cinder-mysql-router  8.0.39   active      3  mysql-router  8.0/stable        200  no       Unit is ready

Unit                      Workload  Agent  Machine  Public address  Ports     Message
cinder/0                  active    idle   0/lxd/2  10.0.0.167      8776/tcp  Unit is ready
  cinder-ceph/4           active    idle            10.0.0.167                Unit is ready
  cinder-hacluster/1*     active    idle            10.0.0.167                Unit is ready and clustered
  cinder-mysql-router/1*  active    idle            10.0.0.167                Unit is ready
cinder/1                  active    idle   1/lxd/2  10.0.0.174      8776/tcp  Unit is ready
  cinder-ceph/5*          active    idle            10.0.0.174                Unit is ready
  cinder-hacluster/2      active    idle            10.0.0.174                Unit is ready and clustered
  cinder-mysql-router/2   active    idle            10.0.0.174                Unit is ready
cinder/2*                 active    idle   2/lxd/2  10.0.0.158      8776/tcp  Unit is ready
  cinder-ceph/3           active    idle            10.0.0.158                Unit is ready
  cinder-hacluster/0      active    idle            10.0.0.158                Unit is ready and clustered
  cinder-mysql-router/0   active    idle            10.0.0.158                Unit is ready

Machine  State    Address         Inst id              Base          AZ   Message
0        started  10.0.0.132      os-zo1-ctrl-2        ubuntu@22.04  zo1  Deployed
0/lxd/2  started  10.0.0.167      juju-ef54ec-0-lxd-2  ubuntu@22.04  zo1  Container started
1        started  10.0.0.133      os-zo1-ctrl-3        ubuntu@22.04  zo1  Deployed
1/lxd/2  started  10.0.0.174      juju-ef54ec-1-lxd-2  ubuntu@22.04  zo1  Container started
2        started  10.0.0.131      os-zo1-ctrl-1        ubuntu@22.04  zo1  Deployed
2/lxd/2  started  10.0.0.158      juju-ef54ec-2-lxd-2  ubuntu@22.04  zo1  Container started