I have had Ceph up and running for a few years, deployed as part of a focal-ussuri charmed OpenStack deployment. Recently a single OSD failed.

I followed the guide here for replacing a drive, but missed this step after inserting the new drive:

juju run-action --wait $OSD_UNIT add-disk osd-devices=$OSD_DNAME

Instead I ran:

juju run-action --wait ceph-osd/5 add-disk osd-devices=/dev/sdo
Now 4 of the 6 OSD units show the following status in juju status ceph-osd:
Non-pristine devices detected, consult list-disks, zap-disk and blacklist-* actions.
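For reference, this is how I understand the actions named in that message are invoked (a sketch based on the ceph-osd charm's action list; the unit and device below are only examples from my environment, not something I have run yet):

# show which devices the charm considers non-pristine on a unit
juju run-action --wait ceph-osd/0 list-disks

# wipe a device so the charm treats it as pristine again (destructive, so not run)
juju run-action --wait ceph-osd/0 zap-disk devices=/dev/sdh i-really-mean-it=true

# or tell the charm to ignore a device entirely
juju run-action --wait ceph-osd/0 blacklist-add-disk osd-devices=/dev/sdh

I am hesitant to run zap-disk against devices that appear to be backing in-service OSDs.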
Ceph is healthy and operating as expected, but Juju seems to really want me to perform an action that was already performed when the cluster was first deployed.
Any hints or pointers on how to rectify this situation would be much appreciated.
Here are some bits of information that seem relevant.
Output of list-disks from one of the impacted units:
juju run-action --wait ceph-osd/0 list-disks
unit-ceph-osd-0:
  UnitId: ceph-osd/0
  id: "831"
  results:
    Stderr: |2
      Failed to find physical volume "/dev/sdh".
      Failed to find physical volume "/dev/sdj".
      Failed to find physical volume "/dev/sdc".
      Failed to find physical volume "/dev/nvme0n1".
      Failed to find physical volume "/dev/sdi".
      Failed to find physical volume "/dev/sdk".
      Failed to find physical volume "/dev/sdl".
      Failed to find physical volume "/dev/sdg".
      Failed to find physical volume "/dev/sda".
      Failed to find physical volume "/dev/sdb".
      Failed to find physical volume "/dev/sde".
      Failed to find physical volume "/dev/sdd".
      Failed to find physical volume "/dev/sdf".
    blacklist: '[]'
    disks: '[''/dev/sdh'', ''/dev/sdj'', ''/dev/sdc'', ''/dev/nvme0n1'', ''/dev/sdi'',
      ''/dev/sdk'', ''/dev/sdl'', ''/dev/sdg'', ''/dev/sda'', ''/dev/sdb'', ''/dev/sde'',
      ''/dev/sdd'', ''/dev/sdf'']'
    non-pristine: '[''/dev/sdh'', ''/dev/sdj'', ''/dev/sdc'', ''/dev/nvme0n1'', ''/dev/sdi'',
      ''/dev/sdk'', ''/dev/sdl'', ''/dev/sdg'', ''/dev/sda'', ''/dev/sdb'', ''/dev/sde'',
      ''/dev/sdd'', ''/dev/sdf'']'
  status: completed
  timing:
    completed: 2022-10-21 21:55:25 +0000 UTC
    enqueued: 2022-10-21 21:55:21 +0000 UTC
    started: 2022-10-21 21:55:21 +0000 UTC
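If it would help, I can also gather the LVM view from one of the flagged units. This is a sketch of what I would run (not output I have captured yet) to confirm whether the non-pristine devices are the same ones backing the in-service OSDs:

# list the OSDs and the devices/LVs backing them on the unit
juju ssh ceph-osd/0 sudo ceph-volume lvm list

# quick look at the block device and LVM layout
juju ssh ceph-osd/0 sudo lsblk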
Charm information from juju status:

App       Version  Status   Scale  Charm     Channel  Rev
ceph-osd  15.2.16  blocked  6      ceph-osd  stable   304
Output of juju ssh ceph-mon/leader sudo ceph -s:

  cluster:
    id:     <cluster uuid>
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03 (age 2d)
    mgr: ceph-mon01(active, since 26h), standbys: ceph-mon02, ceph-mon03
    osd: 73 osds: 72 up (since 21m), 72 in (since 23h);
    rgw: 3 daemons active (juju-46ccdb-11-lxd-0, juju-46ccdb-12-lxd-0, juju-46ccdb-9-lxd-0)

  task status:

  data:
    pools:   21 pools, 3097 pgs
    objects: 10.17M objects, 39 TiB
    usage:   118 TiB used, 277 TiB / 395 TiB avail
    pgs:     3097 active+clean