After a power outage, one of our servers, which hosts an OSD, shows as down in juju even though it is powered on and listed as deployed in MAAS. I ran a test on the server from MAAS, and now the server shows as ready instead of deployed.
I would like to remove this OSD from the cluster, and I am wondering what the proper way to do it is.
I am attaching the juju and ceph status output:
geoint@MAAS-01:~$ juju show-status-log ceph-osd/6
Time Type Status Message
24 Jul 2022 15:52:53-05:00 workload active Unit is ready (1 OSD)
geoint@MAAS-01:~$ juju ssh ceph-mon/0 sudo ceph status
  cluster:
    id:     bf2cbfe8-9b3c-11ec-81ad-3fc481233260
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim

  services:
    mon: 3 daemons, quorum juju-5025f7-0-lxd-0,juju-5025f7-1-lxd-0,juju-5025f7-2-lxd-0 (age 2d)
    mgr: juju-5025f7-0-lxd-0(active, since 3d), standbys: juju-5025f7-2-lxd-0, juju-5025f7-1-lxd-0
    osd: 9 osds: 8 up (since 2d), 8 in (since 3d)
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    pools:   19 pools, 197 pgs
    objects: 2.02M objects, 7.6 TiB
    usage:   23 TiB used, 160 TiB / 183 TiB avail
    pgs:     197 active+clean

  io:
    client:   60 KiB/s rd, 20 KiB/s wr, 60 op/s rd, 41 op/s wr
Connection to 10.2.101.140 closed.
geoint@MAAS-01:~$ juju run-action --wait ceph-mon/leader purge-osd osd=6 i-really-mean-it=yes
unit-ceph-mon-2:
  UnitId: ceph-mon/2
  id: "256"
  message: OSD has weight 21.830596923828125, must have zero weight before this operation
  results: {}
  status: failed
  timing:
    completed: 2022-08-08 19:30:05 +0000 UTC
    enqueued: 2022-08-08 19:30:04 +0000 UTC
    started: 2022-08-08 19:30:04 +0000 UTC
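The failure message suggests that the OSD's weight has to be zero before purge-osd will proceed. Since the reported value (about 21.83) is far above 1, it is presumably the CRUSH weight rather than the 0-to-1 override reweight. A minimal sketch of that sequence, assuming `ceph osd crush reweight` is the right way to zero it here; `OSD_ID`, `DRY_RUN`, and the `run` helper are illustrative names and not part of juju or ceph:

```shell
#!/bin/sh
# Sketch only: zero the CRUSH weight of an OSD, then retry the purge action.
# OSD_ID, DRY_RUN and run() are hypothetical helpers for this example.
OSD_ID="${OSD_ID:-6}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumption: the weight in the error message is the CRUSH weight.
# Setting it to 0 stops CRUSH from mapping data to this OSD.
run juju ssh ceph-mon/0 sudo ceph osd crush reweight "osd.${OSD_ID}" 0

# Retry the same purge action that failed above.
run juju run-action --wait ceph-mon/leader purge-osd "osd=${OSD_ID}" i-really-mean-it=yes
```

With the default `DRY_RUN=1` the script only prints the two commands, so they can be reviewed first. It would probably be wise to wait for `ceph status` to report all PGs as active+clean again before running it with `DRY_RUN=0`.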
I was able to remove the OSD with the commands below. I am now trying to remove the unit with juju remove-unit --wait ceph-osd/6, but it seems to do nothing; the unit is still there.