We had a power cut at our data site last week. After bringing the cloud back up, just one error remains: one OSD reports the message “No block devices detected using current configuration”.
I would like to know the best way to proceed here. I would prefer to remove that OSD and add it again as a new disk, rather than trying to make it rejoin the cluster.
Is it safe to use the following command?
juju run-action --wait $OSD_UNIT remove-disk osd-ids=$OSD purge=true
What other considerations should I take into account?
I ran the following commands to gather the relevant information; the output is attached.
I hope someone can help me with this.
Thanks in advance.
juju status:
juju ssh ceph-mon/leader sudo ceph osd tree
juju debug-log --replay --no-tail -i ceph-osd/3
juju ssh ceph-mon/leader sudo ceph status
Hey @mario-chirinos. Can you also show the output of “juju config ceph-osd osd-devices” and the output of “lsblk” from the host that is showing this symptom?
Also, I can see that all the PGs are in good health (clean) and available (active), so it is safe to remove and re-add the DOWN OSD disk.
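To double-check which OSDs are down before removing anything, the output of “ceph osd tree” can be filtered. The following is only a sketch: it assumes the plain-text column layout shown later in this thread (OSD name in the fourth column, status in the fifth), and the function name down_osds and the sample rows are made up for illustration, not taken from the real cluster.

```shell
#!/usr/bin/env bash
# List OSDs that "ceph osd tree" reports as down.
# Assumption: OSD rows look like "ID CLASS WEIGHT osd.N STATUS REWEIGHT PRI-AFF",
# so the name is field 4 and the status is field 5.
down_osds() {
    awk '$4 ~ /^osd\.[0-9]+$/ && $5 == "down" { print $4 }'
}

# Hypothetical sample rows, for illustration only.
sample_tree='ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 64.15137 root default
-5 21.38379 host key-ox
2 hdd 21.38379 osd.2 down 0 1.00000
-3 21.38379 host star-koala
3 hdd 21.38379 osd.3 up 1.00000 1.00000'

printf '%s\n' "$sample_tree" | down_osds
```

Against a live cluster the same filter could be fed from the command already used in this thread, for example: juju ssh ceph-mon/leader sudo ceph osd tree | down_osds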
@utkarshbhatthere thanks for the reply.
The output is:
geoint@MAAS-01:~$ juju config ceph-osd osd-devices
/dev/sdb /dev/sdc
geoint@MAAS-01:~$ juju ssh ceph-osd/3
ubuntu@key-ox:~$ sudo lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 63.5M 1 loop /snap/core20/1950
loop1 7:1 0 173.5M 1 loop /snap/lxd/25112
loop2 7:2 0 173.5M 1 loop /snap/lxd/25086
loop3 7:3 0 53.3M 1 loop /snap/snapd/19457
loop4 7:4 0 53.3M 1 loop /snap/snapd/19361
loop5 7:5 0 73.9M 1 loop /snap/core22/817
loop6 7:6 0 63.5M 1 loop /snap/core20/1974
loop7 7:7 0 73.9M 1 loop /snap/core22/806
sda 8:0 0 223G 0 disk
├─sda1 8:1 0 512M 0 part /boot/efi
└─sda2 8:2 0 222.5G 0 part /
sdb 8:16 0 21.4T 0 disk
└─ceph--91050339--f650--49a6--ae51--7214be569906-osd--block--91050339--f650--49a6--ae51--7214be569906 253:0 0 21.4T 0 lvm
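One thing visible in the output above: the charm is configured with /dev/sdb and /dev/sdc, while lsblk lists only sda and sdb as disks. Whether that is related to the error message is not established here. A small sketch for making the same comparison on any host follows; the function name missing_devices is hypothetical, and it assumes the disk names come from “lsblk -dn -o NAME”.

```shell
#!/usr/bin/env bash
# Print each configured OSD device that is not among the disks present on the host.
# $1: space-separated device paths (the value of the osd-devices option)
# $2: newline-separated disk names, as printed by "lsblk -dn -o NAME"
missing_devices() {
    local configured="$1" present="$2" dev
    for dev in $configured; do
        # Compare on the bare device name, e.g. /dev/sdc -> sdc
        if ! printf '%s\n' "$present" | grep -qxF -- "${dev##*/}"; then
            printf '%s\n' "$dev"
        fi
    done
}

# Values taken from the output above: two configured devices, disks sda and sdb present.
missing_devices "/dev/sdb /dev/sdc" "$(printf 'sda\nsdb\n')"
```

On the OSD host itself, the second argument could be replaced with "$(lsblk -dn -o NAME)".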
Should I use the following?
juju run-action --wait $OSD_UNIT remove-disk osd-ids=$OSD purge=true
Or what is the correct procedure for this?
That’s the correct procedure, yes. The purge flag will make it so that you can reuse the OSD ID for future placements. If the command succeeds, the output will also tell you how to replace the OSD that you’re removing.
Seems like you’re running a version of the ceph-osd charm that doesn’t have the remove-disk action. You’ll need to upgrade the charm to run the action.
Is it safe to upgrade it?
Should I use this command?
juju config ceph-osd source=??
Which source should I choose? What about ceph-mon, should I also upgrade it?
I was able to remove the OSD by following the instructions at https://discourse.ubuntu.com/t/removing-osds-pre-quincy/27693
Now I would like to add the disk in unit 3 back to the cluster. Should I log in to unit 3 and do some manual work, or can I just use a juju command?
geoint@MAAS-01:~$ juju ssh ceph-mon/leader sudo ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 183.32373 root default
-21 21.83060 host calm-stag
6 hdd 21.83060 osd.6 up 1.00000 1.00000
-7 21.38379 host clean-hog
4 hdd 21.38379 osd.4 up 1.00000 1.00000
-13 21.38379 host exotic-goblin
5 hdd 21.38379 osd.5 up 1.00000 1.00000
-5 0 host key-ox
-9 21.38379 host liked-hermit
1 hdd 21.38379 osd.1 up 1.00000 1.00000
-17 21.83060 host pumped-bat
7 hdd 21.83060 osd.7 up 1.00000 1.00000
-15 0 host sharp-grouse
-19 32.74359 host sharp-heron
8 hdd 32.74359 osd.8 up 1.00000 1.00000
-11 21.38379 host stable-liger
0 hdd 21.38379 osd.0 up 1.00000 1.00000
-3 21.38379 host star-koala
3 hdd 21.38379 osd.3 up 1.00000 1.00000
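In the tree above, the hosts key-ox and sharp-grouse have a weight of 0 and no OSD listed under them. A sketch for pulling such hosts out of the output automatically follows; the function name empty_hosts is hypothetical, and it assumes host rows have the form “ID WEIGHT host NAME” as in the listing above.

```shell
#!/usr/bin/env bash
# Print hosts whose CRUSH weight is zero in "ceph osd tree" output.
# Assumption: host rows look like "ID WEIGHT host NAME".
empty_hosts() {
    awk '$3 == "host" && $2 + 0 == 0 { print $4 }'
}

# A subset of the rows from the tree above.
sample_tree='ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 183.32373 root default
-21 21.83060 host calm-stag
6 hdd 21.83060 osd.6 up 1.00000 1.00000
-5 0 host key-ox
-15 0 host sharp-grouse
-3 21.38379 host star-koala
3 hdd 21.38379 osd.3 up 1.00000 1.00000'

printf '%s\n' "$sample_tree" | empty_hosts
```

A host showing up here only means that no OSD weight is currently assigned to it in the CRUSH map; it says nothing about the state of the disks on that host.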
I was able to remove OSD.2, but inside the unit ceph-osd/2, from which I removed the OSD, I can still see this device. Is that normal?
Disk /dev/mapper/ceph--df9c0ef1--fbd7--46a0--bb7f--c54b44d2ed54-osd--block--df9c0ef1--fbd7--46a0--bb7f--c54b44d2ed54: 21.39 TiB, 23511720525824 bytes, 45921329152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Can someone help me with this?