No block devices detected using current configuration

mario-chirinos · 19 July 2023 19:14

We had a power cut at our data site last week. After managing to bring the cloud up again, just one error remains: one OSD shows the message “No block devices detected using current configuration”.

I would like to know the best way to proceed here. I would prefer to remove that OSD and add it again as a new disk, instead of trying to make it rejoin the cluster.
Is it safe to use the following command?

juju run-action --wait $OSD_UNIT remove-disk osd-ids=$OSD purge=true

What other considerations should I take into account? I ran the following commands to provide the relevant information, and the output is attached. I hope someone can help me with this. Thanks in advance.

juju status:

juju ssh ceph-mon/leader sudo ceph osd tree

juju debug-log --replay --no-tail -i ceph-osd/3

juju ssh ceph-mon/leader sudo ceph status

utkarshbhatthere · 21 July 2023 10:43

Hey @mario-chirinos. Can you also show the output of “juju config ceph-osd osd-devices” and the output of “lsblk” from the host that is showing this symptom?

utkarshbhatthere · 21 July 2023 10:50

Also, I can see that all the PGs are in good health (clean) and available (active), so it is safe to remove and re-add the DOWN OSD disk.

mario-chirinos · 21 July 2023 15:33

@utkarshbhatthere thanks for the reply, the output is:

geoint@MAAS-01:~$ juju config ceph-osd osd-devices
/dev/sdb /dev/sdc
geoint@MAAS-01:~$ juju ssh ceph-osd/3
ubuntu@key-ox:~$ sudo lsblk
NAME                                                                                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0                                                                                                   7:0    0  63.5M  1 loop /snap/core20/1950
loop1                                                                                                   7:1    0 173.5M  1 loop /snap/lxd/25112
loop2                                                                                                   7:2    0 173.5M  1 loop /snap/lxd/25086
loop3                                                                                                   7:3    0  53.3M  1 loop /snap/snapd/19457
loop4                                                                                                   7:4    0  53.3M  1 loop /snap/snapd/19361
loop5                                                                                                   7:5    0  73.9M  1 loop /snap/core22/817
loop6                                                                                                   7:6    0  63.5M  1 loop /snap/core20/1974
loop7                                                                                                   7:7    0  73.9M  1 loop /snap/core22/806
sda                                                                                                     8:0    0   223G  0 disk 
├─sda1                                                                                                  8:1    0   512M  0 part /boot/efi
└─sda2                                                                                                  8:2    0 222.5G  0 part /
sdb                                                                                                     8:16   0  21.4T  0 disk 
└─ceph--91050339--f650--49a6--ae51--7214be569906-osd--block--91050339--f650--49a6--ae51--7214be569906 253:0    0  21.4T  0 lvm
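In the output above, osd-devices lists /dev/sdb and /dev/sdc, while lsblk on this host shows only sda and sdb. Whether that is related to the error is not established in this thread, but the comparison can be automated. The following is a minimal sketch; the function name `missing_devices` is hypothetical, and it assumes lsblk's default tree output with the device name in the first column.

```shell
# Print each configured device that does not appear as a top-level
# entry in lsblk output.
#   $1    : space-separated device paths, as returned by
#           `juju config ceph-osd osd-devices`
#   stdin : output of `lsblk` from the host being checked
missing_devices() {
    lsblk_out=$(cat)
    for dev in $1; do
        name=${dev#/dev/}   # /dev/sdc -> sdc
        # Partitions and LVM children are prefixed with tree-drawing
        # characters, so an exact match on column 1 only hits whole disks.
        if ! printf '%s\n' "$lsblk_out" |
            awk -v n="$name" '$1 == n { found = 1 } END { exit !found }'; then
            printf '%s\n' "$dev"
        fi
    done
}
```

A possible invocation, assuming the unit name from this thread: `juju ssh ceph-osd/3 sudo lsblk | missing_devices "$(juju config ceph-osd osd-devices)"`. With the output shown above it would print /dev/sdc.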

mario-chirinos · 21 July 2023 15:36

Should I use the following command?

juju run-action --wait $OSD_UNIT remove-disk osd-ids=$OSD purge=true

If not, what is the correct procedure for this?

lmlogiudice · 21 July 2023 18:24

That’s the correct procedure, yes. The purge flag makes it possible to re-use the OSD ID for future placements. If the command succeeds, its output will also tell you how to replace the OSD that you’re removing.

mario-chirinos · 21 July 2023 22:20

I got this error

geoint@MAAS-01:~$ juju run-action --wait ceph-osd/3 remove-disk osd-ids=osd.2 purge=true
errors:
  0: action “remove-disk” not defined on unit “ceph-osd/3”

lmlogiudice · 21 July 2023 22:50

Seems like you’re running a version of the ceph-osd charm that doesn’t have the remove-disk action. You’ll need to upgrade the charm to run the action.

mario-chirinos · 23 July 2023 03:55

Is it safe to upgrade it?

Should I use this command?

juju config ceph-osd source=??

Which source should I choose? What about ceph-mon, should I also upgrade it?

mario-chirinos · 27 July 2023 00:14

I was able to remove the OSD by following the instructions at https://discourse.ubuntu.com/t/removing-osds-pre-quincy/27693. Now I would like to add the disk in unit 3 to the cluster again. Should I enter unit 3 and do some manual work, or can I just use a juju command?

geoint@MAAS-01:~$ juju ssh ceph-mon/leader sudo ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME               STATUS  REWEIGHT  PRI-AFF
 -1         183.32373  root default                                     
-21          21.83060      host calm-stag                               
  6    hdd   21.83060          osd.6               up   1.00000  1.00000
 -7          21.38379      host clean-hog                               
  4    hdd   21.38379          osd.4               up   1.00000  1.00000
-13          21.38379      host exotic-goblin                           
  5    hdd   21.38379          osd.5               up   1.00000  1.00000
 -5                 0      host key-ox                                  
 -9          21.38379      host liked-hermit                            
  1    hdd   21.38379          osd.1               up   1.00000  1.00000
-17          21.83060      host pumped-bat                              
  7    hdd   21.83060          osd.7               up   1.00000  1.00000
-15                 0      host sharp-grouse                            
-19          32.74359      host sharp-heron                             
  8    hdd   32.74359          osd.8               up   1.00000  1.00000
-11          21.38379      host stable-liger                            
  0    hdd   21.38379          osd.0               up   1.00000  1.00000
 -3          21.38379      host star-koala                              
  3    hdd   21.38379          osd.3               up   1.00000  1.00000
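The tree above shows two host buckets, key-ox and sharp-grouse, with weight 0 and no OSD beneath them. A small filter can list such hosts when the tree is long. This is only a sketch: the function name `empty_hosts` is hypothetical, and it assumes the column layout shown above, where host rows have an empty CLASS column and OSD rows have a populated one.

```shell
# List host buckets that have no OSD rows beneath them.
#   stdin : output of `ceph osd tree`
empty_hosts() {
    awk '
        # Host rows: ID, WEIGHT, "host", NAME (CLASS column is blank).
        $3 == "host" {
            if (host != "" && count == 0) print host
            host = $4
            count = 0
            next
        }
        # OSD rows: ID, CLASS, WEIGHT, NAME, ...
        $4 ~ /^osd\.[0-9]+$/ { count++ }
        END { if (host != "" && count == 0) print host }
    '
}
```

For example: `juju ssh ceph-mon/leader sudo ceph osd tree | empty_hosts`.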

mario-chirinos · 7 August 2023 16:24

I was able to remove OSD.2, but inside the unit ceph-osd/2, from which I removed the OSD, I can still see this device. Is that normal?

Disk /dev/mapper/ceph--df9c0ef1--fbd7--46a0--bb7f--c54b44d2ed54-osd--block--df9c0ef1--fbd7--46a0--bb7f--c54b44d2ed54: 21.39 TiB, 23511720525824 bytes, 45921329152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
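For reference, a /dev/mapper name like the one above encodes an LVM volume group and logical volume: device-mapper joins the two names with a single hyphen and doubles any hyphen that is part of a name. The sketch below decodes such a name; the function name `dm_to_vg_lv` is hypothetical, and it assumes neither name begins or ends with a hyphen.

```shell
# Split a device-mapper name into "VG LV".
#   $1 : basename of the /dev/mapper entry
dm_to_vg_lv() {
    # "/" cannot occur in LVM names, so it is safe as a placeholder:
    #   1. hide the escaped hyphens ("--"),
    #   2. turn the one remaining hyphen (the VG/LV separator) into a space,
    #   3. restore the hidden hyphens as single hyphens.
    printf '%s\n' "$1" | sed -e 's|--|/|g' -e 's|-| |' -e 's|/|-|g'
}
```

Applied to the device above, this yields the volume group `ceph-df9c0ef1-fbd7-46a0-bb7f-c54b44d2ed54` and the logical volume `osd-block-df9c0ef1-fbd7-46a0-bb7f-c54b44d2ed54`, which can then be inspected with `sudo vgs` and `sudo lvs` on the host.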

mario-chirinos · 22 September 2023 05:47

Can someone help me with this?

woodoojuju · 17 March 2025 11:03

By following the steps below I was able to fix the error.

Added a new disk to VMs manually.
Removed /dev/sda from ceph-osd.yaml file.
Ran the action add-disk (juju run ceph-osd/6 add-disk osd-devices=/dev/sdb)
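Note that this thread uses two action syntaxes: `juju run-action --wait` in the 2023 posts and `juju run` here. Juju 2.9 uses the former, and Juju 3.x uses the latter. A minimal sketch that prints the matching command line for a given client version follows; the helper name `action_cmd` is hypothetical, and it only prints the command rather than running it.

```shell
# Print the command line for running a charm action.
#   $1   : Juju client version string, e.g. from `juju version`
#   $2.. : unit, action name, and key=value parameters
action_cmd() {
    version=$1
    shift
    case "$version" in
        2.*) printf 'juju run-action --wait %s\n' "$*" ;;  # Juju 2.x syntax
        *)   printf 'juju run %s\n' "$*" ;;                # assumed 3.x or later
    esac
}
```

For example, `action_cmd "$(juju version)" ceph-osd/6 add-disk osd-devices=/dev/sdb` prints the form to use with the installed client. The unit and device are the ones from the steps above and will differ per deployment.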