Charmed Ceph docs for replacing OSD and WAL/DB device

Hi OpenStack Charmers,

I found the docs for replacing disks at https://ubuntu.com/ceph/docs/replacing-osd-disks; however, the example case is somewhat over-complicated, as it uses bcache rather than regular BlueStore OSDs and/or WAL/DB devices.

Are there any docs for the following cases somewhere? If not, could you please add them?

  • Procedure for replacing a BlueStore OSD disk backed by a WAL/DB device
  • Procedure for replacing a WAL/DB device, and all that that entails

Hi Sandor. Thanks for this feedback. No, we don’t have any operator documentation on WAL or WAL/DB devices. I’ll put it on the list.


@pmatulis great, thank you.

I also have a correction and some suggestions for the current Replacing OSD disks guide.

Getting OSD FSID

The sudo ceph-volume lvm list | grep -A 12 "= osd.4 =" | grep 'osd fsid' command does not work on Ceph 16.2.7: when the OSD has a separate DB device (and encryption), the listing contains extra lines, so the 'osd fsid' line falls outside the 12-line grep window.

ubuntu@hcc-store40:~$ sudo ceph-volume lvm list | grep -A 12 "= osd.4 =" | grep 'osd fsid'

ubuntu@hcc-store40:~$ sudo ceph-volume lvm list | grep -A 12 "= osd.1 ="
====== osd.1 =======

  [block]       /dev/ceph-6f7f2d54-f2d0-4c20-82c4-27c009d3d651/osd-block-6f7f2d54-f2d0-4c20-82c4-27c009d3d651

      block device              /dev/ceph-6f7f2d54-f2d0-4c20-82c4-27c009d3d651/osd-block-6f7f2d54-f2d0-4c20-82c4-27c009d3d651
      block uuid                ZjSo1W-CItM-pUpB-S9Kc-rndf-0yY7-iSB6nb
      cephx lockbox secret      AQCQZWZiYSfFMhAAmog8ySZLrc1H1uGDOE1DMw==
      cluster fsid              cb9da1fc-c170-11ec-90b7-21377ec761fd
      cluster name              ceph
      crush device class        None
      db device                 /dev/ceph-db-41cc6a7d-03e7-4bf9-8630-2c18ed491cc7/osd-db-6f7f2d54-f2d0-4c20-82c4-27c009d3d651
      db uuid                   PARz39-hP20-dogJ-hS3W-1VYg-08ov-efNLEh
      encrypted                 1

ubuntu@hcc-store40:~$ sudo ceph-volume lvm list | grep -A 14 "= osd.1 ="
====== osd.1 =======

  [block]       /dev/ceph-6f7f2d54-f2d0-4c20-82c4-27c009d3d651/osd-block-6f7f2d54-f2d0-4c20-82c4-27c009d3d651

      block device              /dev/ceph-6f7f2d54-f2d0-4c20-82c4-27c009d3d651/osd-block-6f7f2d54-f2d0-4c20-82c4-27c009d3d651
      block uuid                ZjSo1W-CItM-pUpB-S9Kc-rndf-0yY7-iSB6nb
      cephx lockbox secret      AQCQZWZiYSfFMhAAmog8ySZLrc1H1uGDOE1DMw==
      cluster fsid              cb9da1fc-c170-11ec-90b7-21377ec761fd
      cluster name              ceph
      crush device class        None
      db device                 /dev/ceph-db-41cc6a7d-03e7-4bf9-8630-2c18ed491cc7/osd-db-6f7f2d54-f2d0-4c20-82c4-27c009d3d651
      db uuid                   PARz39-hP20-dogJ-hS3W-1VYg-08ov-efNLEh
      encrypted                 1
      osd fsid                  6f7f2d54-f2d0-4c20-82c4-27c009d3d651
      osd id                    1

ubuntu@hcc-store40:~$ sudo ceph-volume lvm list | grep -A 14 "= osd.1 =" | grep 'osd fsid'
      osd fsid                  6f7f2d54-f2d0-4c20-82c4-27c009d3d651

Other alternatives might be:

  • ceph osd find 1
  • ceph osd find 1 | jq .osd_fsid
  • ceph-volume lvm list /dev/sda | grep 'osd fsid'
  • ceph-volume lvm list --format json | jq '."1"[0].tags."ceph.osd_fsid"'
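
For example, combining the last alternative with jq's -r flag prints the bare FSID without depending on how many lines the human-readable listing contains (a sketch using osd.1 from the listing above; the exact JSON layout may differ between Ceph releases):

ubuntu@hcc-store40:~$ sudo ceph-volume lvm list --format json | jq -r '."1"[0].tags."ceph.osd_fsid"'
6f7f2d54-f2d0-4c20-82c4-27c009d3d651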

Cleaning up OSD resources

Instead of running cryptsetup close and vgremove manually, use:

  • ceph-volume lvm deactivate [osd_id] — deactivation unmounts the OSD's tmpfs and closes any dm-crypt devices
  • ceph-volume lvm zap --destroy --osd-id [osd_id] — this removes the relevant LVs/VGs (including the DB LV, which the docs do not currently handle). It can also take a VG/LV or device paths if that is more relevant for the docs; see the combined sketch below.

N.B. One still has to run pvremove in order to remove the LVM label, as a workaround for Ceph OSD charm Bug #1967295 ("zap-disk action fails to zap device part of LVM wh...").
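
Put together, the cleanup for osd.1 from the example above might look roughly like this (a sketch only; /dev/sdX is a placeholder for the actual data disk):

  sudo ceph-volume lvm deactivate 1               # unmounts the OSD tmpfs and closes any dm-crypt devices
  sudo ceph-volume lvm zap --destroy --osd-id 1   # removes the block and DB LVs/VGs
  sudo pvremove /dev/sdX                          # only if an LVM label remains (workaround for Bug #1967295)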

Safety checks

I would also recommend adding some safety checks, such as ceph osd ok-to-stop and ceph osd safe-to-destroy, as long as the guide does not use the remove-disk action, which already performs these checks.
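
For example, before stopping and destroying osd.1 in the manual procedure, something along these lines could be run (a sketch; both commands report whether the operation is currently safe given the cluster's data placement):

  ceph osd ok-to-stop 1        # checks that stopping osd.1 will not make any PGs unavailable
  ceph osd safe-to-destroy 1   # checks that osd.1 holds no data that is not replicated elsewhere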