Ceph-osd removing a node

ronyzd · 5 November 2020 09:17

Hi everyone,
i am facig a ceph-osd issue on our Openstack infra. We upgraded from stein to train (ceph mimic to nautilaus) since then the ceph cluster is showing BlueFS spillover warning and performance went down a lot. Irealised that since the ceph cluster deployment we misconfigured the Bluefs db and wal sizes 1GB and 100 Mo). By reading different articles apparently this should be 30GB. The only clean way i came up with is to destroy each osd at time and recreate it.
This is 4 node cluster with each 12 disks (4 NVMe and 8 spinning). 1 NVMe is dedicated for journaling on each node. Ceph-osd and Nova-Compute are collocated.
Before starting the process and since this is a charmed deployment any of you guys have an experience doing that ?
if we run :

juju run-action --wait ceph-osd/0 osd-out
juju remove-unit ceph-osd/0
then
juju add-unit ceph-osd --to 12
Is this it ? or am i missing something ? will this affect nova-compute ?
I would greatly appreciate any feedback on this !
Thx

ronyzd · 17 November 2020 11:28

I eneded up doing this :

set node as out
wait for the cluster to rebalance
follow the ceph procedure on removing an osd
wait for the cluster to finish remapping pgs
from juju remove the unit
add the unit again with different ceph-osd configuration
zap the disks from juju cli: zap-disks
add the disks again : add-disks

i thought i ll share this experience
regards