I’m deploying OpenStack Yoga using Charms to 4 identical servers. I have completed all of the steps to get the pieces installed and running, but ceph-osd is still blocked with ‘Non-pristine devices detected, consult list-disks, zap-disk and blacklist-* actions’.
I can see that the disks exist, using ‘juju ssh ceph-osd/0’ and ‘fdisk -l’, and I’ve tried ‘juju run-action --wait ceph-osd/0 zap-disk devices=/dev/sdb i-really-mean-it=true’ on sdb and sdc on all 4 nodes.
Output of list-disks from one of the nodes is below.
Any ideas?
$ juju run-action --wait ceph-osd/0 list-disks
unit-ceph-osd-0:
  UnitId: ceph-osd/0
  id: "66"
  results:
    Stderr: |2
        Failed to find physical volume "/dev/sdc".
        Cannot use /dev/sda: device is partitioned
        Cannot use /dev/sdd: device is too small (pv_min_size)
        Failed to find physical volume "/dev/sdb".
    blacklist: '[''/dev/sdd'', ''/dev/sda'']'
    disks: '[''/dev/sdc'', ''/dev/sda'', ''/dev/sdd'', ''/dev/sdb'']'
    non-pristine: '[''/dev/sda'', ''/dev/sdd'']'
  status: completed
  timing:
    completed: 2022-09-26 18:11:17 +0000 UTC
    enqueued: 2022-09-26 18:11:15 +0000 UTC
    started: 2022-09-26 18:11:15 +0000 UTC
Indeed, it appears that you don’t have any usable disks for ceph-osd to run with. You mention zapping /dev/sdb, but it’s listed as non-existent. Are you certain that such a device exists on the machine where that OSD resides?
For some reason the charm code thinks the disks that are configured are non-pristine.
One of the main checks that the charm does is that the first 2048 bytes of the disk are zero, i.e. the disk header is blank.
The most common cause of this issue is that the charm tried to set up the disk and failed part-way for some reason, but only after it had written some data to the disk header, such as an LVM header, LUKS header or similar. It’s also possible the disk header wasn’t blank to start with. MAAS zeros out each disk’s header at deploy time, but depending on whether you are using MAAS, and on exactly what else was done during the deploy, it’s possible the header was never blank.
I’d use the following command to check each disk for the first 2048 bytes being zero:
xxd -l 2048 /dev/sdb
If they are not zero, first make 100% sure that the disk is not in use, has no data on it, and is the disk you think it is; then you can wipe its header with dd. Please be careful not to accidentally do this to the wrong disk or machine.
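To be explicit, and only as a sketch assuming /dev/sdb really is the spare disk you intend to hand to Ceph on that machine, the wipe would look something like this:
sudo dd if=/dev/zero of=/dev/sdb bs=1M count=4 conv=fsync   # zeroes the first 4 MiB, which more than covers the 2048-byte header check
sudo wipefs --all /dev/sdb                                  # optionally, also clear any remaining filesystem/LVM signatures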
Once you’ve cleaned up the disks you’ll need to trigger the config-changed hook of the charm. An easy way is to change the osd-devices config value, e.g. by adding a space on the end or similar, or this may also work:
juju run --application ceph-osd ./hooks/config-changed
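As a concrete sketch of the config-nudge approach, assuming your current value is “/dev/sdb /dev/sdc” (substitute your real device list):
juju config ceph-osd osd-devices="/dev/sdb /dev/sdc "   # trailing space forces a config change, which fires config-changed
juju config ceph-osd osd-devices="/dev/sdb /dev/sdc"    # then set it back to the real value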
If this happened with a clean MAAS deployment, we may be able to figure out why it failed to set up the disk the first time if you upload a full copy of /var/log/juju/unit-ceph-osd-*.log from one of the affected units. I’d recommend checking this file for any sensitive data or secrets before uploading it publicly; otherwise, if you prefer to keep the file private, you could e-mail it to me directly (first.last@canonical.com).
It would also be good to check the output of sudo lsblk and not just fdisk, as the disk is sometimes in use by other parts of the storage subsystem in ways that show up in lsblk but not in fdisk.
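Something like this is usually enough to show whether LVM, a mount or multipath is still holding the disk (the column list here is just my usual pick, not required):
sudo lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT /dev/sdb /dev/sdc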
Thanks for the info. I should have mentioned at the top that these are all managed by MAAS. All 4 servers are new, and this is a fresh installation, so there’s no risk of losing any data.
I only tried the xxd command on one of the nodes, but it shows all zeroes for both /dev/sdb and /dev/sdc.
So then I ran the ‘config-changed’ hook as you suggested. And… it worked. Ceph-osd found both drives on all 4 servers, and all services are active/idle now. Whatever was hanging it up may have been cleared by the zap-disk commands that I ran, but I didn’t know about the ‘config-changed’ hook. I had tried ‘juju run-action ceph-osd/0 add-disk osd-devices=/dev/sdb,/dev/sdc’, but that didn’t seem to help.
On to the next step. Based on a previous test a few months ago, it’ll probably all go well until I get to trying to integrate Keystone with our Active Directory.
I think you are right: the zap-disk commands probably removed whatever was on the disks from the first failure.
If you’re happy to share /var/log/juju/unit-ceph-osd-*.log and /var/log/syslog from one of the ceph-osd units via e-mail, I’m happy to see if we can figure out why it failed the first time. There may be a configuration change that will prevent it from happening again if you re-deploy.
Hi
I have 3 physical servers: 1 running MAAS, and 2 other larger servers deployed as KVM hosts.
The Juju controller is installed on a VM on one of the KVM hosts.
Now I’m trying to deploy Juju-charmed Ceph. I want to use 1 VM on one physical host with a virtual disk, and the 2 physical servers with raw disks. I plan to add a 3rd physical server later.
I created a VM on one of the KVM hosts and did a juju add-machine.
:~$ juju status
Model  Controller     Cloud/Region   Version  SLA          Timestamp
ceph   maas2-default  maas2/default  2.9.37   unsupported  10:36:42-05:00

Machine  State    Address        Inst id       Series  AZ       Message
0        started  172.18.20.203  vm-45-ceph-1  focal   default  Deployed
~$ more ceph.yaml
ceph-osd:
  osd-devices: /dev/sdc /dev/vdb
  source: cloud:focal-fossa
I was getting some errors like
“‘xenial’ is not a valid distro_series. It should be one of: ‘’, ‘ubuntu/focal’.”]})
so I added the source: cloud:focal-fossa setting, but I’m not sure if that is correct.
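For reference, I’m deploying it with something along these lines (I’m reconstructing the command roughly, so treat the exact flags as approximate):
juju deploy ceph-osd -n 3 --config ceph.yaml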
ceph-osd does get deployed to the VM, but not to the physical servers; I get constraints errors. I could create additional VMs, but I would prefer to use the bare-metal physical servers with raw disks dedicated to Ceph. Is this not possible because the physical servers are in a Deployed state rather than a Ready state, or is it because I am not specifying the correct constraints? I have a disk on each physical server (/dev/sdc) ready to use. I don’t see a /var/log/juju/unit-ceph-osd-*.log.
Did you do an add-machine for your physical servers?
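For a MAAS-backed cloud you can usually target a specific node by its MAAS hostname, e.g. (the hostname below is just a placeholder):
juju add-machine <maas-node-hostname>
though MAAS will only hand a node over to Juju if it is in the Ready state, not already Deployed.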
I’ve been running into errors with a MAAS-deployed OpenStack; currently my ceph-osd won’t deploy properly. I would like some help. I have tried the suggestions above with no luck.