I’m trying to use juju storage to provision a block device for the ceph-osd charm on juju 3.3.3/3.4.0, against MAAS 3.4 using an LXD Pod to hold VMs for all machines in the MAAS. This is in my home lab, where I’m testing how attaching storage behaves in different configurations (e.g. as bcache, an LVM PV, raw disk, etc) as part of adding juju storage support to the microceph charm. Although microceph is the end goal, I am currently testing with the existing ceph-osd charm to see how it behaves.
If I pre-define a VM in MAAS that is then used by the model, juju fails to see the storage on the machine (which is there). The unit gets stuck waiting for the storage to appear, so the charm never starts installing and the machine status keeps showing “agent initialising”.
However, if I let juju/MAAS auto-compose a seemingly identical VM with the same bundle, the storage is seen and gets attached. Looking at the MAAS and LXD config, they look the same, the disks are the same size, etc. In my example I’m attaching two different disks, but it also fails if I attach only one of them (I’ve tried each individually as well as both together).
I have not tried this with “real” hardware machines at this stage due to lack of such resources. Traditionally in actual field deployments, the juju storage functionality is not usually used here; raw disk devices are provisioned in MAAS and passed directly to the osd-devices config item. So this functionality is perhaps not often exercised with the MAAS provider compared to the openstack provider etc., where it’s more commonly used. I don’t have past experience using it against MAAS to know whether it worked on an older juju version. Trying to use juju 2.9 gives me other problems around MAAS/LXD Pods that make it non-trivial to test.
I’ve crawled over the logs and made a (very poor) attempt to reckon with the code that handles the storage attachment, but have not managed to figure out what the difference is and why it sees the disk in one case and not the other. The storage shows as attached and alive in “juju storage --format json”, but that state doesn’t seem to make it to the machine.
I’ve created a juju-crashdump once it hits steady-state, with logging-config="<root>=TRACE;unit=TRACE". You can find it here: https://drive.google.com/file/d/1amPePTqV8Ea6blVtFP2SeizHwlJ76hw-/view?usp=drive_link
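For reference, this is roughly how the trace logging and the crashdump were produced (the juju-crashdump invocation is from memory, so treat the exact flags as approximate):

juju model-config logging-config='<root>=TRACE;unit=TRACE'
# wait for the model to reach steady-state, then collect the dump
juju-crashdump -m quincy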
ceph-osd/{0,1,2} on hostnames ceph1 (machine 1), ceph2 (machine 2) and ceph3 (machine 3) are the pre-created machines - you can see they are stuck in “agent initialising”.
ceph-osd/3 on hostname merry-mullet (machine 4) is the auto-composed machine - you can see the charm deployed and used the storage here.
You can compare {1,2,3}/baremetal/var/log/juju/*.log with 4/baremetal/var/log/juju/*.log
On a broken unit (e.g. ceph-osd/2), unit-ceph-osd-2.log prints “still pending [storage-bluestore-db-4 storage-osd-devices-5]”.
On the working unit, unit-ceph-osd-3.log shows “got storage change for ceph-osd/3: [bluestore-db/6 osd-devices/7] ok=true”.
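These are roughly the greps I used against the crashdump to find those lines (paths follow the crashdump layout above; adjust the machine and unit numbers as needed):

grep 'still pending' {1,2,3}/baremetal/var/log/juju/unit-ceph-osd-*.log
grep 'got storage change' 4/baremetal/var/log/juju/unit-ceph-osd-3.log
grep -i 'block devices' {1,2,3,4}/baremetal/var/log/juju/machine-*.log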
There are no super obvious lines to me in machine-N.log about identifying the storage, other than when it gets the info about the attachment being “alive” for the ceph-osd unit’s storage-attached hook.
We can see the broken units’ machine agents keep repeating udevadm calls: they do query the relevant disks (/dev/sdb and /dev/sdc) every 30 seconds, but then say “no changes to block devices detected” and don’t do anything further.
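Checking by hand on one of the broken machines, the disks do look present; this is roughly the kind of inspection I did, approximating what the machine agent polls via udevadm:

lsblk -o NAME,SIZE,TYPE,SERIAL /dev/sdb /dev/sdc
udevadm info --query=property --name=/dev/sdb
udevadm info --query=property --name=/dev/sdc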
I am composing the MAAS machines with this command:
for i in 1 2 3; do maas admin vm-host compose 1 hostname=ceph${i} cores=4 memory=4096 storage="0:24,1:32,2:8"; done
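To compare what MAAS itself thinks of the pre-created vs auto-composed machines, I looked at the block devices it reports for each; this is roughly the CLI form of that check (double-check the exact endpoints for your MAAS version):

# find the system IDs of a pre-created and the auto-composed machine
maas admin machines read hostname=ceph1 | jq -r '.[].system_id'
maas admin machines read hostname=merry-mullet | jq -r '.[].system_id'
# then compare the block devices MAAS knows about on each
maas admin block-devices read <system-id>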
The bundle I’m deploying is like so:
series: jammy
applications:
  ceph-mon:
    charm: ch:ceph-mon
    channel: quincy/edge
    series: jammy
    num_units: 1
    constraints: mem=2G
    options:
      monitor-count: 1
      expected-osd-count: 3
  ceph-osd:
    charm: ch:ceph-osd
    channel: quincy/edge
    series: jammy
    num_units: 4
    constraints: mem=4G root-disk=16G
    options:
      osd-devices: ''  # must be empty string when using juju storage
      bluestore-block-db-size: 1900000000
    storage:
      osd-devices: maas,32GB,1
      bluestore-db: maas,8GB,1
relations:
  - [ ceph-mon, ceph-osd ]
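For clarity, the storage directives above are pool, size and count: the maas pool, a 32GB/8GB volume, and one volume per unit. The single-application equivalent of the ceph-osd part of the bundle would be roughly the following (a sketch, not exactly what I ran):

juju deploy ch:ceph-osd --channel quincy/edge -n 4 \
    --constraints "mem=4G root-disk=16G" \
    --config osd-devices='' \
    --config bluestore-block-db-size=1900000000 \
    --storage osd-devices=maas,32GB,1 \
    --storage bluestore-db=maas,8GB,1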
juju status
lathiat@zlab:~/src/stsstack-bundles/ceph$ juju status
Model   Controller  Cloud/Region  Version  SLA          Timestamp
quincy  maas        maas/default  3.4.0    unsupported  09:36:47Z

App       Version  Status   Scale  Charm     Channel        Rev  Exposed  Message
ceph-mon  17.2.6   waiting  1      ceph-mon  quincy/stable  201  no       Monitor bootstrapped but waiting for number of OSDs to reach expected-osd-count (3)
ceph-osd  17.2.6   waiting  1/4    ceph-osd  quincy/stable  576  no       agent initialising

Unit         Workload  Agent       Machine  Public address  Ports  Message
ceph-mon/0*  waiting   idle        0        172.16.0.45            Monitor bootstrapped but waiting for number of OSDs to reach expected-osd-count (3)
ceph-osd/0*  waiting   allocating  1        172.16.0.39            agent initialising
ceph-osd/1   waiting   allocating  2        172.16.0.75            agent initialising
ceph-osd/2   waiting   allocating  3        172.16.0.55            agent initialising
ceph-osd/3   active    idle        4        172.16.0.53            Unit is ready (1 OSD)

Machine  State    Address      Inst id       Base          AZ       Message
0        started  172.16.0.45  hardy-eagle   ubuntu@22.04  default  Deployed
1        started  172.16.0.39  ceph1         ubuntu@22.04  default  Deployed
2        started  172.16.0.75  ceph2         ubuntu@22.04  default  Deployed
3        started  172.16.0.55  ceph3         ubuntu@22.04  default  Deployed
4        started  172.16.0.53  merry-mullet  ubuntu@22.04  default  Deployed
juju storage
lathiat@zlab:~/src/stsstack-bundles/ceph$ juju storage
Unit        Storage ID      Type   Pool  Size     Status    Message
ceph-osd/0  bluestore-db/0  block  maas  7.5 GiB  attached
ceph-osd/0  osd-devices/1   block  maas  30 GiB   attached
ceph-osd/1  bluestore-db/2  block  maas  7.5 GiB  attached
ceph-osd/1  osd-devices/3   block  maas  30 GiB   attached
ceph-osd/2  bluestore-db/4  block  maas  7.5 GiB  attached
ceph-osd/2  osd-devices/5   block  maas  30 GiB   attached
ceph-osd/3  bluestore-db/6  block  maas  7.5 GiB  attached
ceph-osd/3  osd-devices/7   block  maas  30 GiB   attached