"Expanding Ceph Clusters with Juju"

@szeestraten has posted a really informative article, “Expanding Ceph Clusters with Juju”:

Scaling applications with Juju is easy and Ceph is no exception. You can deploy more Ceph OSD hosts with just a simple juju add-unit ceph-osd command. The challenging part is to add new OSDs without impacting client performance due to large amounts of backfilling.

Below is a brief walkthrough of how you can scale your Ceph cluster, with a small example cluster that you can deploy locally on LXD containers to follow along:

  • Set crush initial weight to 0
  • Add new OSDs to the cluster
  • Clear crush initial weight
  • Reweight new OSDs
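As a rough illustration of the four steps above, here is a minimal dry-run sketch that only prints the commands rather than running them. It assumes the application is named ceph-osd and that the charm exposes the crush-initial-weight option mentioned later in this thread. The function name plan_expansion, the OSD ids and the target weight are hypothetical placeholders, and the ceph command would need to run on a node with an admin keyring.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for each step instead of executing them.
# plan_expansion, the OSD ids and the target weight are hypothetical placeholders.
plan_expansion() {
    new_units=$1      # number of ceph-osd units to add
    target_weight=$2  # final CRUSH weight for each new OSD
    shift 2           # remaining arguments: ids of the newly created OSDs

    # 1. New OSDs join the CRUSH map with weight 0, so no data moves yet.
    echo "juju config ceph-osd crush-initial-weight=0"
    # 2. Add the new OSD hosts.
    echo "juju add-unit ceph-osd -n $new_units"
    # 3. Clear the override so later OSDs get their normal initial weight.
    echo "juju config ceph-osd --reset crush-initial-weight"
    # 4. Bring each new OSD up to its target weight.
    for id in "$@"; do
        echo "ceph osd crush reweight osd.$id $target_weight"
    done
}

plan_expansion 1 1.0 3 4 5
```

Reviewing the printed plan before running anything for real fits the advice to test first, since charm options and defaults may differ between releases.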

Thanks for the mention @timClicks!

Note that while most commands should still work, the post is from early last year, and Juju, Ceph and the Ceph charms have all had their share of updates since then, so remember to test first.

We’re planning to expand a cluster with some more boxes soon so I’ll update the post if I find anything new.


Excellent write-up, @szeestraten.

As you noted, the products have improved since the writing.

Something we’ve been doing to reduce the impact of weighting in new disks starts with the same advice: set crush-initial-weight to 0. Luminous and later versions of Ceph have much better QoS than prior versions, and the gentle reweight scripts should probably be sunset because they remap PGs excessively each time you reweight disks. So once all units are added to Juju and all OSDs are registered in the CRUSH map, we set the variables below and then set the OSDs straight to their full weight.

            ceph tell 'osd.*' injectargs '--osd-max-backfills 1'
            ceph tell 'osd.*' injectargs '--osd-recovery-threads 1'
            ceph tell 'osd.*' injectargs '--osd-recovery-op-priority 1'
            ceph tell 'osd.*' injectargs '--osd-client-op-priority 63'
            ceph tell 'osd.*' injectargs '--osd-recovery-max-active 1'
            ceph tell 'osd.*' injectargs '--osd-recovery-sleep 2'

These can be embedded into the ceph-osd charm’s config-flags, but may not be ideal for actual recovery from failure depending on your ceph environment and recovery time objectives.
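If you would rather send the throttles in one go, injectargs accepts several options in a single call. Here is a minimal dry-run sketch that builds such a command from "name value" pairs and prints it. The helper name throttle_args is hypothetical, and the three settings shown are just a subset of the list above.

```shell
#!/bin/sh
# Dry-run sketch: builds one injectargs call from "name value" pairs and prints it.
# throttle_args is a hypothetical helper, not part of Ceph or Juju.
throttle_args() {
    args=""
    # Each argument is a "name value" pair, e.g. "osd-max-backfills 1".
    for pair in "$@"; do
        args="$args --$pair"
    done
    # Strip the leading space left by the first concatenation.
    echo "ceph tell 'osd.*' injectargs '${args# }'"
}

throttle_args "osd-max-backfills 1" "osd-recovery-max-active 1" "osd-recovery-sleep 2"
```

Keeping the pairs in one place also makes it easier to print a matching command that restores your usual values once backfilling has finished.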


Thanks for your comments @afreiberger!

As you say, things have improved a lot on the Ceph QoS side, so I’ll add a note to the post pointing to your comment, which is a quicker and simpler way to go for newer versions.

We still have one older cluster with spinning rust, which can be a bit sensitive, so we usually take it a bit easy on that one.