This post is primarily about private-cloud deployments. In what I’m working on, vendor clouds are treated as a burst scale-out and backup strategy.
I’m tasked with delivering an elegant way to manage dynamic workloads across a medium-sized cluster. What makes this cluster unusual is that it has several ‘pet’ nodes with specific purposes, but they’re used for those purposes infrequently and on a set schedule. When they’re idling (more than 70% of an average week), each of these nodes provides as much compute capacity as 8-16 of the regular cluster nodes. Due to other dependencies and characteristics of these nodes, VM usage needs to be minimized (preferably avoided entirely).
An additional goal is to deliver a ‘DCIB’ (data-center-in-a-box) script, which is something the folks I’m helping out are used to working with as a project framework.
After spending some time testing implementations with tools I’m already familiar with (Salt, Terraform, etc.), I revisited Juju for the first time since I first read about it. With the most recent version I found I can do what I want with the following high-level implementation:
- Install Juju on the initial control plane. From there, bootstrap a local LXD controller and spin up containers for the MAAS region and rack. Ideally I’d use Juju to deploy maas-rack and maas-region, but their snaps are currently broken, and my attempts to build an apt-install MAAS charm were met with some unexpected (and expected) complexity.
- Netboot all remaining nodes via MAAS and make the necessary deployment changes to provide a standard range of common interfaces and storage across all nodes, tagging the pets according to their additional traits.
- Juju bootstrap a MAAS controller. In dev I have been adding the controller to a local VM, then migrating it into the cluster with enable-ha and spinning down the VM.
- Juju deploy the LXD-Cluster charm to all MAAS nodes. Most cluster-level workloads will deploy here, or in the…
- Juju bootstrap the LXD cluster, then Juju deploy a CDK cluster to it. This part doesn’t work yet. Conjure-up is the recommended way, but it’s currently hamstrung by recent lxd-profile changes within Juju (I believe). I have attempted to manually build my own CDK bundle and add the lxd-profile.yaml that Juju needs in order to deploy CDK to LXD with the profile tweaks CDK requires to run inside containers.
- Finally, bootstrap the CDK cluster.
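To make the ordering concrete, here is a rough dry-run sketch of how a DCIB script might sequence the steps above. It is only an illustration under assumptions: every controller, cloud, and charm name in it (`lxd-local`, `maas-cloud`, `maas-ctl`, `lxd-cluster-cloud`, `lxd-cluster-ctl`, `lxd-cluster`) is a hypothetical placeholder, `canonical-kubernetes` is assumed to be the CDK bundle name, and the MAAS and LXD-cluster clouds are assumed to be registered with Juju already. Steps that happen outside Juju, or that I can't express as a command yet, are marked as manual.

```python
import shlex
import subprocess


def build_plan(lxd_controller="lxd-local", maas_cloud="maas-cloud",
               maas_controller="maas-ctl",
               lxd_cluster_cloud="lxd-cluster-cloud",
               lxd_cluster_controller="lxd-cluster-ctl",
               lxd_cluster_charm="lxd-cluster", node_count=3):
    """Return (description, argv) pairs; argv is None for manual steps.

    All default names are hypothetical placeholders, and the MAAS and
    LXD-cluster clouds are assumed to be known to Juju beforehand.
    """
    return [
        ("Bootstrap a local LXD controller",
         ["juju", "bootstrap", "localhost", lxd_controller]),
        ("Add two containers for the MAAS region and rack",
         ["juju", "add-machine", "-n", "2"]),
        ("Install MAAS, netboot the remaining nodes, tag the pets", None),
        ("Bootstrap a controller on MAAS",
         ["juju", "bootstrap", maas_cloud, maas_controller]),
        ("Make the MAAS controller highly available",
         ["juju", "enable-ha"]),
        ("Deploy the LXD cluster charm to the MAAS nodes",
         ["juju", "deploy", lxd_cluster_charm, "-n", str(node_count)]),
        ("Bootstrap a controller on the LXD cluster",
         ["juju", "bootstrap", lxd_cluster_cloud, lxd_cluster_controller]),
        ("Deploy CDK to the LXD cluster",
         ["juju", "deploy", "canonical-kubernetes"]),
        ("Bootstrap the CDK cluster", None),
    ]


def run_plan(plan, dry_run=True):
    """Return the plan as shell-style lines; execute them unless dry_run."""
    lines = []
    for description, argv in plan:
        if argv is None:
            lines.append(f"# MANUAL: {description}")
            continue
        lines.append(shlex.join(argv))
        if not dry_run:
            # Stop at the first failing step rather than racing ahead.
            subprocess.run(argv, check=True)
    return lines


if __name__ == "__main__":
    print("\n".join(run_plan(build_plan())))
```

Keeping the plan as plain data and defaulting to a dry run means the sequence can be reviewed before anything touches the cluster, which seems like a sensible default while parts of the stack are still beta.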
Storage for the entire setup will be Ceph-backed. The Ceph cluster gets deployed to the MAAS controller, with OSDs on bare metal and MONs (and FS) in containers.
Like I said, this doesn’t work yet. Some of the features I’m using here leverage freshly shipped or beta code in Juju, MAAS, and LXD. But once it does work, a very small script should be able to log in to an initial box and bootstrap the entire cluster, while providing instrumentation and tooling at each level to manage optimal deployment paths, so that an operator can easily throw workloads around and scale dynamically, either via scripting or manually.
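As a sketch of that OSD/MON split, the helper below builds the Juju commands for placing `ceph-osd` on bare metal and `ceph-mon` (plus `ceph-fs`) in LXD containers on the same machines. The machine numbers are an assumption (a fresh model where the OSD units land on machines 0, 1, and 2), and the `osd-devices` configuration the OSD charm needs is left out.

```python
def ceph_commands(osd_machines=(0, 1, 2), with_fs=True):
    """Build juju argv lists: OSDs on bare metal, MONs/FS in containers.

    osd_machines is an assumed set of machine IDs; on a real model the
    IDs depend on what has already been deployed there.
    """
    count = str(len(osd_machines))
    # e.g. "lxd:0,lxd:1,lxd:2" places one container per listed machine.
    mon_placement = ",".join(f"lxd:{m}" for m in osd_machines)
    commands = [
        ["juju", "deploy", "ceph-osd", "-n", count],
        ["juju", "deploy", "ceph-mon", "-n", count, "--to", mon_placement],
        ["juju", "add-relation", "ceph-osd", "ceph-mon"],
    ]
    if with_fs:
        commands += [
            ["juju", "deploy", "ceph-fs", "--to", f"lxd:{osd_machines[0]}"],
            ["juju", "add-relation", "ceph-fs", "ceph-mon"],
        ]
    return commands


if __name__ == "__main__":
    for argv in ceph_commands():
        print(" ".join(argv))
```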
But why?
This gives the flexibility to add services, via Juju, to a generally static set of hardware, deploying to bare metal, containers, or k8s, with a terrific set of management tools for every layer (MAAS for provisioning hardware, LXD clustering to assist with backup chores and host migration, and CDK for deployments that need process-level isolation or other k8s benefits beyond what LXD handles best). All one does is target a specific controller:model from a single prompt to work on the three major entry points that require regular maintenance to provide the most efficient compute capacity.
As mentioned at the beginning, this can and will expand further to bootstrapping a vendor cloud (I’m testing with Azure) for scaling out, using cross-model relations (CMR) to hold the ship together.
The implementation I’m looking to replace uses Docker Swarm and a lot of nasty, race-condition-ridden docker stack deploy scripts.
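A minimal sketch of what targeting a controller:model from one prompt could look like in a wrapper script. The controller and model names in the usage part are hypothetical; the only Juju feature relied on is the `-m controller:model` option.

```python
def juju_cmd(action, controller, model, *args):
    """Build a juju command aimed at one controller:model pair."""
    return ["juju", action, "-m", f"{controller}:{model}", *args]


if __name__ == "__main__":
    # Hypothetical controller names for the three layers: MAAS (bare
    # metal), the LXD cluster (containers), and k8s.
    for controller in ("maas-ctl", "lxd-cluster-ctl", "k8s-ctl"):
        print(" ".join(juju_cmd("status", controller, "default")))
```

The same helper works for any model-scoped command, so moving a workload between layers is mostly a matter of changing the controller:model pair.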
Anyway, this is a heads up on why I’m loitering around here and on IRC, being a general nuisance. Enjoy the rest of your holidays and see you in 2019!
*I wanted to title this topic ‘What’s he Building In There?’ after the Tom Waits song.