High availability for Juju controller on k8s cloud?

Hi guys, I’m new to Juju, and my experience with Juju as the operator lifecycle manager on k8s has been great. The one thing that prevents it from being integrated into our production environment is that the controller itself is not highly available on k8s clouds yet, which is unfortunate, because we don’t want such an important piece of software to be a weak point of our whole system.

I read on the Launchpad bug tracker that this feature is not being developed at the moment and that we should use LXD for that purpose instead. This is a bummer, since k8s is the world-leading container orchestration platform and such a critical feature should be available as soon as possible, if not already.

Therefore, I’m opening this thread to gather ideas and pointers from you guys on supporting HA mode for Juju controllers on k8s. My first instinct is that scaling the controller StatefulSet to 3 replicas shouldn’t be too hard, right?
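For concreteness, the naive version of what I have in mind is just bumping the replica count on the controller’s StatefulSet. Treat this as a sketch only: the `controller` StatefulSet and `controller-<name>` namespace are the names I see on my cluster and may differ between Juju versions, and I realise this alone probably doesn’t give a working HA controller.

```
# Naive idea only: bump the controller StatefulSet to 3 replicas.
# ("controller" / "controller-myk8s" are the names on my cluster; adjust as needed.)
kubectl scale statefulset/controller -n controller-myk8s --replicas=3
```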

The bug mentioning this, fwiw, is Bug #1869211 “HA Juju controller on Kubernetes”. Perhaps someone from the Juju team can confirm if/when that might make it back onto the active work queue.

Just to make sure it’s clear, as it might not be from the wording in that bug report: you can deploy an HA controller on another substrate (e.g. LXD if you’re running locally, or OpenStack, MAAS or a public cloud provider) and then add your k8s cluster to that controller. That allows you to add k8s models and deploy just as if the controller itself were running on k8s. In other words, you can have an HA controller and still deploy into k8s.

If this is an option for you, let us know. The documentation is currently more focused on the case of running the controller directly on k8s, but we can talk you through it here and look at updating the documentation as well.
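Roughly, the workflow looks something like the following. The controller, cloud and model names are just placeholders, and exact flags can vary a little between Juju versions, so treat this as a sketch rather than copy-paste instructions:

```
# Bootstrap a controller on a machine substrate (LXD in this example)
# and turn on HA (3 controller machines).
juju bootstrap localhost overlord
juju enable-ha -n 3

# Register your Kubernetes cluster as a cloud on that controller,
# using the credentials from your local kubeconfig.
juju add-k8s myk8s --controller overlord

# Add a k8s model and deploy into the cluster as usual.
juju add-model production myk8s
juju deploy <some-charm>
```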

Perhaps someone from the Juju team can confirm if/when that might make it back onto the active work queue

I’d love to join any discussions on that matter.

you can deploy an HA controller on another substrate (e.g. LXD if you’re running locally, or OpenStack, MAAS or a public cloud provider) and then add your k8s cluster to that controller

Hmm, from what I saw in this post, those substrates didn’t support adding k8s clouds, so I just assumed that such an operation wasn’t possible.

Even if we could add a k8s cloud to an LXD/OpenStack/MAAS Juju controller, I still think bootstrapping HA controllers on k8s is a better solution, for three reasons:

  1. Enabling HA on LXD, OpenStack or MAAS requires at least three machines dedicated to the Juju controllers, demanding more resources. We could avoid that by bootstrapping Juju on the same k8s cloud that hosts our actual applications.
  2. If Juju is running on a different network than the cloud, we’d have to configure the underlying network routes properly beforehand, and we might also have to create a load balancer (which requires more resources) for the controllers. For controllers running on the same k8s cloud, simply using a ClusterIP service gives us fast and reliable connections to the controllers without additional configuration or resources (see the sketch after this list).
  3. Maintaining Juju controllers and Juju applications separately on different infrastructures requires more effort and is overall cumbersome (in my opinion).
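To illustrate point 2: when the controller is bootstrapped into the same cluster, it already sits behind a ClusterIP service in its own namespace, so in-cluster clients reach the Juju API (port 17070) over the cluster network with no extra routes or load balancers. The namespace and service names below are from memory and may differ by Juju version:

```
# The controller namespace should show a ClusterIP service exposing the
# Juju API on 17070, reachable from anywhere inside the cluster.
kubectl get svc -n controller-myk8s
```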

I don’t think there is any doubt that we want to get to a place where you can have HA controllers on Kubernetes. The key prioritization question right now is whether it is worth dealing with the issues around that before we deal with other things. We are currently trying to focus on making sure that writing charms is a great experience, and that you can express your interactions with Kubernetes in a great way as a charmer, so that there is a good body of charms you can deploy on your controller.

Just scaling the StatefulSet is not hard, but it has a number of implications for keeping the database and the rest of the controller state in sync: solving the replication configuration of Mongo, getting the right networking information in place, and handling storage and what happens when a pod gets rescheduled. (Some of the Service abstractions help; some of them get in the way.)
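To give a flavour of just the Mongo part (purely illustrative; Juju manages its embedded juju-db MongoDB itself, and the real work is making the controller do this automatically and securely): scaling the StatefulSet adds pods, but each new mongod would still need to be joined to the replica set, along the lines of:

```
# Illustrative only: run against the current Mongo primary.
# 37017 is the port Juju's MongoDB normally listens on; the pod address
# is a placeholder for whatever in-cluster DNS name the new pod gets.
mongo --port 37017 --eval 'rs.add("<new-controller-pod-address>:37017")'
```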

@wallyworld has spent a bit more time than I have focused on how we might get there, so he may have a few more explicit points to raise. A key point, though, is that we do feel it is important; it is just slightly less important than what we are currently working on.

John’s given a good summary of the issues. The work is certainly doable but non-trivial to get right. It hasn’t yet been considered more important than competing priorities, because if the controller pod does go down, k8s will schedule a replacement, and any downtime will (almost always) be a hiccup rather than an outage, unlike on machine clouds, where a single controller node going down does result in a broken Juju. So the question becomes: does the effort of doing HA controllers (right now) pay its way, or is it more about ticking the “HA” box on a spec sheet?

This is a valid point: attaching a new pod to the same persistent volume would only cost us a few seconds, whereas an HA cluster of Juju controllers might also need some time to settle into a new stable state.

The real value of HA controllers lies in the replication of persistent data. If one physical volume breaks, we still have multiple replicas of it to fall back on, which reduces the risk of losing important data. So I still think enabling HA on k8s is worth the effort and isn’t just about ticking the box on a spec sheet.