The Future of Charmed Operators on Kubernetes

jnsgruk · 24 March 2021 08:25

With the release of Juju v2.9 RC7, we’re previewing the future of Charmed Operators on Kubernetes by introducing sidecar charms that are more consistent with how workloads are managed across other Juju substrates.

Rationale and History

With the first generation of charms on K8s, Charmed Operators ran in their own Pods, and instructed Juju to provision the workload on the Kubernetes substrate through a mechanism called pod.set_spec. Workloads were then provisioned into their own Pods, separate from the Charm code. This approach had some inherent limitations:

Limited control over processes in the workload
No IPC or local communication between the Charmed Operator and the workload
No file or socket sharing
Inability to store per-unit state

Charmed Operators implementing this pattern are susceptible to more of the challenges associated with distributed computing, especially in cases where the Charmed Operator is not running on the same Kubernetes node as any or all of the workload Pods (due to the design of the Kubernetes scheduler).

Charms written in this way will continue to work going forward, but we strongly encourage developers to adopt the new sidecar-based approach, and help us make it the best way to operate workloads on Kubernetes or otherwise.

A Refined Approach

With the new approach, both the workload container and the charm run in the same Pod, implementing the Sidecar Pattern. By definition, the Sidecar Pattern is designed to augment workloads with additional capabilities and features; in this case, the ability to effectively manage and operate complex workloads. This yields a number of advantages:

Charmed Operator and workload will always be scheduled on the same node
Charmed Operator and workload are co-located in the same network namespace
Charmed Operator and workload can communicate with SHM or sockets
Files can be shared between Charmed Operator and workload more easily
Charmed Operator scales with the workload

Pebble

To augment this approach we’ve developed Pebble: a lightweight, API-driven process supervisor designed for use with modern Charmed Operators.

Pebble enables you to declaratively configure processes to run in a container, and control those processes throughout the workload lifecycle. It features a layering system that allows for coarse revisions to running configurations.
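
To make the layering idea concrete, here is a minimal sketch in plain Python. The dictionaries mirror the general shape of a Pebble layer (services with an override policy, a command and an environment), but the service name, command and environment values are hypothetical, and combine_layers is a simplified illustration of "later layers replace or merge into earlier ones", not Pebble's actual implementation.

```python
import copy

# Hypothetical base layer: the service name and command are illustrative only.
BASE_LAYER = {
    "summary": "base layer",
    "services": {
        "web": {
            "override": "replace",
            "summary": "example web service",
            "command": "/usr/bin/example-web --port 8080",
            "startup": "enabled",
            "environment": {"LOG_LEVEL": "info"},
        }
    },
}

# Hypothetical overlay: only adjusts the environment of the same service.
OVERLAY_LAYER = {
    "summary": "tuning overlay",
    "services": {
        "web": {
            "override": "merge",
            "environment": {"LOG_LEVEL": "debug", "WORKERS": "4"},
        }
    },
}


def combine_layers(layers):
    """Fold layers, oldest first, into one effective set of services.

    Simplified model: "replace" swaps the whole service definition,
    "merge" overwrites scalar fields and merges the environment map.
    """
    plan = {}
    for layer in layers:
        for name, service in layer.get("services", {}).items():
            override = service.get("override")
            if name not in plan or override == "replace":
                plan[name] = copy.deepcopy(service)
            elif override == "merge":
                merged = plan[name]
                for key, value in service.items():
                    if key == "environment":
                        merged.setdefault("environment", {}).update(value)
                    else:
                        merged[key] = value
            else:
                raise ValueError(
                    f"service {name!r} needs override 'merge' or 'replace'"
                )
    return plan


plan = combine_layers([BASE_LAYER, OVERLAY_LAYER])
print(plan["web"]["command"])
print(plan["web"]["environment"])
```

In this sketch the overlay changes only the environment, so the command from the base layer is kept while LOG_LEVEL is overridden and WORKERS is added.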

When writing a Charmed Operator that implements the Sidecar Pattern, no modifications are required to the base container images.

How It Works

Juju automatically injects Pebble into workload containers using an initContainer and Volume Mount. The entrypoint of the container is overridden so that Pebble occupies PID 1. Pebble is controlled by the Charmed Operator using a UNIX socket, which is mounted into both the Charmed Operator container, and the workload container. The Charmed Operator communicates over the socket with Pebble to manage running workloads.
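
As a rough illustration of the last step, the sketch below speaks HTTP over a UNIX socket using only the Python standard library. The socket path layout and the endpoint used in the usage note are assumptions made for the example, and the helper names (pebble_socket_path, UnixSocketHTTPConnection, pebble_get) are hypothetical. In a real charm you would normally let the Operator Framework handle this for you rather than talking to the socket directly.

```python
import http.client
import json
import socket


def pebble_socket_path(container_name):
    """Return the assumed location of a container's Pebble socket,
    as seen from the charm container (path layout is an assumption)."""
    return f"/charm/containers/{container_name}/pebble.socket"


class UnixSocketHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that connects to a UNIX socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # The host name is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def pebble_get(socket_path, endpoint):
    """Send a GET request over the socket and decode the JSON response."""
    conn = UnixSocketHTTPConnection(socket_path)
    try:
        conn.request("GET", endpoint)
        response = conn.getresponse()
        return json.loads(response.read().decode("utf-8"))
    finally:
        conn.close()
```

Under these assumptions, calling pebble_get(pebble_socket_path("snappass"), "/v1/services") from the charm container would ask Pebble in the snappass container for its list of services.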

Example Sidecar Charms

To help you get started implementing Charmed Operators using this new approach, you can follow the conversion process of some existing charms. This list will be kept up to date as we progress.

benhoyt/snappass-test - Concept Demonstration
mthaddon/charm-k8s-gunicorn - WIP
jnsgruk/hello-kubecon
martinrusev/grafana-operator - WIP

Getting Started

You can use any existing bootstrapped Kubernetes cluster, provided the controller is at least version 2.9. See the documentation for instructions on how to upgrade your controller. If you do not have a cluster ready, you can use MicroK8s!

$ sudo snap install --classic microk8s
$ sudo usermod -aG microk8s $(whoami)
$ sudo microk8s enable storage dns
$ sudo snap alias microk8s.kubectl kubectl
$ newgrp microk8s

Once you’ve done that, you should be able to invoke microk8s commands without using sudo. If you can’t, try logging out and logging back in before continuing.

Next, let’s install and bootstrap Juju, then deploy an example Charmed Operator:

# Make sure we have the correct version of Juju installed
$ sudo snap install juju --classic --channel=2.9/candidate
# Bootstrap a Juju controller on MicroK8s
$ juju bootstrap microk8s

# Create a model for our deployment
$ juju add-model snappass

# Deploy!
$ juju deploy snappass-test
# Wait for the deployment to complete
$ watch -n1 --color "juju status --color"

You can now inspect your deployment with kubectl:

$ kubectl -n snappass get pods
NAME                             READY   STATUS    RESTARTS   AGE
modeloperator-5cf796c689-2czcx   1/1     Running   4          3d18h
snappass-test-0                  3/3     Running   0          4m24s

Note that the snappass-test-0 pod indicates 3 running containers. In this case these are:

Charm container
Snappass container
Redis container

You can see the Pebble configuration for the snappass and redis containers in the Charm code. The Charm container was injected automatically by Juju.

Where to get help

You can find full documentation on how to write charms implementing this new pattern in the Charmed Operator Framework Docs.

If you need help with your new Charm, write a post on the Charmhub Discourse, or reach out on the Charmhub Community Mattermost instance.

knkski · 24 March 2021 21:35

Thanks for the informative post. Can you clarify the rationale behind the new approach? You list these reasons:

Limited control over processes in the workload
No IPC or local communication between the Charmed Operator and the workload
No file or socket sharing
Inability to store per-unit state

This looks to me like a fundamental architectural difference of Pets vs. Cattle. In other words, Kubernetes very much tries to actively prevent you from doing each of those things, because it makes scaling out much easier when you can just terminate a malfunctioning pod and spin up a new one.

Kubernetes has become massively popular because this approach resonates with people. Existing Kubernetes users are likely to view these limitations as a good thing, because the cattle-based approach works well for them. They will likely view sidecar charms for Kubernetes as trying to fit a square peg into a round hole. What story do we have for convincing them that the pet-based approach is better?

jnsgruk · 25 March 2021 09:06

Hi Kenneth,

Thanks for the considered response - I think it’s worth mentioning that Charms implementing this new pattern do not have to interfere with Pods being killed/restarted/rescheduled, but rather have the option of doing so.

While the Pets vs. Cattle argument has many merits, and is applicable to lots of cases, there are situations where it is less beneficial. The traditional example here is very stateful workloads (such as nodes in a database cluster). If a process fails in such a deployment, it may not be beneficial to have the entire pod rescheduled, and failure states might be handled more gracefully with the addition of some considered operations code in the Charm - without this new pattern, such recoveries are harder to achieve.

Clearly, if one is aiming for a more “immutable” deployment, then letting the scheduler do its thing is preferable, but I(/we) think this provides a nice alternative. There is an upcoming Files API for Pebble which will allow files in the workload container to be pushed, pulled and modified; for long-running stateful deployments, this may help with backup/restore or other maintenance/operations activities.

@jameinel and @manadart may have more to add here too!

Cheers!

knkski · 25 March 2021 15:35

When you say that it provides a nice alternative, do you see both sidecar charms and podspec-based charms being the path forward for K8s charming? Or is the plan to make sidecar charms work for immutable deployments as well?

jnsgruk · 25 March 2021 15:48

So in that context I meant an alternative to defaulting to the Kubernetes scheduler to make decisions about the workload where appropriate.

The plan is for this to be the de-facto method for deploying Charms on Kubernetes. What’s present in Juju 2.9 RC7 is an early preview. There will be features landing in later 2.9.x releases, and onward into the Juju 3.0 series, to build upon the current capability.

niemeyer · 25 March 2021 20:03

Hello Kenneth,

We’ve discussed this at length before and with better bandwidth, but we can certainly go over this again until these ideas are more clear.

Given these questions, there are apparently some misconceptions about what sidecars do in Kubernetes. So just to be clear:

The sidecar pattern does not disable the Kubernetes scheduler
The sidecar pattern does not remove immutability, unless the idea of what immutability means is stretched to fit some very particular personal notion of what that means.

These explanations from the Kubernetes community about sidecars might bring some light here:

In the juju world, these are exactly the sorts of things we use Charms for. So it’s nice to see that upstream Kubernetes development and juju are aligned here. We like that very much, so this is the future of Kubernetes and juju.

I’m really hoping we can count on you for that as well.

knkski · 29 March 2021 19:40

@niemeyer: You can certainly count on me for this. It’s not really about me, though. I think the broader Kubernetes community will have these same questions about how immutable deployments are handled in Juju, and we should have a solid set of answers to these questions. With that in mind, here’s specifically what I’m wondering about:

The sidecar pattern does not remove immutability

When I mention immutability in my above posts, what I mean is that after deployment, Pods are not modified. If you want to change something about the deployment, you spin up new Pods with the desired configuration change, and terminate the old ones. This article has a good definition of the term, and what I have in mind when I use it.

I’d also like to clarify what I mean when talking about sidecars. In Kubernetes, sidecars are not opposed to immutable deployments. The key point is that if you want to change something about a sidecar, you similarly spin up an updated Pod with the new sidecar, and terminate the old Pod. The Sidecar Pattern mentioned above is different. It is using sidecars as a means to mutably update the main container in a Pod, which is inherently at odds with the idea of immutable deployments, as defined above.

Given these two clarifications, the question that I’m asking is: for people that like deploying their services in an immutable fashion, will that be something we support going forward? If not, what reasons can we list for why they should switch to the new style of deployment? This sort of documentation is important for people that are comfortable with their existing Kubernetes tooling, and are wondering why they should try out Juju.

niemeyer · 5 April 2021 09:32

Of course you can do that with juju. You can do that today, and will be able to continue to do that tomorrow. Nothing forces you to change anything. If you want to deploy a container and never touch it, just do it.

This is also very dogmatic, though, and as soon as you assign any kind of read-write storage to a pod you’re already breaking that rule, because the only reason to have read-write storage associated with a pod is if you want to make modifications to that environment at runtime. In fact, even if you don’t assign storage, but the software being deployed accepts dynamic changes via its API, you are also modifying that software.

This is the software deployment version of functional programming. Purity is beautiful, until you actually want side effects to do what in fact is essential. But juju is there for you in either case. If you don’t want to mutate your pod, just don’t.

gnuoy · 27 April 2021 06:48

I really enjoyed following along with these instructions, thank you. FYI, I hit two small paper cuts: installing the charmcraft snap needed the --classic flag, and currently the edge version of the charmcraft snap needs a couple of dependencies to be installed (probably only an issue on a freshly minted machine):

sudo apt install python3-pip python3-toml

jnsgruk · 27 April 2021 07:14

Hi @gnuoy,

Thanks! Yes, there have been a couple of recent changes to Charmcraft’s packaging.

I’ll update these instructions now, and there is more content about how to get started with these charms in the new Charmed Operator Framework docs.

Update: I forgot, snappass-test is now actually published on Charmhub, so I’ve updated to some much simpler instructions.

knkski · 28 April 2021 17:16

The key issue is that with immutable infrastructure, you don’t deploy a container and never touch it. If you make any changes, you do it by spinning up a new Pod and terminating the old one, instead of modifying the Pod in-place. Will this model of infrastructure management still be available with sidecar charms?

There are multiple types of state to consider here:

ephemeral
- State within the pod itself that will disappear if the pod gets terminated/rescheduled
- E.g. temp files, contents of memory
configuration
- Persistent state that determines how a service will run
- E.g. Environment variables, OCI image revisions, config files
data
- Persistent state that determines what the service will run with
- E.g. database files that are stored on read-write storage

As an example, look at a PostgreSQL Pod in Kubernetes. There’s configuration state such as a PGTZ environment variable. There’s data state such as the contents of /var/lib/pgsql/data. There’s also ephemeral state such as the process/threads sitting inside the Pod.

Immutable infrastructure is only concerned with configuration state. It does not break any rules to attach read-write storage to a Pod, because that is data state. If a Pod gets terminated for whatever reason, another Pod can be attached to the same read-write storage and happily continue serving the database contents, because the configuration state hasn’t changed.

On the other hand, “software being deployed accepts dynamic [configuration] changes via its API” does break that rule, and is very much discouraged in Kubernetes / immutable infrastructure. Those changes are stored in ephemeral state that will go away if the Pod does, and a new Pod won’t have that state.

Immutable infrastructure is actually a very pragmatic model, as it does not dictate how you handle data state. It only assures that you don’t accidentally cause yourself issues, by making sure that your configuration state is well-tracked.

jameinel · 29 April 2021 21:21

I don’t think a lot of what you’ve brought up is incompatible with sidecar charms, as the concept of what is configuration is still quite relevant for how the charm drives its workload container. And what is data is still defined as storage and separated from ephemeral storage.

There are slightly different steps for how configuration is passed in but it is still modeling this with those pieces being disjoint and clearly identified.

There is certainly the difference that you might not kill off the entire container when you change configuration, but you could certainly argue that lots of software has operated in a world with SIGHUP and does a very good job with that model.

knkski · 29 April 2021 23:27

Specifically, I would define “compatible with sidecar charms” as sidecar charms supporting (for example) this workflow:

I (charm user) want to change an environment variable on a deployed charm
A new Pod is spun up with the updated environment variable
The old Pod is terminated
At no point were the old Pod’s environment variables mutated in-place

Is this workflow something that sidecar charms will support?

niemeyer · 5 May 2021 11:49

Kenneth, given the extensive conversations we’ve had here and before, I have to believe that either you already understand exactly how it works, or that nothing else we say here will make the details clearer for you.

Please feel free to reach out to me and @jnsgruk outside of this thread if indeed you’re still missing any relevant points in the conversation.

jj-quiet-ranger · 5 December 2021 11:39

When installing Juju, you now need to go for the stable version as 2.9 is now closed and there are no candidates. Presumably the candidates are for 3.0 from now on.