Juju 4.0 Architecture

As work on Juju 4.0 (sans MongoDB) continues apace, we’ve been trickling pieces of our emerging relational data model into this post.

We’ll now look at some architectural aspects of how we’re supplying access to this data model via services.

As a starting point I’ll explain the following arrangement.

Controller charm

In major versions 2 and 3 of Juju, we allowed traffic across the MongoDB cluster to be isolated by setting a space name in the juju-ha-space controller configuration key.

This will be dropped in 4.0 in favour of using a binding for the controller charm’s dbcluster peer relation. In this way we remove an item of esoterica and replace it with a well-known concept from the Juju ecosystem.

The controller charm maintains a configuration file on disk similar to agent configuration. Its contents include the desired cluster topology.
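
To give a loose feel for that, the topology portion of such a file could be modelled as below. The field names here are hypothetical - the post doesn’t show the charm’s actual file format.

```go
package sketch

// ClusterNode identifies one controller's Dqlite endpoint on the bound space.
// Field names are illustrative only.
type ClusterNode struct {
	ID      string `yaml:"id"`
	Address string `yaml:"address"`
}

// ClusterTopology is the desired Dqlite cluster membership written by the
// controller charm for the database worker to consume.
type ClusterTopology struct {
	Nodes []ClusterNode `yaml:"nodes"`
}
```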

Database accessor

The dbWorker is responsible for starting the local Dqlite node and negotiating its membership in the cluster. Each controller is a Dqlite node. Nodes are joined to the cluster based on the configuration file written by the controller charm.
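
For a feel of what starting and joining a node involves, here is a minimal sketch using the go-dqlite app package. The data directory, addresses and database name are placeholders, and the real dbWorker does considerably more (TLS, retries, role management).

```go
package main

import (
	"context"
	"log"

	"github.com/canonical/go-dqlite/app"
)

func main() {
	// Start (or resume) the local Dqlite node, joining the peers listed in
	// the topology file written by the controller charm. Addresses are
	// placeholders for illustration.
	node, err := app.New("/var/lib/juju/dqlite",
		app.WithAddress("10.0.0.1:17666"),
		app.WithCluster([]string{"10.0.0.2:17666"}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer node.Close()

	// Block until the node has joined the cluster and can serve queries.
	if err := node.Ready(context.Background()); err != nil {
		log.Fatal(err)
	}

	// Open a handle to one of the databases; Juju keeps a main controller
	// database plus one per model.
	db, err := node.Open(context.Background(), "controller")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```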

It also maintains a set of trackedDB child workers, one for each database. Juju has a main controller database, plus one for each model. These workers export to other workers the ability to acquire a transaction runner. Each one monitors the health of its database connection and implements a retry strategy for transactions submitted to the database.
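
As a rough sketch of the shape of that capability - the names and signatures here are illustrative, not Juju’s actual API - a retrying transaction runner looks something like this:

```go
package sketch

import (
	"context"
	"database/sql"
	"time"
)

// TxnRunner is the capability a trackedDB worker hands to other workers:
// run a function inside a database transaction.
type TxnRunner interface {
	Txn(ctx context.Context, fn func(context.Context, *sql.Tx) error) error
}

// retryingRunner wraps a *sql.DB and retries transient failures, standing in
// for the trackedDB worker's retry strategy.
type retryingRunner struct {
	db       *sql.DB
	attempts int
	backoff  time.Duration
}

func (r *retryingRunner) Txn(ctx context.Context, fn func(context.Context, *sql.Tx) error) error {
	var err error
	for i := 0; i < r.attempts; i++ {
		if err = runOnce(ctx, r.db, fn); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(r.backoff):
		}
	}
	return err
}

func runOnce(ctx context.Context, db *sql.DB, fn func(context.Context, *sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	if err := fn(ctx, tx); err != nil {
		_ = tx.Rollback()
		return err
	}
	return tx.Commit()
}
```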

Change stream

Juju’s operation is heavily predicated on watchers. Watchers allow agents to watch some aspect of the data model and receive a notification if it changes. For example, the Juju provisioner watches for new machine records and sets about provisioning them when it is notified.

In Juju 3.x versions we use a feature of MongoDB called aggregation pipelines to create a stream of data changes. This stream is multiplexed to individual watchers based on the particular events that they subscribe to.

Dqlite does not possess such a mechanism, so the Juju team designed a trigger-based change log, which is read in a loop by the changeStreamWorker. This worker dispatches terms to another worker that manages subscriptions from watchers.

By wrapping the transaction-runner capability supplied by the trackedDB workers in another worker that also plugs into the event stream, we can export the ability both to run transactions and to create watchers.
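
In interface terms, the wrapping worker exports something along these lines (again, an illustrative composition rather than the real type definitions):

```go
package sketch

import (
	"context"
	"database/sql"
)

// ChangeEvent and SubscriptionOption stand in for the change-stream types
// described later in this thread.
type (
	ChangeEvent        struct{ Namespace, Changed string }
	SubscriptionOption func(*subscription)
	subscription       struct{}
)

// TxnRunner is the transaction-running capability from the trackedDB worker.
type TxnRunner interface {
	Txn(ctx context.Context, fn func(context.Context, *sql.Tx) error) error
}

// EventSource is the ability to subscribe to changes for this database.
type EventSource interface {
	Subscribe(opts ...SubscriptionOption) (<-chan []ChangeEvent, func(), error)
}

// WatchableDB is the combined capability exported to consumers: run
// transactions and create watchers against the same database.
type WatchableDB interface {
	TxnRunner
	EventSource
}
```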

Domain services

The ultimate consumers of runner/watcher capability are the main Juju domain services. This is where we implement all of our application logic. These services are recruited by the API server in order to serve clients and agents, and by other workers running in the controller.

Simple… not so fast

In the next post we’ll go into more detail on how we provide access to the cloud substrates (providers) from the domain services, which will shed some light on the extent to which we’ve abbreviated the diagram above.


In the prior post I illustrated (from a mile high) how we provide database access to our domain services. However, Juju is no simple OLTP system. In addition to data for representing the system state, we need access to:

  • Object storage for charm BLOBs and agent binaries.
  • The cloud that Juju provisions resources on, also known as a provider.

With MongoDB, Juju takes advantage of GridFS to store BLOBs. This is quite convenient, because Mongo replicates stored objects across the cluster without any intervention from us, meaning every stored BLOB becomes accessible by any controller.

Without GridFS to rely on, we had to write our own object storage facility. By default, Juju will use the controller’s file-system for BLOBs, but it can now be configured to use an S3-compatible back-end such as MinIO. Credit goes to @simonrichardson for the heavy lifting on this.
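
The shape of such a facility can be sketched as a small interface with pluggable back-ends; this is an illustration of the idea rather than Juju’s actual object store API:

```go
package sketch

import (
	"context"
	"io"
)

// ObjectStore abstracts BLOB storage for charms and agent binaries.
// A file-system implementation is the default; an S3-compatible one
// (for example backed by MinIO) can be configured instead.
type ObjectStore interface {
	Put(ctx context.Context, path string, r io.Reader, size int64) error
	Get(ctx context.Context, path string) (io.ReadCloser, int64, error)
	Remove(ctx context.Context, path string) error
}
```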

One of the aspects making the data back-end change such a huge undertaking is the fact that our legacy state package accumulated, over time, mechanisms for accessing object storage and providers. In order to swap out our data back-end, we also needed to rework access to these other concerns.

We did this by extending the idea that we used for domain services.

The left-hand side of this diagram shows the dependency from the original post: database → change stream → domain services. The rest is our new dependency arrangement for object storage and providers.

Separation of concerns

Instantiating an object store or provider requires access to stored state - we need to know model configuration, cloud credentials and so on. We also need watchers that let us take action if this state changes.

You can see that we’ve supplied this access by creating new service packages dedicated to those concerns. This was chosen over the following alternatives:

  • Giving the object store and provider workers direct access to a transaction runner.
  • Having domain services include logic required for instantiating object store and providers.

All of our service packages share the following characteristic: the data access components are separated from other service logic and are responsible only for running transactions against the database. This means that engineers programming services can never fall victim to a class of foot-guns that includes making a long-running blocking call to a provider inside a database transaction.
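
A minimal sketch of that pattern, with invented table, type and method names: the state type only runs transactions, and the service composes it with the provider, so the slow cloud call never happens while a transaction is open.

```go
package sketch

import (
	"context"
	"database/sql"
)

// State is the data-access component: it only knows how to run transactions.
type State struct{ db *sql.DB }

// InstanceID reads a value inside a transaction. Table and column names are
// hypothetical.
func (s *State) InstanceID(ctx context.Context, machine string) (string, error) {
	var id string
	err := runTxn(ctx, s.db, func(ctx context.Context, tx *sql.Tx) error {
		return tx.QueryRowContext(ctx,
			"SELECT instance_id FROM machine WHERE name = ?", machine).Scan(&id)
	})
	return id, err
}

// Provider is the (potentially slow) cloud-facing dependency.
type Provider interface {
	StopInstance(ctx context.Context, instanceID string) error
}

// Service holds the application logic, composing state and provider.
type Service struct {
	st       *State
	provider Provider
}

// StopMachine reads state inside a transaction, then calls the provider
// outside of it - the long-running cloud call never holds a transaction open.
func (s *Service) StopMachine(ctx context.Context, machine string) error {
	id, err := s.st.InstanceID(ctx, machine)
	if err != nil {
		return err
	}
	return s.provider.StopInstance(ctx, id)
}

func runTxn(ctx context.Context, db *sql.DB, fn func(context.Context, *sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	if err := fn(ctx, tx); err != nil {
		_ = tx.Rollback()
		return err
	}
	return tx.Commit()
}
```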

Engineering velocity

What is emerging here is exciting to the Juju team. Where before we had a lot of ad-hoc instantiation of object stores and providers, even using different patterns in different places, we have made it so that an engineer working on Juju features doesn’t need to be concerned with that level of detail at all.

The bottom left corner shows the culmination of our dependency composition. Database state, watcher, object storage and cloud provider are all supplied to the domain services package for easy use in implementing application logic.

This will pay big dividends for the project into the future.


Domain services. Nice idea, but…

I described in the prior post how we set up a dependency chain under Juju’s domain services to provide developers with a kind of internal API.

But there are situations where we need to effect changes before we can satisfy the dependencies for a rich domain services offering. These include:

  • Bootstrap, where we need to provision a controller from scratch.
  • Model creation, where we need to create cloud resources before activating the model.
  • Model migration, where we need to verify operation of a model before committing to its presence on a new controller.

In this post we will concern ourselves with the bootstrap phase.

Bootstrap

To create and start a Juju controller, we need to:

  • Validate supplied credentials, configuration and constraints.
  • Access the cloud with the relevant credential to provision the initial controller machine/pod/container and other resources.
  • Set up and start the controller there.
  • Record the initial state in our database.

The concern with bootstrapping is that we need logic in the client - there’s no controller yet to run it on! This inflates the size of the client binary, not only with dependencies, but with all the logic particular to this out-of-band cloud operation.

In versions of Juju prior to 4.0, the bootstrap logic has grown over time to encompass initialisation generally, not just what we need to do to get a controller running. Things that we do as part of the bootstrap process include:

  • Discovery of network configuration from the cloud.
  • Creation of an initial model.
  • Addition of the initial user and keys.
  • Deployment of the controller charm.
  • Population of agent binaries in object storage.

This goes against the grain of what we’ve done with domain services. All of these activities need to occur as part of normal Juju operation, so they exist in domain service logic, but if they are run during bootstrap, we need a special case for each of them.

The solution is to delay all logic that is not absolutely necessary to start a controller until after it starts. But how?

Dependency engine to the rescue

The answer is, with another worker. We can see in the diagrams from the prior posts how we encapsulate dependencies in workers, which are composed into a dependency graph with some depending on others. Details of the dependency engine are a topic for another day, but those curious about the code itself can get it from the GitHub repository.

Juju 4.0 has a new bootstrap worker. This worker has the domain services from the prior post as a dependency, so we have that leverage for initialisation tasks. It also gates the start of the API server worker. This means no clients or agents can act before it has run.
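
A loose sketch of the idea follows. The types, method names and the gate are stand-ins rather than the real dependency-engine or Juju APIs; the points to take away are that the bootstrap worker depends on the domain services, opens the way for the API server only after its initialisation steps succeed, and then asks to be removed from the graph.

```go
package sketch

import "context"

// DomainServices is a placeholder for the services from the prior post.
// Method names are hypothetical, mirroring the initialisation tasks above.
type DomainServices interface {
	CreateInitialModel(ctx context.Context) error
	DeployControllerCharm(ctx context.Context) error
	SeedAgentBinaries(ctx context.Context) error
}

// Gate stands in for whatever the API server worker waits on before starting.
type Gate interface {
	Unlock()
}

// errUninstall mimics the dependency engine's sentinel: a worker completing
// with it is removed from the graph instead of being restarted.
var errUninstall = errorString("uninstall me")

type errorString string

func (e errorString) Error() string { return string(e) }

// bootstrapWorker runs once after the controller starts, performs the
// deferred initialisation via domain services, opens the gate for the API
// server, and then asks to be uninstalled.
type bootstrapWorker struct {
	services DomainServices
	gate     Gate
}

func (w *bootstrapWorker) run(ctx context.Context) error {
	steps := []func(context.Context) error{
		w.services.CreateInitialModel,
		w.services.DeployControllerCharm,
		w.services.SeedAgentBinaries,
	}
	for _, step := range steps {
		if err := step(ctx); err != nil {
			return err // restart and retry on the next pass
		}
	}
	w.gate.Unlock()     // allow the API server worker to start
	return errUninstall // one-shot: remove this worker from the graph
}
```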

Once it runs successfully, it uninstalls itself from the dependency engine, effectively operating as a run-once activity. By pushing as much logic as possible from bootstrap into this worker we get the benefit of:

  • Bootstrap on a diet. It has no more special case logic than absolutely necessary.
  • As much logic as possible provided by domain services, with all the developer ergonomics that implies.

The change stream

In the original post, I mentioned the change stream.

This is the basis of Juju’s watchers under Dqlite.

The whole watcher system is driven from a single table, change_log. For any table containing data requiring action when it changes, we generate triggers that will write rows to this table. Each row includes a namespace (usually a table name), a unique identifier for the change and whether it was an insert, update or delete.
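
A simplified rendering of the kind of schema involved - the column names, types and the example trigger here are illustrative, not the DDL Juju actually generates:

```go
package sketch

// changeLogSchema sketches the idea: a single change_log table plus
// per-table triggers that record inserts, updates and deletes into it.
const changeLogSchema = `
CREATE TABLE change_log (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    edit_type  INT  NOT NULL,  -- e.g. 1 = insert, 2 = update, 4 = delete
    namespace  TEXT NOT NULL,  -- usually the source table name
    changed    TEXT NOT NULL,  -- unique identifier of the changed row
    created_at DATETIME NOT NULL DEFAULT (DATETIME('now'))
);

-- Example generated trigger: record every update to a hypothetical
-- machine table.
CREATE TRIGGER trg_machine_update AFTER UPDATE ON machine FOR EACH ROW
BEGIN
    INSERT INTO change_log (edit_type, namespace, changed)
    VALUES (2, 'machine', NEW.uuid);
END;
`
```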

For each database, we run a Stream worker that simply reads all change_log entries since its last high water mark and sends them as a term. We’ll come back to this in a moment. Once a term is dispatched, we record the dispatched window so that another worker can periodically prune dispatched change_log rows, preventing unbounded growth.
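
The read side can be pictured as a loop over a simple query. This sketch shows only the essential "everything after the high water mark" read; the real Stream worker also handles back-off and the bookkeeping used by the pruner.

```go
package sketch

import (
	"context"
	"database/sql"
)

// ChangeEvent mirrors a change_log row as dispatched in a term.
type ChangeEvent struct {
	ID        int64
	EditType  int
	Namespace string
	Changed   string
}

// readTerm collects every change_log row after the last high water mark.
func readTerm(ctx context.Context, db *sql.DB, lastID int64) ([]ChangeEvent, error) {
	rows, err := db.QueryContext(ctx, `
SELECT id, edit_type, namespace, changed
FROM change_log
WHERE id > ?
ORDER BY id`, lastID)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var term []ChangeEvent
	for rows.Next() {
		var ev ChangeEvent
		if err := rows.Scan(&ev.ID, &ev.EditType, &ev.Namespace, &ev.Changed); err != nil {
			return nil, err
		}
		term = append(term, ev)
	}
	return term, rows.Err()
}
```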

The recipient of the Stream worker’s terms is an EventMultiplexer. This is what distributes the terms to watchers, based on their Subscription.

An individual watcher will register a Subscription with the EventMultiplexer containing one or more SubscriptionOption arguments. These can designate any combination of namespace, identifier or change type that will cause the multiplexer to send the watcher particular changes from a term.
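
From a watcher’s point of view, registering a subscription might look roughly like this. The option constructor and types are assumed shapes for illustration, not the real change-stream API.

```go
package sketch

import "context"

// changeType flags the kind of change a subscriber cares about.
type changeType int

const (
	create changeType = 1 << iota
	update
	deleted
)

// ChangeEvent is one change delivered from a term.
type ChangeEvent struct {
	Namespace, Changed string
	Type               changeType
}

// SubscriptionOption narrows the changes a watcher receives.
type SubscriptionOption struct {
	Namespace string
	Mask      changeType
}

// Namespace builds an option matching changes in a namespace with the
// given change-type mask.
func Namespace(ns string, mask changeType) SubscriptionOption {
	return SubscriptionOption{Namespace: ns, Mask: mask}
}

// EventMultiplexer distributes terms to subscribers.
type EventMultiplexer interface {
	Subscribe(ctx context.Context, opts ...SubscriptionOption) (<-chan []ChangeEvent, func(), error)
}

// watchMachines subscribes to inserts and updates on the machine namespace,
// the kind of subscription a machine watcher would register.
func watchMachines(ctx context.Context, mux EventMultiplexer) (<-chan []ChangeEvent, func(), error) {
	return mux.Subscribe(ctx, Namespace("machine", create|update))
}
```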

The prime actor in this arrangement is the EventMultiplexer. It is what triggers the Stream to deliver each term. It runs in a loop, simply requesting the next term and distributing the changes based on subscribers. There is no sleep in between these steps unless the term is empty, whereupon it backs off so as not to thrash.

One can see here how the speed of the multiplexer’s distribution can affect the term sizes. If it is fast, such as for few subscribers or few events, the terms will be small and frequent. If it takes longer to process terms, they will tend to be larger and less frequent.

The astute reader will detect an issue here. What if there is a bad watcher that for some reason is not processing events sent by the multiplexer? Such an actor could effectively block the multiplexer from completing its processing of a term, stalling all other watchers. What we do here is impose a time-out within which a watcher must accept its events. If this period is exceeded, we simply unsubscribe the watcher and proceed as usual. This in turn causes an error to flow through to the watching component, which will handle it as appropriate. This will invariably be some other worker, which exits with an error, restarts and sets up its watcher anew.

There is a very nice characteristic in term dispatching - every change that occurs in a transaction is guaranteed to be delivered in the next term. This makes it easy to reason about which notifications will be sent to all relevant watchers at the same time for a given domain operation.

Another nice characteristic is the flexibility of this system. At present we populate changes at the persistence layer via triggers, but there is nothing stopping us from declaring any arbitrary namespace and inserting change_log rows based on domain logic.
