Juju 4.0 Architecture

As work on Juju 4.0 (sans MongoDB) continues apace, we’ve been trickling pieces of our emerging relational data model into this topic.

We’ll now look at some architectural aspects of how we’re supplying access to this data model via services.

As a starting point I’ll explain the following arrangement.

Controller charm

In major versions 2 and 3 of Juju, traffic across the MongoDB cluster could be isolated by setting a space name in the juju-ha-space controller configuration item.

This will be dropped in 4.0 in favour of using a binding for the controller charm’s dbcluster peer relation. In this way we remove an item of esoterica and replace it with a well-known concept from the Juju ecosystem.

The controller charm maintains a configuration file on disk similar to agent configuration. Its contents include the desired cluster topology.
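For illustration only, the cluster section of such a file might be modelled along these lines. The field and key names here are invented for the example, not Juju’s actual schema:

```go
package controllercharm

// ClusterTopology is a hypothetical model of the cluster section of
// the configuration file that the controller charm writes to disk.
// The field and key names are illustrative, not Juju's real schema.
type ClusterTopology struct {
	// LocalAddress is the address the local Dqlite node binds to,
	// derived from the dbcluster peer relation's binding.
	LocalAddress string `yaml:"local-address"`

	// ClusterAddresses lists the peer nodes to join, one per
	// controller in a highly available cluster.
	ClusterAddresses []string `yaml:"cluster-addresses"`
}
```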

Database accessor

The dbWorker is responsible for starting the local Dqlite node and negotiating its membership in the cluster. Each controller is a Dqlite node. Nodes are joined to the cluster based on the configuration file written by the controller charm.
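To give a flavour of what this involves, here is a minimal sketch using the go-dqlite app package. This is not Juju’s actual dbWorker; the data directory and addresses are invented for the example:

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/canonical/go-dqlite/app"
)

func main() {
	// Start the local Dqlite node, joining the peers listed in the
	// topology file written by the controller charm.
	node, err := app.New("/var/lib/juju/dqlite",
		app.WithAddress("10.0.0.2:17666"),
		app.WithCluster([]string{"10.0.0.3:17666", "10.0.0.4:17666"}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer node.Close()

	// Block until the node has negotiated cluster membership and
	// is ready to serve requests.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := node.Ready(ctx); err != nil {
		log.Fatal(err)
	}

	// Open a database handle backed by the cluster.
	db, err := node.Open(ctx, "controller")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```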

It also maintains a set of trackedDB child workers, one for each database. Juju has a main controller database, plus one for each model. These workers export to other workers the ability to acquire a transaction runner. Each one monitors the health of its database connection and implements a retry strategy for transactions submitted to the database.
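The retry strategy might look something like the following simplified sketch. The real trackedDB workers inspect Dqlite error codes and are considerably more involved:

```go
package txnrunner

import (
	"context"
	"database/sql"
	"time"
)

// Txn runs fn inside a transaction, retrying on transient failures
// with a simple linear back-off.
func Txn(ctx context.Context, db *sql.DB, fn func(context.Context, *sql.Tx) error) error {
	var err error
	for attempt := 0; attempt < 5; attempt++ {
		if err = runOnce(ctx, db, fn); err == nil || !isTransient(err) {
			return err
		}
		// Back off before the next attempt.
		select {
		case <-time.After(time.Duration(attempt+1) * 100 * time.Millisecond):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}

// runOnce executes fn in a single transaction, rolling back on error.
func runOnce(ctx context.Context, db *sql.DB, fn func(context.Context, *sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	if err := fn(ctx, tx); err != nil {
		_ = tx.Rollback()
		return err
	}
	return tx.Commit()
}

// isTransient reports whether the error is worth retrying. A real
// implementation would check for conditions such as a leadership
// change in the Dqlite cluster; this placeholder retries nothing.
func isTransient(err error) bool {
	return false
}
```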

Change stream

Juju’s operation is heavily predicated on watchers. Watchers allow agents to watch some aspect of the data model and receive a notification if it changes. For example, the Juju provisioner watches for new machine records and sets about provisioning them when it is notified.
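In code terms, a watcher amounts to little more than a channel of change events wrapped in a worker. Here is a stripped-down sketch of the pattern, rather than Juju’s exact interfaces:

```go
package watcher

// StringsWatcher delivers batches of identifiers (machine IDs, say)
// whose corresponding records have changed. A consumer such as the
// provisioner ranges over Changes and acts on each batch.
type StringsWatcher interface {
	// Changes yields the IDs of changed entities. The channel is
	// closed when the watcher stops.
	Changes() <-chan []string

	// Kill asks the watcher to stop.
	Kill()

	// Wait blocks until the watcher has stopped, returning any
	// error that caused it to do so.
	Wait() error
}
```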

In Juju 3.x versions we use a feature of MongoDB called aggregation pipelines to create a stream of data changes. This stream is multiplexed to individual watchers based on the particular events that they subscribe to.

Dqlite does not possess such a mechanism, so the Juju team designed a trigger-based change log, which is read in a loop by the changeStreamWorker. This worker dispatches terms (batches of change events) to another worker that manages subscriptions from watchers.
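To illustrate the idea, here is a sketch of a trigger-based change log expressed as SQLite DDL embedded in Go. The table and column names are invented for the example; Juju’s actual schema differs:

```go
package changelog

// schema is illustrative DDL for a trigger-based change log.
const schema = `
CREATE TABLE change_log (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT NOT NULL,
    changed_id TEXT NOT NULL,
    created_at DATETIME NOT NULL DEFAULT (datetime('now'))
);

-- Record every new machine row in the change log. The
-- changeStreamWorker reads change_log in a loop, dispatching rows
-- with an id greater than the last one it saw.
CREATE TRIGGER machine_insert AFTER INSERT ON machine
BEGIN
    INSERT INTO change_log (table_name, changed_id)
    VALUES ('machine', NEW.machine_id);
END;
`
```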

By wrapping the transaction runner capability supplied by the trackedDB workers in another worker that allows plugging into the event stream, we can export the ability both to run transactions and to create watchers.
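Conceptually, the wrapping worker exports something like the following. Again, this is a sketch rather than Juju’s real interfaces:

```go
package database

import (
	"context"
	"database/sql"
)

// TxnRunner is the capability exported by a trackedDB worker: run a
// function inside a retried database transaction.
type TxnRunner interface {
	Txn(ctx context.Context, fn func(context.Context, *sql.Tx) error) error
}

// ChangeEvent identifies a single changed row.
type ChangeEvent struct {
	Namespace string // the table of interest, e.g. "machine"
	Changed   string // the changed row's identifier
}

// EventSource allows subscribing to the change stream for a set of
// namespaces (tables). The returned function cancels the subscription.
type EventSource interface {
	Subscribe(namespaces ...string) (<-chan []ChangeEvent, func(), error)
}

// WatchableDB is what the wrapping worker exports: the ability both
// to run transactions and to create watchers over the event stream.
type WatchableDB interface {
	TxnRunner
	EventSource
}
```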

Domain services

The ultimate consumers of runner/watcher capability are the main Juju domain services. This is where we implement all of our application logic. These services are recruited by the API server in order to serve clients and agents, and by other workers running in the controller.
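Putting the pieces together, a domain service is composed over a narrow state abstraction and a watcher factory, roughly as in this sketch with invented names:

```go
package domain

import "context"

// State is the persistence boundary for a service. Its implementation
// uses a transaction runner; the service never touches SQL directly.
type State interface {
	CreateMachine(ctx context.Context, name string) error
}

// StringsWatcher is the watcher shape sketched earlier in the post.
type StringsWatcher interface {
	Changes() <-chan []string
	Kill()
	Wait() error
}

// WatcherFactory creates watchers backed by the change stream.
type WatcherFactory interface {
	NewMachinesWatcher() (StringsWatcher, error)
}

// MachineService holds application logic for machines, composed over
// the state and watcher capabilities.
type MachineService struct {
	st       State
	watchers WatcherFactory
}

// CreateMachine records a new machine in state.
func (s *MachineService) CreateMachine(ctx context.Context, name string) error {
	return s.st.CreateMachine(ctx, name)
}

// WatchMachines notifies consumers when machine records change.
func (s *MachineService) WatchMachines() (StringsWatcher, error) {
	return s.watchers.NewMachinesWatcher()
}
```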

Simple… not so fast

In the next post we’ll go into more detail on how we provide access to the cloud substrates (providers) from the domain services, which will shed some light on the extent to which we’ve abbreviated the diagram above.


In the prior post I illustrated (from a mile high) how we provide database access to our domain services. However, Juju is no simple OLTP system. In addition to data for representing the system state, we need access to:

  • Object storage for charm BLOBs and agent binaries.
  • The cloud that Juju provisions resources on, also known as a provider.

With MongoDB, Juju takes advantage of GridFS to store BLOBs. This is quite convenient, because Mongo replicates stored objects across the cluster without any intervention from us, meaning every stored BLOB becomes accessible by any controller.

Without GridFS to rely on, we had to write our own object storage facility. By default, Juju will use the controller’s file-system for BLOBs, but it can now be configured to use an S3-compatible back-end such as MinIO. Credit goes to @simonrichardson for the heavy lifting on this.
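Abstractly, the facility reduces to an interface that both the file-system and S3 back-ends satisfy. The following is a sketch of the shape rather than Juju’s actual API:

```go
package objectstore

import (
	"context"
	"io"
)

// ObjectStore abstracts BLOB storage so that callers need not care
// whether bytes land on the controller's file-system or in an
// S3-compatible service such as MinIO.
type ObjectStore interface {
	// Put stores the object under path, reading size bytes from r.
	Put(ctx context.Context, path string, r io.Reader, size int64) error

	// Get returns a reader for the object along with its size.
	Get(ctx context.Context, path string) (io.ReadCloser, int64, error)

	// Remove deletes the stored object.
	Remove(ctx context.Context, path string) error
}
```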

One of the aspects making the data back-end change such a huge undertaking is the fact that our legacy state package accumulated, over time, mechanisms for accessing object storage and providers. In order to swap out our data back-end, we also needed to rework access to these other concerns.

We did this by extending the idea that we used for domain services.

The left-hand side of this diagram shows the dependency from the original post: database → change stream → domain services. The rest is our new dependency arrangement for object storage and providers.

Separation of concerns

Instantiating an object store or provider requires access to stored state: we need to know model configuration, cloud credentials and so on. We also need watchers that let us take action if this state changes.

You can see that we’ve supplied this access by creating new service packages dedicated to those concerns. This was chosen over the following alternatives:

  • Giving the object store and provider workers direct access to a transaction runner.
  • Having domain services include logic required for instantiating object store and providers.

All of our service packages share the following characteristic: the data access components are separated from the rest of the service logic and are responsible only for running transactions against the database. This means that engineers writing services can never fall victim to a class of foot-guns that includes making a long-running, blocking call to a provider inside a database transaction.
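As a concrete illustration of the foot-gun being designed out, compare the two shapes below. All of the names are hypothetical:

```go
package machines

import (
	"context"
	"database/sql"
)

// Provider abstracts the cloud; StartInstance may block for minutes.
type Provider interface {
	StartInstance(ctx context.Context, name string) (instanceID string, err error)
}

// State runs short-lived transactions to record outcomes.
type State interface {
	RecordMachine(ctx context.Context, name, instanceID string) error
}

// Runner exposes raw transactions - the capability that the chosen
// design deliberately keeps away from service logic.
type Runner interface {
	Txn(ctx context.Context, fn func(context.Context, *sql.Tx) error) error
}

type Service struct {
	provider Provider
	st       State
	runner   Runner
}

// createBad shows the foot-gun: a slow provider call made while a
// database transaction is held open.
func (s *Service) createBad(ctx context.Context, name string) error {
	return s.runner.Txn(ctx, func(ctx context.Context, tx *sql.Tx) error {
		instanceID, err := s.provider.StartInstance(ctx, name) // slow, blocking
		if err != nil {
			return err
		}
		_, err = tx.ExecContext(ctx,
			"INSERT INTO machine (name, instance_id) VALUES (?, ?)",
			name, instanceID)
		return err
	})
}

// createGood separates the concerns: the provider call completes
// first, and the transaction opened inside State is short-lived.
func (s *Service) createGood(ctx context.Context, name string) error {
	instanceID, err := s.provider.StartInstance(ctx, name)
	if err != nil {
		return err
	}
	return s.st.RecordMachine(ctx, name, instanceID)
}
```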

Engineering velocity

What is emerging here is exciting to the Juju team. Where before we had a lot of ad-hoc instantiation of object stores and providers, even using different patterns in different places, we have made it so that an engineer working on Juju features need not be concerned with that level of detail at all.

The bottom left corner shows the culmination of our dependency composition. Database state, watcher, object storage and cloud provider are all supplied to the domain services package for easy use in implementing application logic.

This will pay big dividends for the project into the future.


Domain services. Nice idea, but…

I described in the prior post how we set up a dependency chain under Juju’s domain services to provide developers with a kind of internal API.

But there are situations where we need to effect changes before we can satisfy the dependencies for a rich domain services offering. These include:

  • Bootstrap, where we need to provision a controller from scratch.
  • Model creation, where we need to create cloud resources before activating the model.
  • Model migration, where we need to verify operation of a model before committing to its presence on a new controller.

In this post we will concern ourselves with the bootstrap phase.

Bootstrap

To create and start a Juju controller, we need to:

  • Validate supplied credentials, configuration and constraints.
  • Access the cloud with the relevant credential to provision the initial controller machine/pod/container and other resources.
  • Set up and start the controller there.
  • Record the initial state in our database.

The concern with bootstrapping is that we need logic in the client - there’s no controller yet to run it on! This inflates the size of the client binary, not only with dependencies, but with all the logic particular to this out-of-band cloud operation.

In versions of Juju prior to 4.0, the bootstrap logic grew over time to encompass initialisation in general, not just what we need to do to get a controller running. Things that we do as part of the bootstrap process include:

  • Discovery of network configuration from the cloud.
  • Creation of an initial model.
  • Addition of the initial user and keys.
  • Deployment of the controller charm.
  • Population of agent binaries in object storage.

This goes against the grain of what we’ve done with domain services. All of these activities need to occur as part of normal Juju operation, so they exist in domain service logic, but running them during bootstrap requires a special case for each.

The solution is to delay all logic that is not absolutely necessary to start a controller until after the controller starts. But how?

Dependency engine to the rescue

The answer is, with another worker. We can see in the diagrams from the prior posts how we encapsulate dependencies in workers, which are composed into a dependency graph with some depending on others. Details of the dependency engine are a topic for another day, but those curious about the code itself can get it from the GitHub repository.

Juju 4.0 has a new bootstrap worker. This worker has the domain services from the prior post as a dependency, so we have that leverage for initialisation tasks. It also gates the start of the API server worker. This means no clients or agents can act before it has run.

Once it runs successfully, it uninstalls itself from the dependency engine, effectively operating as a run-once activity; a sketch of this pattern follows the list below. By pushing as much logic as possible from bootstrap into this worker, we gain the following:

  • Bootstrap on a diet. It has no more special-case logic than absolutely necessary.
  • As much logic as possible provided by domain services, with all the developer ergonomics that implies.
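
For those curious about the mechanics, here is a minimal sketch of the run-once pattern. It assumes the ErrUninstall behaviour of the dependency engine in the juju/worker library; the package layout and names are invented, and the real bootstrap worker does a great deal more:

```go
package bootstrapworker

import (
	"github.com/juju/worker/v3/dependency"
	"gopkg.in/tomb.v2"
)

// Worker performs one-off controller initialisation. Its manifold
// (elided) declares the domain services as inputs and gates the
// API server worker.
type Worker struct {
	tomb tomb.Tomb
}

// NewWorker starts the bootstrap worker.
func NewWorker() *Worker {
	w := &Worker{}
	w.tomb.Go(w.run)
	return w
}

func (w *Worker) run() error {
	// Run the deferred initialisation tasks here: deploy the
	// controller charm, seed agent binaries into object storage,
	// and so on, all via the domain services (elided).

	// Returning ErrUninstall tells the dependency engine to remove
	// this worker permanently, making it a run-once activity.
	return dependency.ErrUninstall
}

// Kill and Wait implement worker.Worker.
func (w *Worker) Kill()       { w.tomb.Kill(nil) }
func (w *Worker) Wait() error { return w.tomb.Wait() }
```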