Introduction
This article is a follow-up to a series of posts that discuss potential approaches for decoupling the business logic within Juju from the persistence layer.
This new proposal incorporates the constructive feedback that I have received so far and, in my view, would improve the way that we write and test Juju business logic.
How does it work?
The proposed solution makes heavy use of code generation to create an ORM-like layer which will serve as the foundation for writing Juju business logic.
You might be wondering: why not just pick one of the readily available ORM solutions for Go applications? First of all, we don’t need a fully-fledged ORM as the Juju data access patterns are quite simple (i.e. read a bunch of objects, check some constraints, mutate and write back, and finally have a watcher react to object changes). Secondly, it’s not easy to find an ORM solution that supports relational, document-based and other non-standard backends (e.g. in-memory, raft-backed). Lastly, by rolling our own ORM-like layer (emphasis on the -like part) we can provide additional guarantees that are required/expected when writing Juju-specific business logic.
As with all ORM solutions, we start by defining the schema for our domain models (not to be confused with the concept of models that are hosted by a Juju controller). The following deep-dive sections will go into more detail about how schema definitions work; for now, just assume that someone has already taken care of that task. The schemas are then fed into a schema compiler which emits a set of models (a pretty standard practice for most ORMs that do not rely on dynamic model generation; see Hibernate, gorm, etc.) as well as some useful interfaces that we will explore in more depth further down.
The way that Juju business logic is currently written comes with a few caveats:
- We perform some assertions within the business logic code (e.g. is the unit alive?)
- We utilize DB-specific concepts (e.g. transactions) and also generate DB-specific assertions as preconditions for applying a set of changes.
- The code that generates the transaction operations includes special logic to check if it is running for the first time or whether we are retrying the transaction and need to refresh the objects we read from the DB.
The key goal of this approach is to make it easier for developers to create business logic rules by removing all of the above caveats. To better understand the approach let’s take a look at an example of how we could create a business logic rule using the proposed system.
What you see below has been generated by the code linked at the bottom of this article.
Also, please keep in mind that the example is meant to demonstrate what you can do with the system and does not portray an actual business rule.
func (op *BogusOperation) Execute(ns store.ModelNamespace) error {
	// Lookup by primary key.
	app, err := ns.FindApplicationByName(op.appName)
	if err != nil {
		return err
	}

	// Setters use a fluent syntax so they can be chained.
	app.
		SetChannel("edge").
		SetDesiredScale(1).
		SetExposed(true).
		SetExposedEndpoints(map[string]model.ExposedEndpoint{
			"admin": {
				Spaces: []string{"outer"},
				CIDRs:  []string{"10.0.0.0/24"},
			},
		})

	// Looking up objects that have already been read yields the same object
	// *pointer*. The `ModelNamespace` provides read-your-writes semantics
	// within the context of an operation.
	app, err = ns.FindApplicationByName(op.appName)
	if err != nil {
		return err
	}
	// Example: app.GetChannel() returns "edge".

	// Example assertions:
	//
	// Boolean fields are assigned an "IsXXX" getter, with the exception of
	// fields with a "HasXXX" prefix where the name is preserved as-is.
	// All other fields are assigned a "GetXXX" method.
	if app.IsSubordinate() {
		return errors.NotSupportedf("bogusOP on subordinate app %q", app.GetName())
	} else if app.HasResources() {
		return errors.NotSupportedf("bogusOP on app %q with resources", app.GetName())
	}

	// Looking up models by something other than their PK yields an iterator.
	// An "AllXXX" method is also provided for iterating all models of a
	// particular type, and a "CountXXX" method for counting them.
	unitIt := ns.FindUnitsByApplication(app.GetName())

	// Forgetting to close an iterator raises an error that includes the
	// file:line where the iterator was obtained.
	defer unitIt.Close()
	for unitIt.Next() {
		unit := unitIt.Unit()
		if unit.GetName() == "bogus/0" {
			// Flag model for deletion.
			unit.Delete()
			continue
		}

		// Mutate another model type.
		unit.SetSubordinates([]string{"some", "subordinates"})
	}
	if err := unitIt.Error(); err != nil {
		return errors.Annotatef(err, "iterating units of application %q", app.GetName())
	}

	// Create a new model instance and populate its fields. It will be
	// persisted together with any other changes.
	ns.NewMachine().SetID("mach/0")
	return nil
}
As you can see from the above example, the business logic code (I like to use the tentative term Operation to refer to it for lack of a better name; suggestions appreciated!) receives a store.ModelNamespace as an argument. This is an interface which allows us to query domain models from the store and/or create new models that are automatically scoped to a particular Juju model (note that the domain models do not need to define a modelUUID field). The interface is implemented by each backend and serves the role of a transaction context, i.e. any object accessed, created or mutated through a particular ModelNamespace instance should be thought of as being part of a logical transaction which will be atomically committed (or rolled back).
So how does the store implementation guarantee atomic commits? The answer is via version checks!
Each store implementation augments the set of model fields with a reference to the Juju model UUID they belong to (which can be empty for shared model instances) and a version. The version is an opaque, backend-specific, immutable attribute of each domain model which is populated when the store loads the model and cannot be mutated afterwards (i.e. there is no setter for it).
Each ModelNamespace implementation keeps track of all model instances that it has emitted so far. Any mutations to the model instances are temporarily stored in-memory until the store’s ApplyChanges method is invoked with the ModelNamespace as an argument. At that point, the store implementation collects the list of referenced objects and maps them into a backend-specific list of operations that must be executed transactionally.
Regardless of the way that the backend-specific transaction is implemented, all backend implementations check the referenced object versions against the versions stored in the backing store. This allows us to automatically roll back a transaction when a mismatch is detected and emit a common error.
The mechanism that applies operations (let’s call it the OperationApplier) accepts a Store interface, an Operation interface and a modelUUID argument. Its purpose is to ask the store for a ModelNamespace for the specified modelUUID, pass it to the operation and then ask the store to apply any changes tracked by the model namespace. If the store reports a version mismatch error, the applier repeats the process with a fresh ModelNamespace instance that allows the Operation to read the latest set of objects from the store.
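The applier’s retry loop can be sketched as follows. Every name in this snippet (ModelNamespaceForModel, ApplyChanges, the attempt cap, the flakyStore fake) is a hypothetical stand-in for the generated interfaces, not the actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrVersionMismatch stands in for the common error every backend returns
// when a tracked object's version no longer matches the backing store.
var ErrVersionMismatch = errors.New("store: version mismatch")

type ModelNamespace interface{}

type Store interface {
	ModelNamespaceForModel(modelUUID string) (ModelNamespace, error)
	ApplyChanges(ns ModelNamespace) error
}

type Operation interface {
	Execute(ns ModelNamespace) error
}

// ApplyOperation re-runs the operation against a fresh ModelNamespace each
// time the store rejects the commit with a version mismatch; the operation
// itself stays oblivious to the retries.
func ApplyOperation(st Store, op Operation, modelUUID string, maxAttempts int) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		ns, err := st.ModelNamespaceForModel(modelUUID)
		if err != nil {
			return err
		}
		if err := op.Execute(ns); err != nil {
			return err
		}
		err = st.ApplyChanges(ns)
		if err == nil || !errors.Is(err, ErrVersionMismatch) {
			return err // success, or a non-retryable error
		}
		// Version mismatch: loop again and re-read the latest objects.
	}
	return fmt.Errorf("operation on model %q aborted after %d attempts", modelUUID, maxAttempts)
}

// flakyStore simulates a store whose first commit attempt loses a race with
// a concurrent writer.
type flakyStore struct{ failures, applies int }

func (s *flakyStore) ModelNamespaceForModel(string) (ModelNamespace, error) {
	return struct{}{}, nil
}

func (s *flakyStore) ApplyChanges(ModelNamespace) error {
	s.applies++
	if s.applies <= s.failures {
		return ErrVersionMismatch
	}
	return nil
}

// opFunc adapts a plain function to the Operation interface.
type opFunc func(ModelNamespace) error

func (f opFunc) Execute(ns ModelNamespace) error { return f(ns) }

func main() {
	st := &flakyStore{failures: 1}
	err := ApplyOperation(st, opFunc(func(ModelNamespace) error { return nil }), "model-0", 3)
	fmt.Println(err, st.applies) // <nil> 2
}
```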
As a result, the operation implementation is completely oblivious to the retry mechanism and can focus on acquiring the models it requires, making any business-logic-related checks (is X alive?) and mutating them. The internal version tracking mechanism ensures that any checks made within the context of an operation remain valid at the point where the changes are about to be committed.
Moreover, since both Store and ModelNamespace are interfaces, unit-testing the business logic using mocks becomes trivial. Finally, if we want to, we can also apply the interface segregation principle (the I in SOLID) and have the operation receive a minimal interface (a subset of ModelNamespace) with just the model lookup methods that it requires.
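For instance, an operation that only reads applications can be written against a one-method interface, which a test satisfies with a tiny hand-rolled fake instead of a full store mock. This is a sketch: applicationFinder, the Application struct and setChannel are invented for illustration and are not part of the generated code:

```go
package main

import "fmt"

// Application is a stand-in for a generated domain model.
type Application struct{ Name, Channel string }

// applicationFinder is a hypothetical trimmed-down view of ModelNamespace:
// the operation declares only the lookup method it actually uses.
type applicationFinder interface {
	FindApplicationByName(name string) (*Application, error)
}

// setChannel is a toy operation written against the minimal interface.
func setChannel(ns applicationFinder, appName, channel string) error {
	app, err := ns.FindApplicationByName(appName)
	if err != nil {
		return err
	}
	app.Channel = channel
	return nil
}

// fakeNamespace is all a unit test needs to provide.
type fakeNamespace struct{ apps map[string]*Application }

func (f *fakeNamespace) FindApplicationByName(name string) (*Application, error) {
	app, ok := f.apps[name]
	if !ok {
		return nil, fmt.Errorf("application %q not found", name)
	}
	return app, nil
}

func main() {
	ns := &fakeNamespace{apps: map[string]*Application{"wordpress": {Name: "wordpress"}}}
	if err := setChannel(ns, "wordpress", "edge"); err != nil {
		panic(err)
	}
	fmt.Println(ns.apps["wordpress"].Channel) // edge
}
```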
A technical deep-dive into the schema compiler implementation
The following sections dive into more details about the schema compiler and the types of artifacts that it generates.
Defining model schemas
Model schemas are defined as plain Go structures with a set of optional schema tags associated with their fields. The model definitions all live in the same package.
Note that the schemas per se are never referenced by the generated code and are meant to be committed to git. This provides the opportunity to effectively calculate a schema diff between subsequent generator runs and to potentially automatically generate migration steps as part of the process. That said, this functionality is not implemented in the generator described in this article.
The model schema parser scans all files in the specified schema package (--schema-pkg) and processes any exported struct definitions looking for schema tags. The format of a schema tag is: schema:$backend_field_name,attributes...
The backend_field_name placeholder specifies the preferred field name that should be used by the store backend (e.g. a column name for sqlite, a doc field for mongo, etc.). If omitted (or no schema tag is specified), a suitable (lower) camel-cased name will be generated based on the Go field name.
The optional attribute list may contain any of the following attributes:
- pk. Specifies a field as the PK for the model.
- find_by. Specifies that we can look up instances of this model by filtering on this field (i.e. such queries return iterators).
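Putting the tag format and attributes together, a schema definition might look like the following sketch. The models and field names are illustrative, not the actual Juju schema; the schemaTag helper exists only so we can inspect the definitions:

```go
package main

import (
	"fmt"
	"reflect"
)

// Application has a pk-tagged field, so the parser would classify it as a
// primary model; untagged fields get a lower camel-cased backend name.
type Application struct {
	Name         string `schema:"name,pk"`
	Channel      string `schema:"channel"`
	DesiredScale int    // no tag: backend field name defaults to "desiredScale"
}

// Unit declares a find_by field, so the compiler would emit an iterator
// accessor along the lines of FindUnitsByApplication.
type Unit struct {
	Name         string   `schema:"name,pk"`
	Application  string   `schema:"application,find_by"`
	Subordinates []string // slice field: flagged as nullable by the parser
}

// schemaTag returns the schema tag of a struct field (illustration only).
func schemaTag(v interface{}, field string) string {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return ""
	}
	return f.Tag.Get("schema")
}

func main() {
	fmt.Println(schemaTag(Unit{}, "Application")) // application,find_by
}
```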
The schema parser classifies structs into primary models or secondary/embeddable models based on whether they specify a PK field or not. In addition, any field that is either a pointer, a slice or a map is flagged as nullable.
When parsing schema definitions, the parser implementation uses the go/build tool-chain to compile the schema package and extract information about not only the fully qualified type of each struct field but also the imports (if any) needed to reference it from other models.
In addition, the parser walks the type information tree to obtain the actual type of each field. For example, a field of type state.Life gets a fully qualified type of github.com/juju/juju/state.Life (with github.com/juju/juju/state being the import dependency) and a resolved type of uint8 (Life is an alias for uint8). The sqlite backend generator evaluates resolved types when mapping fields to a suitable SQL type as well as when deciding whether a (non-scalar) field value must be serialized.
Schema compiler artifacts
The schema compiler is the heart of this proposal. It extracts the schema definitions for the models we want to work with and provides them as input to a set of Go templates that produce the following artifacts:
- A set of models with getters/setters and internal state tracking (version, per-field mutations, whether the model is persisted or not, whether the model is flagged for deletion etc.)
- An interface (per-model accessor) for retrieving a model by its PK, getting the model count, iterating (iterators also get their own interface) models based on a field filter or simply iterating all models.
- An interface (ModelNamespace) which embeds the above accessor interfaces and limits lookups to a particular Juju model UUID while also serving as the logical transaction context for the business logic.
- A Store interface.
- A common store validation suite which ensures that all store implementations behave in exactly the same way. The test suite covers cases such as:
- CRUD operations for each model (also ensuring that nested types are marshaled/unmarshaled appropriately).
- Use of iterators (including detection of iterator leaks: iterators must always be closed).
- Ensuring that version mismatches (e.g. another thread already modified/deleted a model we are currently trying to change) are correctly detected when attempting to commit changes and cause the transaction to be rolled back.
- Ensuring that ModelNamespace implementations provide read-your-writes semantics.
- Multiple store implementations (in-memory, sqlite and mongo) as well as a test suite for each one that hooks up each backend to the common store validation suite.
Supported store backends
The code that accompanies this article generates an sqlite, mongo and an in-memory store. The following sections will discuss implementation-specific details for each store type.
Sqlite
The sqlite store uses a shared table for each primary model. Each table schema is augmented with a model_uuid and a version column and defines a composite primary key which includes the model_uuid and the model-defined PK.
An index with a composite key (model_uuid, lookup_field) will be created for each field that can be used to perform lookups.
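As a rough illustration, the generated DDL would have the following shape. The table, column and index names are invented for this sketch and embedded as a Go constant, since the real statements are produced by the generator’s templates:

```go
package main

import (
	"fmt"
	"strings"
)

// createApplication sketches the described DDL shape: the generator injects
// model_uuid and version, makes the PK composite, and adds a
// (model_uuid, lookup_field) index per find_by field.
const createApplication = `
CREATE TABLE IF NOT EXISTS applications (
    model_uuid TEXT    NOT NULL,
    name       TEXT    NOT NULL,
    channel    TEXT,
    version    INTEGER NOT NULL,
    PRIMARY KEY (model_uuid, name)
);
CREATE INDEX IF NOT EXISTS idx_applications_channel
    ON applications (model_uuid, channel);
`

// mentionsCompositePK exists only so the sketch can be sanity-checked.
func mentionsCompositePK() bool {
	return strings.Contains(createApplication, "PRIMARY KEY (model_uuid, name)")
}

func main() {
	fmt.Println(mentionsCompositePK()) // true
}
```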
In addition, all CRUD model queries are auto-generated and prepared once when the store is initialized. A caveat of this approach is that model updates effectively overwrite the entire row even if the store is aware of exactly which fields have been mutated.
As is expected with an sqlite backend, the store implements a multiple-readers/single-writer pattern using a sync.RWMutex. Since only one writer can be active at any given time, the store (while holding the write lock) first checks the object versions via read queries and then upgrades to a write transaction to commit the actual changes. The stored model versions are incremented as part of any executed SQL update query.
Any non-scalar model fields are automatically marshaled into JSON and stored as CLOBs. The reverse process is followed when reading models from the DB.
Mongo
The mongo backend uses a shared collection for each primary model. Each document is augmented with a model_uuid field (and a txn-revno field that gets injected by the mgo/txn package).
An index with a composite key (model_uuid, lookup_field) will be created for each field that can be used to perform lookups.
The mongo object ID for each document is generated by concatenating the modelUUID and the model PK using a colon separator. This matches the behavior currently used by Juju.
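That ID scheme is straightforward to reproduce; in this sketch, docID is a made-up helper name, not the one used by the generated backend:

```go
package main

import "fmt"

// docID builds the mongo _id as described: the Juju model UUID and the
// model PK joined by a colon separator.
func docID(modelUUID, pk string) string {
	return modelUUID + ":" + pk
}

func main() {
	fmt.Println(docID("model-uuid-1", "wordpress")) // model-uuid-1:wordpress
}
```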
The store implementation generates a set of mongo/txn.Op values with suitable assertions depending on the mutation type (e.g. a DocMissing assertion when inserting models and a txn-revno assertion when updating/deleting models).
In contrast to the sqlite backend implementation, the mongo backend only updates the fields that have been modified.
The mongo backend leverages the mgo/txn package for implementing transactions and can be configured to use either client- or server-side transactions when the store is initialized.
In-memory
The in-memory backend maintains models in-memory using maps, with the PK serving as the map key. In contrast to the other backends, the in-memory implementation maintains separate maps for each modelUUID and does not support indices (i.e. it does a full “table” scan for lookups).
While the in-memory backend works great for running tests, it does not offer any support for persistence. However, the Raft code from the previous article could be adapted to provide such a persistence layer (though using dqlite would probably be a better idea).
Show us the code!
You can find the accompanying code for this article here.
In addition, the example branch in the same repository includes a schema sample and the full set of artifacts (code and tests) that were generated by the schema compiler.
If you check out the example branch you can run the test suites as follows:
$ mkdir /tmp/mongotest
$ ulimit -S -n 2048 && mongod --dbpath /tmp/mongotest --replSet rs0 --logpath /dev/null &
$ cd backend
$ go test -check.v -cover
Future work / ideas
- Use code-generation to handle the RPC wire protocol marshaling/unmarshaling.
- Leverage git diffs to auto-generate migration steps.