A proposal for decoupling Juju business logic from the persistence layer (part 2)

Introduction

This article is a follow-up to a series of posts that discuss potential approaches for decoupling the business logic within Juju from the persistence layer.

This new proposal incorporates the constructive feedback that I have received so far and, in my view, would improve the way that we write and test Juju business logic.

How does it work?

The proposed solution makes heavy use of code generation to create an ORM-like layer which will serve as the foundation for writing Juju business logic.

You might be wondering: why not just pick one of the readily available ORM solutions for Go applications? First of all, we don’t need a fully-fledged ORM, as the Juju data access patterns are quite simple (i.e. read a bunch of objects, check some constraints, mutate and write back, and finally have a watcher react to object changes). Secondly, it’s not easy to find an ORM solution that supports relational, document-based and other non-standard backends (e.g. in-memory, raft-backed). Lastly, by rolling our own ORM-like solution (emphasis on the -like part) we can provide additional guarantees that are required/expected when writing Juju-specific business logic.

As with all ORM solutions, we start by defining the schema for our domain models (not to be confused with the concept of models that are hosted by a Juju controller). The following deep-dive sections will go into more detail about how schema definitions work; for now just assume that someone has already taken care of that task. The schemas are then fed into a schema compiler which emits a set of models (a pretty standard practice for most ORMs that do not rely on dynamic model generation; see Hibernate, gorm, etc.) as well as some useful interfaces that we will explore in more depth further down.

The way that Juju business logic is currently written comes with a few caveats:

  • We perform some assertions within the business logic code (e.g. is the unit alive?)
  • We utilize DB-specific concepts (e.g. transactions) and also generate DB-specific assertions as preconditions for applying a set of changes.
  • The code that generates the transaction operations includes special logic to check if it is running for the first time or whether we are retrying the transaction and need to refresh the objects we read from the DB.

The key goal of this approach is to make it easier for developers to create business logic rules by removing all of the above caveats. To better understand the approach let’s take a look at an example of how we could create a business logic rule using the proposed system.

What you see below has been generated by the code linked at the bottom of this article.

Also, please keep in mind that the example is meant to demonstrate what you can do with the system and does not portray an actual business rule.

func (op *BogusOperation) Execute(ns store.ModelNamespace) error {
	// Lookup by primary key.
	app, err := ns.FindApplicationByName(op.appName)
	if err != nil {
		return err
	}

	// Setters use a fluent syntax so they can be chained.
	app.
		SetChannel("edge").
		SetDesiredScale(1).
		SetExposed(true).
		SetExposedEndpoints(map[string]model.ExposedEndpoint{
			"admin": {
				Spaces: []string{"outer"},
				CIDRs:  []string{"10.0.0.0/24"},
			},
		})

	// Looking up objects that have already been read yields the same object 
	// *pointer*. The `ModelNamespace` provides read-your-writes semantics 
	// within the context of an operation.
	app, err = ns.FindApplicationByName(op.appName)
	if err != nil {
		return err
	}
	// Example: app.GetChannel() returns "edge".

	// Example assertions:
	//
	// Boolean fields are assigned an "IsXXX" getter with the exception of
	// fields with a "HasXXX" prefix where the name is preserved as-is.
	// All other fields are assigned a "GetXXX" method.
	if app.IsSubordinate() {
		return errors.NotSupportedf("bogusOP on subordinate app %q", app.GetName())
	} else if app.HasResources() {
		return errors.NotSupportedf("bogusOP on app %q with resources", app.GetName())
	}

	// Looking up models by something other than their PK yields an iterator.
	// An "AllXXX" method is also provided for iterating all models of a particular type.
	// A "CountXXX" method is also provided for counting all models of a particular type.
	unitIt := ns.FindUnitsByApplication(app.GetName())

	// Forgetting to close iterators will raise an error with the file:line where the iterator was obtained
	defer unitIt.Close()
	for unitIt.Next() {
		unit := unitIt.Unit()
		if unit.GetName() == "bogus/0" {
			// Flag model for deletion.
			unit.Delete()
			continue
		}

		// Mutate another model type.
		unit.SetSubordinates([]string{"some", "subordinates"})
	}
	if err := unitIt.Error(); err != nil {
		return errors.Annotatef(err, "iterating units of application %q", app.GetName())
	}

	// Create a new model instance and populate its fields. It will be
	// persisted together with any other changes.
	ns.NewMachine().SetID("mach/0")

	return nil
}

As you can see from the above example, the business logic code (I like to use the tentative term Operation to refer to it for lack of a better name; suggestions appreciated!) receives a store.ModelNamespace as an argument.

This is an interface which allows us to query domain models from the store and/or create new models that are automatically scoped to a particular Juju model (note that the domain models do not need to define a modelUUID field). The interface is implemented by each backend and serves the role of a transaction context: any object accessed, created or mutated through a particular ModelNamespace instance should be thought of as being part of a logical transaction which will be atomically committed (or rolled back).
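To make this more concrete, here is a minimal sketch of what the generated interfaces (living in the generated store package) could look like. The method names below are assumptions based on the example operation above, not the exact generated API.

// Sketch only: the real generated code contains one accessor interface per
// model; only the methods used by the example operation are shown here.
type Store interface {
	// ModelNamespace returns a transaction context scoped to a Juju model.
	ModelNamespace(modelUUID string) (ModelNamespace, error)

	// ApplyChanges atomically commits every change tracked by ns or rolls
	// the whole set back if a version mismatch is detected.
	ApplyChanges(ns ModelNamespace) error
}

type ModelNamespace interface {
	FindApplicationByName(name string) (*model.Application, error)
	FindUnitsByApplication(appName string) UnitIterator // generated iterator interface
	NewMachine() *model.Machine
}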

So how does the store implementation guarantee atomic commits? The answer is via version checks!

Each store implementation augments the set of model fields with a reference to the Juju model UUID that they belong to (which can be empty for shared model instances) and a version. The version is an opaque, backend-specific, immutable attribute of each domain model which is populated when the store loads the model and cannot be mutated afterwards (i.e. there is no setter for it).

Each ModelNamespace implementation keeps track of all model instances that it has emitted so far. Any mutations to the model instances are temporarily stored in-memory until the store’s ApplyChanges method is invoked with the ModelNamespace as an argument. At that point, the store implementation collects the list of referenced objects and maps them into a backend-specific list of operations that must be executed transactionally.

Regardless of the way that the backend-specific transaction is implemented, all backend implementations check the referenced object versions against the versions stored in the backing store. This allows us to automatically roll back a transaction if a mismatch is detected and emit a common error.

The mechanism that applies operations (let’s call it the OperationApplier) accepts a Store interface, an Operation interface and a modelUUID argument. Its purpose is to ask the store for a ModelNamespace for the specified modelUUID, pass it to the operation and then ask the store to apply any changes tracked by the model namespace. If the store reports a version mismatch error, the applier repeats the process with a fresh ModelNamespace instance so that the Operation reads the latest set of objects from the store.

As a result, the operation implementation is completely oblivious to the retry mechanism and can focus on acquiring the models it requires, making any business-logic-related checks (is X alive?) and mutating them. The internal version tracking mechanism ensures that any checks made within the context of an operation remain valid at the point where the changes are about to be committed.
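A rough sketch of the applier loop described above (the OperationApplier and Operation shapes and the store.ErrVersionMismatch sentinel are assumed names used for illustration):

// Sketch only: retries are driven entirely by the applier; the operation
// itself never sees them.
type Operation interface {
	Execute(ns store.ModelNamespace) error
}

type OperationApplier struct {
	store store.Store
}

func (a *OperationApplier) ApplyOperation(op Operation, modelUUID string) error {
	for {
		// Each attempt gets a fresh ModelNamespace so the operation always
		// reads the latest object versions from the store.
		ns, err := a.store.ModelNamespace(modelUUID)
		if err != nil {
			return err
		}
		if err := op.Execute(ns); err != nil {
			return err
		}

		switch err := a.store.ApplyChanges(ns); {
		case err == nil:
			return nil // all tracked changes were committed atomically
		case errors.Is(err, store.ErrVersionMismatch):
			continue // another writer got there first; retry with fresh state
		default:
			return err
		}
	}
}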

Moreover, since both Store and ModelNamespace are interfaces, unit-testing the business logic using mocks becomes trivial. Finally, if we want to, we can also apply the interface segregation principle (the I in SOLID) and have the operation receive a minimal interface (a subset of ModelNamespace) with just the model lookup methods that it requires.
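For instance, the example BogusOperation only needs a handful of lookups, so it could (hypothetically) accept a narrower, operation-specific interface instead of the full ModelNamespace:

// Hypothetical narrowed view of store.ModelNamespace for the example
// operation; any ModelNamespace implementation satisfies it automatically.
type bogusOperationState interface {
	FindApplicationByName(name string) (*model.Application, error)
	FindUnitsByApplication(appName string) store.UnitIterator
	NewMachine() *model.Machine
}

func (op *BogusOperation) Execute(ns bogusOperationState) error {
	// ... same body as the example operation above ...
	return nil
}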

A technical deep-dive into the schema compiler implementation

The following sections dive into more details about the schema compiler and the types of artifacts that it generates.

Defining model schemas

Model schemas are defined as plain Go structures with a set of optional schema tags associated with their fields. The model definitions all live in the same package.

Note that the schemas themselves are never referenced by the generated code and are meant to be committed to git. This makes it possible to calculate a schema diff between subsequent generator runs and potentially auto-generate migration steps as part of the process, although this functionality is not implemented in the generator described in this article.

The model schema parser scans all files in the specified schema package (--schema-pkg) and processes any exported struct definitions looking for schema tags. The format of a schema tag is: schema:$backend_field_name,attributes....

The backend_field_name placeholder specifies the preferred field name that should be used by the store backend (e.g. a column name for sqlite, a doc field for mongo, etc.). If omitted (or no schema tag is specified), a suitable (lower) camel-cased name will be generated based on the Go field name.

The optional attribute list may contain any of the following attributes:

  • pk. Specifies a field as the PK for the model.
  • find_by. Specifies that we can look up instances of this model by filtering on this field (such queries return iterators).

The schema parser classifies structs into primary models or secondary/embeddable models based on whether they specify a PK field or not. In addition, any field that is either a pointer, a slice or a map is flagged as nullable.
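As an illustration, a hypothetical application schema using these tags (the field and tag names are mine, not taken from the actual Juju schema) might look like this:

// Primary model: it defines a pk field.
type Application struct {
	// Maps to the backend field "name" and serves as the model PK.
	Name string `schema:"name,pk"`

	// Generates a lookup (returning an iterator) filtered by this field.
	Series string `schema:"series,find_by"`

	Channel     string `schema:"channel"`
	Subordinate bool   `schema:"subordinate"`

	// Pointer, slice and map fields are flagged as nullable by the parser.
	ExposedEndpoints map[string]ExposedEndpoint `schema:"exposedEndpoints"`
}

// Secondary/embeddable model: no pk field, so it is never stored on its own.
type ExposedEndpoint struct {
	Spaces []string `schema:"spaces"`
	CIDRs  []string `schema:"cidrs"`
}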

When parsing schema definitions, the parser implementation uses the go/build tool-chain to compile the schema package and extract information about not only the fully qualified type of each struct field but also the imports (if any) needed to reference it from other models.

In addition, the parser walks the type information tree to obtain the actual type of each field. So, for example, a field of type state.Life gets a fully qualified type of github.com/juju/juju/state.Life (with github.com/juju/juju/state being the import dependency) and a resolved type of uint8 (Life is an alias for uint8). The sqlite backend generator evaluates resolved types when mapping fields to a suitable SQL type as well as when deciding whether a (non-scalar) field value must be serialized or not.

Schema compiler artifacts

The schema compiler is the heart of this proposal. It extracts the schema definitions
for the models we want to work with and provides them as input to a set of
Go templates that produce the following artifacts:

  • A set of models with getters/setters and internal state tracking (version, per-field mutations, whether the model is persisted or not, whether the model is flagged for deletion etc.); see the sketch after this list.
  • An interface (per-model accessor) for retrieving a model by its PK, getting the model count, iterating (iterators also get their own interface) models based on a field filter or simply iterating all models.
  • An interface (ModelNamespace) which embeds the above accessor interfaces and limits lookups to a particular Juju model UUID while also serving as the logical transaction context for the business logic.
  • A Store interface.
  • A common store validation suite which ensures that all store implementations behave in exactly the same way. The test suite covers cases such as:
    • CRUD operations for each model (also ensuring that nested types are marshaled/unmarshaled appropriately).
    • Use of iterators (including detection of iterator leaks: iterators must always be closed).
    • Ensuring that version mismatches (e.g. another thread already modified/deleted a model we are currently trying to change) are correctly detected when attempting to commit changes and cause the transaction to be rolled back.
    • Ensuring that ModelNamespace implementations provide read-your-writes semantics.
  • Multiple store implementations (in-memory, sqlite and mongo) as well as a test
    suite for each one that hooks up each backend to the common store validation suite.
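To give a feel for the first artifact in the list above, here is a hypothetical slice of a generated model showing the kind of internal state tracking involved (field names and types are illustrative only):

// Sketch only: a real generated model covers every schema field.
type Application struct {
	name    string
	channel string

	version   interface{}     // opaque, backend-specific; populated when the store loads the model
	mutated   map[string]bool // per-field mutation tracking
	persisted bool            // false for models created via ModelNamespace.NewApplication
	deleted   bool            // set when Delete() flags the model for deletion
}

func (a *Application) GetName() string    { return a.name }
func (a *Application) GetChannel() string { return a.channel }

// Setters record the mutation and return the model so calls can be chained.
func (a *Application) SetChannel(channel string) *Application {
	a.channel = channel
	a.mutated["channel"] = true
	return a
}

// Delete flags the model for deletion on the next ApplyChanges call.
func (a *Application) Delete() { a.deleted = true }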

Supported store backends

The code that accompanies this article generates an sqlite, a mongo and an in-memory store. The following sections will discuss implementation-specific details for each store type.

Sqlite

The sqlite store uses a shared table for each primary model. Each table schema is augmented with a model_uuid and a version column and defines a composite primary key which includes the model_uuid and the model-defined PK.

An index with a composite key (model_uuid, lookup_field) will be created for each field that can be used to perform lookups.

In addition, all CRUD model queries are auto-generated and prepared once when the store is initialized. A caveat of this approach is that model updates effectively overwrite the entire row even if the store is aware of exactly which fields have been mutated.

As expected for an sqlite backend, the store implements a multiple-readers / single-writer pattern using a sync.RWMutex. Since only one writer can be active at any given time, the store (while holding the write lock) first checks the object versions via read queries and then upgrades to a write transaction to commit the actual changes. The stored model versions are incremented as part of every executed SQL update query.

Any non-scalar model fields are automatically marshaled into JSON and stored as CLOBs. The reverse process is followed when reading models from the DB.
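Putting the above together for the hypothetical Application schema sketched earlier, the generated sqlite DDL might look roughly like this (column names and types are illustrative only):

// Illustrative only; the actual generator derives column names from the
// schema tags and resolved field types.
const createApplicationsTable = `
CREATE TABLE IF NOT EXISTS applications (
    model_uuid       TEXT    NOT NULL,
    version          INTEGER NOT NULL,
    name             TEXT    NOT NULL,
    series           TEXT,
    channel          TEXT,
    subordinate      INTEGER,
    exposedEndpoints CLOB,   -- non-scalar field, marshaled to JSON
    PRIMARY KEY (model_uuid, name)
);
CREATE INDEX IF NOT EXISTS applications_by_series
    ON applications (model_uuid, series);
`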

Mongo

The mongo backend uses a shared collection for each primary model. Each document is augmented with a model_uuid field (and a txn-revno field that gets injected by the mgo/txn package).

An index with a composite key (model_uuid, lookup_field) will be created for each field that can be used to perform lookups.

The mongo object ID for each document is generated by concatenating the modelUUID and the model PK using a colon separator. This matches the behavior currently used by Juju.

The store implementation generates a set of mongo/txn.Op values with the suitable assertions depending on the mutation type (e.g. DocMissing assertion when inserting models and a txn-revno assertion when updating/deleting models).
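For example (roughly, and with assumed collection, field and variable names), an update to an existing application and the creation of a new machine could translate into ops along these lines:

// Sketch only: the generated backend assembles these ops automatically from
// the mutations tracked by the ModelNamespace.
updateOp := txn.Op{
	C:      "applications",
	Id:     modelUUID + ":" + app.GetName(),           // modelUUID:PK document ID
	Assert: bson.D{{"txn-revno", appRevno}},           // abort if another writer changed the doc
	Update: bson.M{"$set": bson.M{"channel": "edge"}}, // only the mutated fields
}

insertOp := txn.Op{
	C:      "machines",
	Id:     modelUUID + ":mach/0",
	Assert: txn.DocMissing, // inserts assert that the document does not exist yet
	Insert: machineDoc,
}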

In contrast to the sqlite backend implementation, the mongo backend only updates the fields that have been modified.

The mongo backend leverages the mgo/txn package for implementing transactions and can be configured to use either client- or server-side transactions when the store is initialized.

In-memory

The in-memory backend maintains models in-memory using maps where the PK serves as the map key. In contrast to the other backends, the in-memory implementation maintains separate maps for each modelUUID and does not support indices (i.e. does a full “table” scan for lookups).
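A minimal sketch of the bookkeeping involved (again with assumed names, showing just one model type):

// Sketch only: lookups by non-PK fields scan the per-model map ("table").
type inMemoryStore struct {
	mu sync.RWMutex

	// modelUUID -> application name (PK) -> stored copy of the model.
	applications map[string]map[string]*model.Application
}

func (s *inMemoryStore) findApplicationByName(modelUUID, name string) (*model.Application, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	app, found := s.applications[modelUUID][name]
	return app, found
}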

While the in-memory backend works great for running tests, it does not offer any support for persistence. However, the RAFT code from the previous article can be adapted to provide such a persistence layer (though using dqlite would probably be a better idea).

Show us the code!

You can find the accompanying code for this article here.

In addition, the example branch in the same repository includes a schema sample and the full set of artifacts (code and tests) that were generated by the schema compiler.

If you check out the example branch you can run the test suites as follows:

$ mkdir /tmp/mongotest
$ ulimit -S -n 2048 && mongod --dbpath /tmp/mongotest --replSet rs0 --logpath /dev/null &

$ cd backend
$ go test -check.v -cover

Future work / ideas

  • Use code-generation to handle the RPC wire protocol marshaling/unmarshaling.
  • Leverage git diffs to auto-generate migration steps.

@achilleasa Thanks for the write-up and for the continued interest in this issue.

If I understand the proposal correctly, my initial feeling remains similar to the points raised on the other thread, since this proposal hasn’t diverged from the earlier one regarding the issues raised there. That is, we continue to have a convention that doesn’t operate the way traditional Go code does, and there is no visible notion of transactions for people to work with. That means we’ll continue to have an extra layer of indirection when developers are writing features for juju that interact with the persistence layer. My hope remains that once we replace the overhead of the existing application-level transactional system, we can move towards a system that frees the developer from having to special-case their thinking for juju and can instead rely on a more traditional underlying transactional infrastructure.

Actually, with the current proposal, we introduce an explicit transaction context (the ModelNamespace). The business logic is meant to target it and can be split into blocks, or composed as needed. As long as any block of code references a particular ModelNamespace instance, it operates within a logical transaction (which can be implemented client- or server-side depending on the primitives supported by the underlying backend).

So, we are now writing regular Go code, checking the model state (domain-specific assertions) as needed and mutating it according to the business logic rules. The benefit of this approach is that there is neither a coupling to the backend nor are we introducing special/low-level assertions as we do with the current approach in Juju (e.g. the mgo-specific DocExists/DocMissing assertions, txn.Ops etc.).

As proposed in the article, the business logic (including checks/assertions, or whatever we want to call them) is fully contained within a function/method, makes use of domain-specific models only and has no ties to a particular backend. While the backend implementation is special-cased for Juju, this is entirely transparent to the developer, who should never need to interact with the backend directly.


Maybe I misunderstand then. From the description it sounded like that was all logic inside juju instead of logic inside the database, which means you still have no access to the underlying transaction mechanism, which implies you’ll be doing transactions, indexing, multiversioning, querying, and all those things that a database is supposed to do for you, manually.

But again, perhaps I misunderstand?

Perhaps I didn’t express it clearly then :)

So, by logic, I explicitly refer to domain-specific business logic, i.e. the manipulation of the state of the various models (applications, units etc.). This is completely detached from the backing store/transactions/versioning etc.

All of the underlying transaction-related bits are handled by the individual backing store implementations and are subject to the limitations imposed by the underlying system:

  • In the case of mongo, it’s using the mgo/txn package (or sstxn if you request server-side txns) because I could simply copy/paste the code out of juju ;-).
  • The sqlite store implements version checks via queries due to sqlite’s limitations around custom functions, whereas with a non-embedded RDBMS we could offload this to the DB (e.g. via postgres functions or the PL/SQL equivalent for mysql).

The generator produces a shared backend-agnostic validation test-suite (providing ~80% coverage) which allows us to verify that all backend implementations behave exactly the same.

The main promise of the proposal (and fundamentally opposite, in my view, to the way we write state mutation code now) is that the business logic (take a look at the example operation in the article) can be expressed in a backend-agnostic way and can be applied to any of the supported backends that is wired (via DI) to the facade instance.


I still don’t understand how that could work, and this statement in particular is what reinforces the confusion in my mind:

We cannot create a sane transaction system based on mgo/txn that is comfortable to use at the API level without buying into the same conventions that the mgo/txn mechanisms are rooted on. As you know well from the experience with the state package, with mgo/txn we have explicit assertions that need to be cooked against specific fields that are meaningful to the business logic. This is not just an object version… it’s certain data changing or not, and in meaningful ways.

On the other side of that picture, to implement proper MVCC you need to be able to operate with snapshots. Such snapshots need to be taken in coordination with the storage layer, because when you start a transaction you don’t know what data you’re going to be reading in the future. If you read object A and then read object B without read-isolation support from the database, B will be off by the time you get to it. Even if its version hasn’t changed, the value is already inconsistent once you take A into consideration.

We can certainly implement all of these things, but this is a major investment, and at the end of the day our goal is improving juju rather than developing an entire database with all these features and then debugging issues forever.

Do you have a particular use-case in mind for this scenario? I am trying to understand why we would require read isolation in this case.

My interpretation of this example is that the operation flow would look something like this:

  • read A (version 2)
  • check some predicate on A’s field(s)
  • potentially modify A
  • read B (version 42)
  • potentially modify B
  • create C (initialized to version 1)
  • read D (version 4)
  • check some predicate on D’s field(s)
  • mark D for deletion

When it’s time to commit, the transaction would only commit (and increment the versions of existing objects A and B) iff the versions of A, B and D in the store (DB) match the ones we read as part of the operation flow. The check for C (the created object) is special and depends on the backend (mongo checks that the doc does not exist, SQL depends on PK uniqueness constraints aborting the insert if the row exists).

So, if any of these objects gets concurrently modified by another thread and makes it to the store before the above set of changes get committed, the store would report a version-mismatch error and the above operation would have to be retried (each time we run the operation steps, we use a new ModelNamespace instance so we always read a fresh set of objects from the store).

A = function(B, C)

The state of A is invalid because B and C never co-existed in the database in such a state. We can work around such problems, just like we work around the complete lack of transactions in historical MongoDB, by doing it at the application level. We also know that comes with a price, and my understanding is that we’re a bit tired of paying that price.
