[specification] ISD014 - Managing Charm Complexity

jdkandersson · 18 August 2023 06:08

This was initially an internal document and is still a work in progress. Since it applies to a number of public charms now, the contents have been published on discourse.

Abstract

The IS DevOps team is accountable for a growing number of charms. Many of these charms use different architectural patterns, which means that some are more complex than needed and reinvent solutions to common requirements. This spec sets out our general architectural guidelines for charms, including abstracting the state, decoupling the application logic from the charm logic and wrapping dependencies with interfaces. These techniques can be used to manage complexity as a charm codebase grows.

It is intended to augment the existing style guide for charms.

Rationale

Currently, IS DevOps charms don’t follow common patterns and structure to manage complexity and mostly implement all state, business logic and external interactions within event callback methods. This results in strong coupling among those components and can lead to maintainability and readability issues. Furthermore, this coupling forces the developer to write complex unit tests that require extensive mocking.

By abstracting the state, decoupling the business logic from the charm and implementing clear interfaces, the code becomes easier to test and the need for mocking is reduced.

On top of that, many charms follow their own architecture and structure. Adopting a shared, opinionated architecture will ensure coherence and ease cross-team and community contribution. It will also speed up the delivery of charms and make new charms that follow these patterns easier to understand.

Specification

This specification is broken up into several sections. The first section discusses the applicability of the patterns discussed in the spec, the second discusses abstracting the state provided by Juju, the third is on separating the application and charm logic and the fourth focuses on abstracting interfaces including how this is beneficial for testing and mocking. It is followed by an example to illustrate how the patterns can be implemented.

Applicability

The patterns proposed in this spec are intended to manage complexity. This means that simple charms, on the order of 100 lines of code, may not benefit from the patterns and do not necessarily need to implement them. The following are indications that a charm may benefit from the patterns:

The charm you’re writing is not trivial
The tests require complex mocks
Many external interactions (e.g., HTTP calls, pebble run, copying files)
A new developer finds the charm difficult to understand

In general, unless there are good reasons not to, the charm you are working on should adopt these guidelines.

It is also worth noting that, depending on the individual case, it could be appropriate to implement only a subset of the patterns discussed in this spec for a given charm. Some charms may have additional or unique sources of complexity which are not solved by any of the patterns discussed in this spec. If the patterns in this spec don’t apply, it is worth discussing the case with the team to see if a new pattern should be added to this spec or an existing pattern refined.

When writing documentation, it may not be beneficial to use the patterns as it is important to keep examples succinct. It can be worthwhile to indicate that code in documentation is not intended to be a reference implementation and point to materials, such as this spec, with recommendations for how to write charms that are easy to maintain and understand.

Charm Runtime State Abstraction

This section discusses abstracting the configuration and integration data provided by Juju as well as other charm state through a model that is easier to interact with. The charm runtime state is:

The charm configuration
State of the integrations the charm provides
Remote configuration e.g. retrieved from a repository or S3 bucket
State of the workload (e.g., whether a setup script has been run, application configuration, …)
State that is stored on the filesystem of the application workload

The state model should be populated from the sources described above. This can be done using, for example, properties in Python: https://docs.python.org/3/library/functions.html#property. Configuration that is typically read using self.config["<key>"] would instead be exposed as a property on the state. See the example section for how this would work.

Outside of the state abstraction, charm.py and (in some cases) observers, code should only interact with the charm state via the abstraction. Code should not directly interact with the state provided by Juju (e.g., by accessing self.config["<key>"]). The state should be initialized in __init__ as noted here: Juju | Create a minimal Kubernetes charm

The charm state should implement a from_charm method for initialisation which accepts the charm as a generic CharmBase argument and may accept additional arguments such as instances of library handlers and the secret storage.
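As a minimal sketch of the state abstraction described above, the class below builds an immutable state object from the charm configuration. The option names `port` and `log-level`, the `CharmConfigInvalidError` exception and the `HasConfig` protocol are assumptions made for illustration. In a real charm the `charm` argument would be an `ops.CharmBase`, and validation would likely use pydantic as recommended by DA016.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Protocol


class HasConfig(Protocol):
    """Stand-in for the part of ops.CharmBase this sketch relies on."""

    config: Mapping[str, Any]


class CharmConfigInvalidError(Exception):
    """Raised when the charm configuration cannot be turned into a valid state."""


@dataclass(frozen=True)
class CharmState:
    """Read-only view of the charm runtime state (hypothetical fields)."""

    port: int
    log_level: str

    @property
    def base_url(self) -> str:
        """Derived value, so callers never assemble it from raw config keys."""
        return f"http://localhost:{self.port}"

    @classmethod
    def from_charm(cls, charm: HasConfig) -> "CharmState":
        """Build the state from the configuration Juju provides to the charm."""
        raw_port = charm.config.get("port", 8080)
        try:
            port = int(raw_port)
        except (TypeError, ValueError) as exc:
            raise CharmConfigInvalidError(f"invalid port: {raw_port!r}") from exc
        if not 0 < port < 65536:
            raise CharmConfigInvalidError(f"port out of range: {port}")
        return cls(port=port, log_level=str(charm.config.get("log-level", "info")))
```

With this in place, the charm would call `CharmState.from_charm(self)` once in `__init__`, and the rest of the code would read `state.port` or `state.base_url` rather than `self.config["port"]`.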

Unit tests of functions that require the state should build copies of the state in the test rather than deriving them from real charm data such as charm configuration or integration data.

Abstracting the state has several benefits:

Fewer tests require the charm to be available, reducing the need for mocking and for running a charm in the tests
The details of accessing the state are encapsulated in the data model

The implementation should follow the DA016 - Charmed Structured Configuration spec (currently not publicly available), specifically regarding validation using pydantic.

The Jenkins k8s operator and Synapse k8s operator implement this pattern.

Separating Operational and Charm Logic

As charms grow in complexity, the class inheriting from CharmBase can grow to thousands of lines of code with many functions. This makes the charm difficult to understand and increases the complexity of tests. As the charm complexity increases, the operational logic (the logic interacting with the application being operated and any integrations or similar interactions) and the charm logic (the logic responding to events fired by Juju) should be separated. This has several benefits:

It makes the charm easier to understand
It reduces the complexity of tests because the operational logic can be tested without a running charm (whether real or simulated), getting closer to the formal definition of a unit test
It increases reusability as the operational logic is not closely coupled with Juju interactions
It accelerates adoption of future versions of Juju and any framework changes as breaking changes should be contained to the charm logic and not the operational logic

The class inheriting from CharmBase should be used purely to interact with Juju and orchestrate responding to events from Juju. It should not contain any logic for operating the application (where the application is, for example, Jenkins, Wordpress, …). Instead, it should delegate the application operational logic to separate modules or functions not defined in the charm class.

These modules/functions should contain all the logic of operating the application. In theory, it should be possible to operate the application from a different controller (such as one for the command line) using the same modules or functions the charm class uses. Whilst that generally doesn’t need to be implemented for a charm, it is a useful concept to keep in mind when writing the charm operational logic.

Tests of the modules/functions implementing the operational logic should not require mocks beyond the mocks described in the charm dependent services section in this spec.

Functions and classes within the module generally shouldn’t include the name of the module. For example, a function that restarts Wordpress in the wordpress.py module should be called restart instead of restart_wordpress.

Generally, these should be implemented as a module with functions. A class is useful only if state needs to be stored on the instance, and should rarely be used.

The modules for operational logic should only accept CharmState and potentially the container object for the purpose of file system interactions and executing commands on the container. The container should only be needed for k8s charms; for machine charms these modules should only accept CharmState as an argument. Additional arguments may be warranted, although they should be generic data such as strings, integers, booleans, lists, dictionaries and so on, and should not be any data that belongs on CharmState.
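The sketch below illustrates what such an operational module could look like. The `CharmState` fields, the `CONFIG_PATH` location and the `SupportsPush` protocol are hypothetical; `SupportsPush` stands in for the subset of `ops.model.Container` that the module needs, so that the example has no dependency on a running charm.

```python
from dataclasses import dataclass
from typing import Protocol

CONFIG_PATH = "/etc/app/conf.py"  # hypothetical location of the workload configuration


@dataclass(frozen=True)
class CharmState:
    """Cut-down stand-in for the charm state abstraction (hypothetical fields)."""

    port: int
    log_level: str


class SupportsPush(Protocol):
    """The subset of ops.model.Container that this module relies on."""

    def push(self, path: str, source: str) -> None: ...


def render_config(state: CharmState) -> str:
    """Render the workload configuration purely from the charm state."""
    return f"PORT = {state.port}\nLOG_LEVEL = {state.log_level!r}\n"


def update_config(state: CharmState, container: SupportsPush) -> None:
    """Write the rendered configuration to the workload container."""
    container.push(CONFIG_PATH, render_config(state))
```

The charm class would call `update_config(self.state, container)` from an event handler, while a unit test can construct `CharmState(port=8080, log_level="info")` directly and pass any object with a `push` method. Following the naming guidance above, the functions do not repeat the module name.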

The Jenkins k8s operator and Pollen operator implement this pattern.

Charm Dependent Services

As charms grow in complexity and in the number of supported integrations, the interactions between these components become complex. Interface abstractions can help manage this complexity by encapsulating what is needed from the application, integrations and dependencies behind the interface.

Any external interactions should be abstracted through an interface. External interactions include:

Command line interactions
Interactions with the application (example)
Interactions with kubernetes (example; this and the examples below are charm libraries, but they could also be implemented as a module within the charm source code, depending on how broadly the code can be reused)
Interactions with the operating system (example)
Interactions with an integration
Interactions with another service (e.g., an external API)

The operational logic should not directly interact with these services. Instead, an interface should be written that is loosely coupled and simple. Loose coupling and simplicity are indicated by, for example:

The number of arguments
The kind of arguments
The number of public functions exposed
The complexity of mocks replacing the interface for tests of functions using the interface
The complexity of error handling for callers of the interface

These interfaces should be written in Python and take the form of a module (which could be a collection of functions or a class). The interfaces can be mocked in tests for the operational logic as required. The interface can be excluded from code coverage reporting for unit tests as appropriate, e.g., if it is sufficiently covered by integration tests. The interfaces should generally be covered by integration tests.
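To illustrate, here is a minimal sketch of an interface around an external HTTP service. The `HealthClient` class, the `/health` path, the `{"status": "ok"}` payload and the `should_restart` function are all hypothetical. The point is the shape of the interface: one public method, one error type, and operational logic that depends only on that interface.

```python
import json
import urllib.request


class HealthCheckError(Exception):
    """The single error type callers of the interface need to handle."""


class HealthClient:
    """Narrow interface to a hypothetical HTTP health endpoint of the workload."""

    def __init__(self, base_url: str, timeout: float = 5.0) -> None:
        self._base_url = base_url.rstrip("/")
        self._timeout = timeout

    def is_healthy(self) -> bool:
        """Return whether the workload reports itself as healthy."""
        url = f"{self._base_url}/health"  # assumed path
        try:
            with urllib.request.urlopen(url, timeout=self._timeout) as response:
                payload = json.load(response)
        except (OSError, ValueError) as exc:
            # URLError and timeouts are OSError subclasses; bad JSON is a ValueError.
            raise HealthCheckError(f"could not query {url}") from exc
        return isinstance(payload, dict) and payload.get("status") == "ok"


def should_restart(client: HealthClient) -> bool:
    """Operational logic that depends only on the interface, not on HTTP details."""
    try:
        return not client.is_healthy()
    except HealthCheckError:
        return True
```

In a unit test for `should_restart`, the interface can be replaced with `unittest.mock.create_autospec(HealthClient, instance=True)`, so the mock needs only a return value or a `HealthCheckError` side effect. `HealthClient` itself would be exercised by integration tests.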

The Discourse class from the gatekeeper action implements this pattern.

Testing container file system interactions

Some workloads require files to exist on the workload container, such as files to be served as a static website or configuration files for the workload. Managing these files should be delegated to the relevant module, such as the module for the workload. This means that these modules require access to the container file system.

It is reasonable to pass ops.model.Container to the module for the purpose of:

File system interactions (container.push and container.pull and similar functions)
Executing commands on the workload container (container.exec)

For testing purposes, ops.testing.Harness.model.unit.get_container can be used to get a mocked container and ops.testing.Harness.get_container_filesystem can be used to prepare and inspect the container virtual file system.

# App, CharmState, APP_BASE_DIR and FLASK_CONTAINER_NAME come from the charm under test.
from ops.model import Container


def test_app_config(harness, charm_state_params, config_file):
    """
    arrange: create the Gunicorn webserver object with a controlled charm state generated by the
        charm_state_params parameter.
    act: invoke the update_config method of the webserver object.
    assert: configuration file inside the flask app container should change accordingly.
    """
    container_filesystem = harness.get_container_filesystem(FLASK_CONTAINER_NAME)
    container_filesystem.make_dir(APP_BASE_DIR, make_parents=True)
    container: Container = harness.model.unit.get_container(FLASK_CONTAINER_NAME)
    harness.set_can_connect(FLASK_CONTAINER_NAME, True)

    webserver = App(charm_state=CharmState(**charm_state_params), container=container)
    webserver.update_config(is_webserver_running=False)

    assert container_filesystem.pull(f"{APP_BASE_DIR}/conf.py").read() == config_file

Similarly, ops.testing.Harness.handle_exec can be used to simulate process executions in the container.

def test_wordpress_install(harness: ops.testing.Harness):
    """
    arrange: simulate that WordPress has not yet been installed in the database.
    act: run the initial set of Juju event hooks.
    assert: verify that the charm has executed the WordPress installation command.
    """
    harness.handle_exec("wordpress", ["wp", "core", "is-installed"], result=1)
    installed = False

    def handle_wp_core_install(args: ops.testing.ExecArgs):
        nonlocal installed
        installed = True

    harness.handle_exec(
        "wordpress",
        ["wp", "core", "install"],
        handler=handle_wp_core_install
    )
    harness.begin_with_initial_hooks()

    assert installed

Event Handler Names

In the __init__ of a CharmBase class, self.framework.observe is used to register functions that handle charm events. There should be a 1-1 mapping between events and observers, where the observer is named _on_<event-name> (for example, _on_config_changed).

Any code that is meant to be called in response to multiple events should be invoked only from top-level observers (event handlers). In other words, you should NOT register the same function as a handler for multiple events.

This has the following benefits:

Removes the need for spurious _event=None arguments in methods because they might be used as event observers.
Simplifies control flow. Either a method is a top-level observer, or it can be called by one.

Only event handlers should be event-aware

The only charm methods and functions that accept EventBase instances as arguments should be the event handlers themselves. In other words, no other method should interact with events. This means, among other things, that only event handlers (the methods registered through Framework.observe) can defer events. This helps keep the execution flow simple.
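A small sketch of this rule follows. The names `DatabaseHandlers`, `database_uri` and the relation keys `host` and `port` are hypothetical, and the `DeferrableEvent` protocol stands in for `ops.EventBase` so that the example is self-contained. The helper knows nothing about events; it reports that data is missing, and the handler alone decides to defer.

```python
from typing import Mapping, Optional, Protocol


class DeferrableEvent(Protocol):
    """The part of ops.EventBase this sketch relies on."""

    def defer(self) -> None: ...


def database_uri(relation_data: Mapping[str, str]) -> Optional[str]:
    """Event-unaware helper: return the URI, or None if the data is incomplete."""
    host = relation_data.get("host")
    port = relation_data.get("port")
    if not host or not port:
        return None
    return f"postgresql://{host}:{port}"


class DatabaseHandlers:
    """Stand-in for a charm or observer class; only its handler sees the event."""

    def __init__(self, relation_data: Mapping[str, str]) -> None:
        self._relation_data = relation_data
        self.uri: Optional[str] = None

    def _on_database_relation_changed(self, event: DeferrableEvent) -> None:
        uri = database_uri(self._relation_data)
        if uri is None:
            # The decision to defer is taken here, not inside the helper.
            event.defer()
            return
        self.uri = uri
```

Because `database_uri` takes plain data and returns plain data, it can be tested without constructing any event object.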

Observer Pattern

Introduction

An observer pattern is a way in which the event handlers of a charm, e.g. _on_relation_joined and _on_pebble_ready, are grouped by business concern. For example, a Database observer holds all database-relation related handlers and an Agent observer holds all Jenkins agent related handlers. Grouping handlers into observers by business logic reduces the number of lines of code in the main charm.py, making the codebase easier to navigate.

The differences between a regular charm and observer charm

A regular charm would implement all the event handlers in the main charm.py file.

With the Database handlers and Jenkins agent handlers, for example, the main charm.py ends up with a long list of handler registrations as well as a large number of handlers.

# charm.py

import ops


class HelloWorldCharm(ops.CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on["database"].relation_joined, self._on_database_relation_joined)
        self.framework.observe(self.on["database"].relation_changed, self._on_database_relation_changed)
        self.framework.observe(self.on["agent"].relation_joined, self._on_agent_relation_joined)
        # …

    def _on_database_relation_joined(self, event: ops.RelationJoinedEvent):
        ...

    def _on_database_relation_changed(self, event: ops.RelationChangedEvent):
        ...

    def _on_agent_relation_joined(self, event: ops.RelationJoinedEvent):
        ...

Each of the handlers can have unique logic associated with them, making the charm file long and hard to navigate.

On the other hand, by splitting the handlers into different files through the observer pattern, the main charm.py file becomes easy to understand.

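# charm.py

import ops

from agent import AgentObserver
from database import DatabaseObserver
from pebble import PebbleService  # hypothetical module wrapping pebble interactions
from state import State


class HelloWorldCharm(ops.CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.state = State.from_charm(self)
        # How PebbleService is constructed is an assumption; the observers only need an instance.
        pebble_service = PebbleService(self.state)
        self.database = DatabaseObserver(self, self.state, pebble_service)
        self.agent = AgentObserver(self, self.state, pebble_service)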

All the handlers are then registered within the scope of each observer.

# database.py

import ops

from pebble import PebbleService  # hypothetical module wrapping pebble interactions
from state import State


class DatabaseObserver(ops.Object):
    def __init__(self, charm: ops.CharmBase, state: State, pebble_service: PebbleService):
        super().__init__(charm, "database-observer")
        self._state = state
        self._pebble_service = pebble_service
        # The events are emitted by the charm, so they are observed on charm.on rather than self.on.
        self.framework.observe(charm.on["database"].relation_joined, self._on_database_relation_joined)
        self.framework.observe(charm.on["database"].relation_changed, self._on_database_relation_changed)

    def _on_database_relation_joined(self, event: ops.RelationJoinedEvent):
        # Do things related to database relation joined
        # Delegate to pebble
        self._pebble_service.reconcile()

    def _on_database_relation_changed(self, event: ops.RelationChangedEvent):
        # Do things related to database relation changed
        # Delegate to pebble
        self._pebble_service.reconcile()


# agent.py

import ops

from pebble import PebbleService  # hypothetical module wrapping pebble interactions
from state import State


class AgentObserver(ops.Object):
    def __init__(self, charm: ops.CharmBase, state: State, pebble_service: PebbleService):
        super().__init__(charm, "agent-observer")
        self._state = state
        self._pebble_service = pebble_service
        self.framework.observe(charm.on["agent"].relation_joined, self._on_agent_relation_joined)

    def _on_agent_relation_joined(self, event: ops.RelationJoinedEvent):
        # Do things specific to agent relation joined
        # Delegate to pebble
        self._pebble_service.reconcile()

For charms with many integrations, this pattern can help improve readability greatly.

The Benefits of the Observer Pattern

The main advantage of the observer pattern is that it helps split code into manageable pieces. A further intended effect is to be more surgical about the charm code written per event handler. Rather than writing a charm with one big reconcile function that takes everything and does everything, each handler can target the exact operations intended by the triggering event and delegate to pebble only after that unique operation has taken place.

Limits of the Observer Pattern

The observer pattern is less suitable for charms that do not have a clear separation between their logical components. For example, when the handlers of two observers write to the same global configuration file, splitting them into separate observers brings little benefit.

class DatabaseObserver(ops.Object):
    def _on_database_relation_changed(self, event: ops.RelationChangedEvent):
        write_to_config(self.state.database, self.state.agent)


class AgentObserver(ops.Object):
    def _on_agent_relation_changed(self, event: ops.RelationChangedEvent):
        write_to_config(self.state.database, self.state.agent)

In the example above, both observers write to the same configuration, and that configuration depends on both the agent and the database, so there is little to gain from differentiating between them.

Use of Enumerations

In some cases a variable can take on a distinct set of values; for example, the outcome of an operation could be success, failure or skipped. Python has a few ways to implement this, including the enum module and the Literal type. Enumerations should be used because the team’s docstring tooling already enforces that each option is described.
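As a brief illustration, the outcome example could be written with the enum module as follows. The `Outcome` and `restart_if_needed` names are hypothetical, and the `Attributes` section of the docstring is an assumed convention for describing each option rather than a confirmed requirement of the team tooling.

```python
from enum import Enum


class Outcome(str, Enum):
    """Outcome of an operation.

    Attributes:
        SUCCESS: The operation was attempted and completed.
        FAILURE: The operation was attempted and failed.
        SKIPPED: The operation was not needed and was not attempted.
    """

    SUCCESS = "success"
    FAILURE = "failure"
    SKIPPED = "skipped"


def restart_if_needed(config_changed: bool, restart_ok: bool) -> Outcome:
    """Report the outcome of a (simulated) restart as an enumeration member."""
    if not config_changed:
        return Outcome.SKIPPED
    return Outcome.SUCCESS if restart_ok else Outcome.FAILURE
```

Callers compare against members such as `Outcome.SKIPPED` instead of bare strings, so a misspelled value fails with an `AttributeError` rather than silently never matching.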

Further Information

Spec on how to use pydantic for the charm state (not publicly available): DA016 - Charmed Structured Configuration
Spec on unit testing best practices (not publicly available): OP032 - Best practices for unit testing
Prototype of scenario-based testing backend for externalized State monolith

ppasotti · 13 October 2023 06:53

Huh, I didn’t know you published this one! Good to see it in the open

Also like the addition of the observer pattern, interesting thought. Will try that out.