Making charms more SOLID

jose · 20 December 2024 18:07

Table of Contents

Initial situation
Analysing the charm through SOLID Principles
Making it more SOLID
Final notes on the improvements brought about by these changes

When I am asked what a Charm is, I like to explain it synthetically as:

“It is software that manages the life cycle of other software. It can do this thanks to Juju which is its orchestrator.”

It is obviously a simplification of the real definition:

A charm is software that wraps an application and that contains all of the instructions necessary for deploying, configuring, operating an application on any cloud using Juju.

A charm generally contains the operations (charm) code and the information on where to get the application itself (the workload). […]

Charms include deterministic logic that specifies what happens when specific events occur. Events can be triggered by an administrator (e.g. through the CLI), by other charms or the external environment.

But as a Charm author the term charm not only refers to the package that can be installed running juju deploy my-charm, it also refers to the file (src/charm.py) in which we place the class that represents the charm itself and the entry point of the charm execution.

A common mistake that Software Engineers make when we start writing charms is to confuse the real definition of charm which refers to the software package, with the src/charm.py file. This way we start writing all the logic in that file. Although the charm will work well anyway, it will then be more complex to maintain, understand and extend it.

Would it be possible to improve the readability, maintainability while decoupling the code? Well, the answer is YES!

So, let’s tray to make our charms more SOLID

Initial situation

Some time ago I discovered snips.sh a really cool SSH-powered pastebin with a human-friendly TUI and web UI, and I decided to write a charm for it. I use it as a toy project and it was never the idea to make it production-ready, but who knows

Let’s start by taking a look of the src/charm.py file:

import logging
import secrets
import string
from typing import Dict

from ops import PebbleReadyEvent
from ops.charm import CharmBase
from ops.main import main
from ops.model import (
    ActiveStatus,
    BlockedStatus,
    MaintenanceStatus,
    SecretNotFoundError,
)
from ops.pebble import ChangeError, Layer

# Log messages can be retrieved using juju debug-log
logger = logging.getLogger(__name__)


CONTAINER_NAME = "snips"
HTTP_PORT = 8080
SSH_PORT = 2222
LAYER_NAME = CONTAINER_NAME
SERVICE_NAME = CONTAINER_NAME


class SnipsK8SOperatorCharm(CharmBase):
    """Charm the service."""

    def __init__(self, *args):
        super().__init__(*args)
        self._container = self.unit.get_container(CONTAINER_NAME)
        self.framework.observe(self.on.snips_pebble_ready, self._on_snips_pebble_ready)
        self.framework.observe(self.on.config_changed, self._on_config_changed)
        self.framework.observe(self.on.update_status, self._on_update_status)

    def _on_snips_pebble_ready(self, _: PebbleReadyEvent):
        self._common_exit_hook()

    def _on_config_changed(self, _):
        self._common_exit_hook()

    def _on_update_status(self, _):
        self._common_exit_hook()

    def _common_exit_hook(self) -> None:
        """Event processing hook that is common to all events to ensure idempotency."""
        if not self._container.can_connect():
            self.unit.status = MaintenanceStatus("Waiting for pod startup to complete")
            return

        # Update pebble layer
        if not self._update_layer():
            self.unit.status = BlockedStatus("Failed to update pebble layer. Check juju debug-log")
            return

        self.unit.status = ActiveStatus()

    def _update_layer(self) -> bool:
        """Update service layer.

        Returns:
          True if anything changed; False otherwise
        """
        overlay = self.pebble_layer
        plan = self._container.get_plan()

        if SERVICE_NAME not in plan.services or overlay.services != plan.services:
            self._container.add_layer(LAYER_NAME, overlay, combine=True)
            try:
                self._container.replan()
                return True
            except ChangeError as e:
                logger.error(
                    "Failed to replan; pebble plan: %s; %s",
                    self._container.get_plan().to_dict(),
                    str(e),
                )
                return False

        return True

    @property
    def _hmac_key(self) -> str:
        try:
            secret = self.model.get_secret(label="hmac-key")
        except SecretNotFoundError:
            secret = self.app.add_secret({"hmac-key": self._generate_hmac_key()}, label="hmac-key")

        return secret.get_content()["hmac-key"]

    def _generate_hmac_key(self) -> str:
        """Generate a random 24 character symmetric key used to sign URLs."""
        chars = string.ascii_letters + string.digits
        return "".join(secrets.choice(chars) for _ in range(24))

    @property
    def pebble_layer(self) -> Layer:
        """Pebble layer for the snips service."""
        return Layer(
            {
                "summary": "snips layer",
                "description": "pebble config layer for snips",
                "services": {
                    "snips": {
                        "override": "replace",
                        "summary": "snips",
                        "command": "/usr/bin/snips.sh",
                        "startup": "enabled",
                        "environment": self._env_vars,
                    }
                },
            }
        )

    @property
    def _env_vars(self) -> Dict:
        env_vars = {
            "SNIPS_DEBUG": True,
            "SNIPS_HMACKEY": self._hmac_key,
        }
        return env_vars


if __name__ == "__main__":  # pragma: nocover
    main(SnipsK8SOperatorCharm)

This is an easy to read, short file with only 141 lines of code. It couldn’t be better, could it? Well no, in fact that code could be much better, let’s analyse it!

Analysing the charm through SOLID Principles

Single Responsibility Principle (SRP)

The Single Responsibility Principle states that a module, a class, or even a method should have a single, well-defined responsibility. It should do just one thing and have only one reason to change.

If we carefully analyse the SnipsK8SOperatorCharm class we notice that it has multiple responsibilities:

Handle events (snips_pebble_ready, config_changed and update_status).
Manage secret keys (_hmac_key, _generate_hmac_key).
Configure and generate Pebble layer (pebble_layer, _env_vars).

So the problem here is crystal clear : The class is taking on too many responsibilities, which means it has multiple reasons to change.

When we think about a charm, and specially the class that represents it in src/charm.py file, what would be that one thing the class should be responsible for?

One answer to this question could be:

Probably the only thing a charm class should do is to observe (and react to) events. All other tasks can be performed by ad-hoc classes.

Open/Closed Principle (OCP)

If we now analyse the class from the OCP point of view, we can find some issues:

The _common_exit_hook method contains specific logic to check if it is possible to connect to the container and update the Pebble plan.
The pebble_layer property is directly implemented inside the charm, with no possibility to easily extend it for different configurations or services.

The problem here is that if we want to extend the behaviour of the charm or change the Pebble layer, we will need to modify the existing code instead of extending it.

Liskov Substitution Principle (LSP):

We do not have an explicit violation in this code because SnipsK8SOperatorCharm is not used in a polymorphic context and there is no subtype substitution involved.

Interface Segregation Principle (ISP):

Well, we don’t seem to violate this principle either!

Dependency Inversion Principle (DIP):

SnipsK8SOperatorCharm depends directly on concrete implementations such as secrets.choice, and ops.pebble.Layer. We do not have abstractions to handle secret keys or Pebble configurations. This makes it difficult to replace these dependencies.

Making it more SOLID

My strategy here is to split the code in src/charm.py into 5 different modules with the following classes. You can check the code in this branch.

charm
- SnipsK8SOperatorCharm: Observes and react to the events.
components_init
- ComponentsInitialiser: Initialise the components needed by SnipsK8SOperatorCharm
snips
- Snips: Configure the Pebble layer for the snips workload
secret_manager
- SecretManager: Manages the secret key.
tasks
- Task: Just an interface for the tasks that the charm will execute
- ValidateCanConnectTask: Validates if it is possible to connect to the workload container
- UpdatePebbleLayerTask: Updates Pebble layer
- TasksFactory: A class factory that generates a list of the tasks that need to be executed by the charm.

Now, let’s take a look a those modules!

`charm`

source code here

import logging
from typing import List

from ops import main
from ops.charm import CharmBase
from ops.model import ActiveStatus

from components_init import ComponentsInitialiser
from snips import CONTAINER_NAME
from tasks import Task, TasksFactory

# Log messages can be retrieved using juju debug-log
logger = logging.getLogger(__name__)


class SnipsK8SOperatorCharm(CharmBase):
    """Snips charm class."""

    def __init__(self, framework):
        super().__init__(framework)
        initialiser = ComponentsInitialiser(self, CONTAINER_NAME)
        self._container, self._snips = initialiser.initialise_components()
        self._tasks: List[Task] = TasksFactory(self).tasks
        self._observe_events()

    def _observe_events(self):
        self.framework.observe(self.on.snips_pebble_ready, self._reconcile)
        self.framework.observe(self.on.update_status, self._reconcile)
        self.framework.observe(self.on.config_changed, self._reconcile)

    def _reconcile(self, _) -> None:
        """Event processing hook that is common to all events to ensure idempotency."""
        for task in self._tasks:
            if not task.execute():
                return
        self.unit.status = ActiveStatus()


if __name__ == "__main__":
    main(SnipsK8SOperatorCharm)

Woow! now we have 21 lines class!! Much easier to understand than the previous version!

components_init

source code here

from secret_manager import SecretManager
from snips import Snips


class ComponentsInitialiser:
    """Initialise the components for the charm."""

    def __init__(self, charm, container_name):
        self.charm = charm
        self.container_name = container_name

    def initialise_components(self):
        """Initialise the components for the charm."""
        container = self.charm.unit.get_container(self.container_name)
        secret_manager = SecretManager(self.charm)
        snips = Snips(container, secret_manager.hmac_key)
        return container, snips

snips

source code here

import logging
from typing import Dict

from ops import Container
from ops.pebble import ChangeError, Layer

# Log messages can be retrieved using juju debug-log
logger = logging.getLogger(__name__)

CONTAINER_NAME = "snips"
LAYER_NAME = CONTAINER_NAME
SERVICE_NAME = CONTAINER_NAME
HTTP_PORT = 8080
SSH_PORT = 2222


class Snips:
    """Snips workload container facade."""

    def __init__(self, container: Container, hmac_key: str):
        self._container = container
        self._hmac_key = hmac_key

    @property
    def pebble_layer(self) -> Layer:
        """Pebble layer for the snips service."""
        return Layer(
            {
                "summary": "snips layer",
                "description": "pebble config layer for snips",
                "services": {
                    "snips": {
                        "override": "replace",
                        "summary": "snips",
                        "command": "/usr/bin/snips.sh",
                        "startup": "enabled",
                        "environment": self._env_vars,
                    }
                },
            }
        )

    @property
    def _env_vars(self) -> Dict:
        env_vars = {
            "SNIPS_DEBUG": True,
            "SNIPS_HMACKEY": self._hmac_key,
        }

        return env_vars

    def update_layer(self) -> bool:
        """Update service layer.

        Returns:
          True if anything changed; False otherwise
        """
        overlay = self.pebble_layer
        plan = self._container.get_plan()

        if SERVICE_NAME not in plan.services or overlay.services != plan.services:
            self._container.add_layer(LAYER_NAME, overlay, combine=True)
            try:
                self._container.replan()
                return True
            except ChangeError as e:
                logger.error(
                    "Failed to replan; pebble plan: %s; %s",
                    self._container.get_plan().to_dict(),
                    str(e),
                )
                return False

        return True

secret_manager

source code here

import secrets
import string

from ops.model import SecretNotFoundError


class SecretManager:
    """Secret Manager class."""

    def __init__(self, charm):
        self.charm = charm

    @property
    def hmac_key(self) -> str:
        """Retrieve or create hmac-key."""
        try:
            secret = self.charm.model.get_secret(label="hmac-key")
        except SecretNotFoundError:
            secret = self.charm.app.add_secret(
                {"hmac-key": self._generate_hmac_key()}, label="hmac-key"
            )
        return secret.get_content()["hmac-key"]

    def _generate_hmac_key(self) -> str:
        """Generate a random 24 character symmetric key used to sign URLs."""
        chars = string.ascii_letters + string.digits
        return "".join(secrets.choice(chars) for _ in range(24))

tasks

source code here

import logging
from typing import List, Protocol

from ops.model import (
    BlockedStatus,
    MaintenanceStatus,
)

# Log messages can be retrieved using juju debug-log
logger = logging.getLogger(__name__)


TASK_EXECUTED = "Task '%s' executed."
TASK_FAILED = "Task '%s' failed: %s"


class Task(Protocol):
    """Task interface."""

    def execute(self) -> bool:
        """Task execution.

        Return True if the execution is successful otherwise False.
        """
        ...


class TasksFactory:
    """TaskFactory class."""

    def __init__(self, charm):
        self._charm = charm

    @property
    def tasks(self) -> List[Task]:
        """List of tasks."""
        return [
            ValidateCanConnectTask(self, self._charm._container),
            UpdatePebbleLayerTask(self, self._charm._snips),
        ]


class ValidateCanConnectTask:
    """Validate Can Connect task class."""

    def __init__(self, charm, container):
        self._class_name = __class__.__name__
        self.charm = charm
        self._container = container

    def execute(self) -> bool:
        """Task execution."""
        if not self._container.can_connect():
            msg = "Waiting for pod startup to complete"
            self.charm.unit.status = MaintenanceStatus(msg)
            logger.debug(TASK_FAILED, self._class_name, msg)
            return False

        logger.debug(TASK_EXECUTED, self._class_name)
        return True


class UpdatePebbleLayerTask:
    """Update Pebble Layer task class."""

    def __init__(self, charm, snips):
        self._class_name = __class__.__name__
        self._charm = charm
        self._snips = snips

    def execute(self) -> bool:
        """Task execution."""
        if not self._snips.update_layer():
            msg = "Unable to update Pebble layer. Check juju debug-log"
            self._charm.unit.status = BlockedStatus(msg)
            logger.debug(TASK_FAILED, self._class_name, msg)
            return False

        logger.debug(TASK_EXECUTED, self._class_name)
        return True

Final notes on the improvements brought about by these changes

Single Responsibility Principle (SRP)

Improvement: The initialisation of components and creation of tasks has been delegated:
- ComponentsInitialiser encapsulates the logic for initialising components (_container, _snips).
- TasksFactory is responsible for task creation (Task), removing this responsibility from the charm.
Outcome: The SnipsK8SOperatorCharm class now has a single responsibility: observe (and react to) events. This improves cohesion.

Open/Closed Principle (OCP)

Improvement: The task creation logic is abstracted in TasksFactory, allowing new tasks to be added without modifying the charm.
Outcome: The charm is open to extension (e.g., adding new tasks or components) without requiring changes to its base code.

Interface Segregation Principle (ISP)

Improvement:
- The charm uses TasksFactory and ComponentsInitialiser, which encapsulate details of underlying dependencies such as the container and Pebble.
- This ensures that the charm only interacts with interfaces relevant to its purpose (event and task management).
Outcome: Exposure to unnecessary external dependencies is reduced.

Dependency Inversion Principle (DIP)

Improvement:
The initialisation of components and tasks is delegated to ComponentsInitialiser and TasksFactory, which abstract concrete dependencies.
This ensures that the charm depends on abstractions (such as lists of tasks and initialised containers) instead of concrete details.
Outcome: The code is more modular and easier to maintain or test.

I believe this approach demonstrates that applying SOLID principles to charms results in clean, modular, and maintainable codebases. While the initial charm was functional, this refactor ensures that it is sustainable and scalable in the long term. Making charms more SOLID

ca-scribner · 7 January 2025 17:02

+1 to everything @jose. completely agree that many charms break the SOLID principles, and I don’t fault people for it. Although its possible to write SOLID charms like you show, imo the framework (inadvertently, I think) nudges you away from that.

Where this really gets costly imo is the points you mention and also code reuse. Having these non-SOLID charms makes it much harder to see and extract the shared logic into reusable packages.

I (/Kubeflow team) went on a deep dive into this back in 2023. This post on reducing duplication proposed something really similar to your Task/TaskFactory concept. We have a common reconciler (CharmReconciler, kinda equivalent to the TaskFactory here) and then wrap each bit of charm logic in a separate Component (equivalent to your Tasks, but with a bit more API). imo, it leads to really easy to read charm code, but also nicely separated responsibilities:

example from kubeflow-volumes

class KubeflowVolumesOperator(CharmBase):
    """Charm for the Kubeflow Volumes Web App.


    https://github.com/canonical/kubeflow-volumes-operator
    """


    def __init__(self, *args):
        super().__init__(*args)


        # add links in kubeflow-dashboard sidebar
        self.kubeflow_dashboard_sidebar = KubeflowDashboardLinksRequirer(
            charm=self,
            relation_name="dashboard-links",
            dashboard_links=DASHBOARD_LINKS,
        )


        # expose web app's port
        http_port = ServicePort(int(self.model.config["port"]), name="http")
        self.service_patcher = KubernetesServicePatch(
            self, [http_port], service_name=f"{self.model.app.name}"
        )


        # Charm logic
        self.charm_reconciler = CharmReconciler(self)
        self.leadership_gate = self.charm_reconciler.add(
            component=LeadershipGateComponent(
                charm=self,
                name="leadership-gate",
            ),
            depends_on=[],
        )


        self.kubernetes_resources = self.charm_reconciler.add(
            component=KubernetesComponent(
                charm=self,
                name="kubernetes:auth",
                resource_templates=K8S_RESOURCE_FILES,
                krh_resource_types={ClusterRole, ClusterRoleBinding, ServiceAccount},
                krh_labels=create_charm_default_labels(
                    self.app.name, self.model.name, scope="auth"
                ),
                context_callable=lambda: {"app_name": self.app.name, "namespace": self.model.name},
                lightkube_client=lightkube.Client(),
            ),
            depends_on=[self.leadership_gate],
        )


        self.ingress_relation = self.charm_reconciler.add(
            component=SdiRelationBroadcasterComponent(
                charm=self,
                name="relation:ingress",
                relation_name="ingress",
                data_to_send={
                    "prefix": "/volumes",
                    "rewrite": "/",
                    "service": self.model.app.name,
                    "port": int(self.model.config["port"]),
                },
            ),
            depends_on=[self.leadership_gate],
        )


        self.kubeflow_volumes_container = self.charm_reconciler.add(
            component=KubeflowVolumesPebbleService(
                charm=self,
                name="container:kubeflow-volumes",
                container_name="kubeflow-volumes",
                service_name="kubeflow-volumes",
                files_to_push=[
                    ContainerFileTemplate(
                        source_template_path=CONFIG_YAML_TEMPLATE_FILE,
                        destination_path=CONFIG_YAML_DESTINATION_PATH,
                    ),
                ],
                inputs_getter=lambda: KubeflowVolumesInputs(
                    APP_SECURE_COOKIES=self.model.config["secure-cookies"],
                    BACKEND_MODE=self.model.config["backend-mode"],
                    VOLUME_VIEWER_IMAGE=self.model.config["volume-viewer-image"],
                ),
            ),
            depends_on=[
                self.leadership_gate,
                self.kubernetes_resources,
            ],
        )


        self.charm_reconciler.install_default_event_handlers()
        self._logging = LogForwarder(charm=self)

There’s several (maybe 10 ish?) kubeflow charms that have used this common CharmReconciler since late 2023. The Sunbeam charms do things much the same way.

benhoyt · 9 January 2025 04:35

This is a good discussion to have. However, I do want to push back a bit on this. In particular:

Woow! now we have 21 lines class!! Much easier to understand than the previous version!

The problem is, those 21 lines are calls to other code that’s much more generic and abstracted than the original version, which you then have to go understand to figure out how the charm works. I find the simple 1-class version easier to understand than the 8-class version (4 classes for tasks alone). See also Jack Diederich’s great talk Stop writing classes – I think TaskFactory could just be a list and Task just a function. In addition, TaskFactory looks generic, but with the tasks method actually is specific to this charm. It’s possible this kind of abstraction pays off at a certain charm size, but here it looks like unnecessary wiring. Is “SOLID” leading us astray, here at least?

I do think some form of separation is good. In particular, separating the application-specific code (in this case snips.sh-specific stuff) from the charm-specific code is something we’ve talked about a bunch.

I realise I’m being quite frank and even wet-blankety here. I’ve given some strong push-back but am very open to strong push-back too. Let’s discuss!

jose · 9 January 2025 16:32

Hi Ben!!

It is great to have your thoughts on this!

After a few years of writing charms, I believe there is a lot of room for improvement in the way we do it.

I took the example of the snips charm as a guinea pig because it is a toy charm, and it allows me to explore alternatives easily, not because I think we should use this approach in charms of that size.

In our team we have charms in which the main class has more than 1000 lines of code () and dozens of methods, where it is already complicated to understand what that class does. From my point of view in this context, the sentence:

Woow! now we have 21 lines class!! Much easier to understand than the previous version!

Takes on a totally different dimension, especially in terms of maintainability, robustness and coupling.

We have a fundamental agreement here: Separation of concerns is good.

The question is where is it good to draw the line? In my initial post, I ventured to say:

Probably the only thing a charm class should do is to observe (and react to) events. All other tasks can be performed by ad-hoc classes.