Unit test charmed operators with hypothesis

Charmed operators are built around the observer pattern, which means that, in general, any event can fire at any time. For this reason, a charm's event hooks are encouraged to be stateless and idempotent. This lack of guaranteed ordering can make authoring charms a bit tricky: how do you take into account all the possible (relevant) orderings, and even more so, how do you test for them?

The Operator Framework has a test harness for exactly this purpose, and it becomes even more powerful when combined with property-based testing.

Hypothesis is a property-based testing framework that encourages you to think big: instead of hand-picking examples, you state general properties and let the framework check that your code holds up even for inputs you wouldn't have thought of yourself.
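As a minimal, standalone illustration (not charm-specific), a property-based test states an invariant and lets hypothesis hunt for counterexamples:

```python
import hypothesis.strategies as st
from hypothesis import given


@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    # Property: sorting an already-sorted list changes nothing.
    # hypothesis calls this function with many generated lists,
    # including edge cases like [] and lists with duplicates.
    once = sorted(xs)
    assert sorted(once) == once
```

Instead of asserting on a couple of hard-coded lists, the test asserts a property that must hold for every list hypothesis can generate.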

All examples below are taken from the cos-config-k8s operator (WIP) and use Gherkin-style annotations as follows:

  • Feature and Background: class docstring
  • Scenario: test method docstring
  • GIVEN, WHEN, THEN: test method body comments

Example 1: peer units and leadership

Charmed applications are typically intended to be deployed with multiple units, which means:

  • peer relation-created will fire at some point during startup
  • peer relation-joined/changed may fire (if more than one unit deployed)
  • a unit can be either a leader unit or not

Addressing the list above with individually-written unit tests would require a large number of repetitive tests. Instead, we can use hypothesis to do the repetitive work for us:

import hypothesis.strategies as st
from hypothesis import given

# ...

    @given(st.booleans(), st.integers(1, 5))
    def test_unit_is_blocked_if_no_config_provided(self, is_leader, num_units):

In the example above, hypothesis auto-generates various combinations of a boolean value (is_leader) and an integer (num_units). These variables are then used in the test, instead of writing separate tests for a leader and a non-leader unit, then for a single-unit and a multi-unit deployment, and finally for combinations of the two. If a test fails for a certain combination, hypothesis prints the failing test case (in fact, hypothesis tries to show the simplest failing case, via a process called shrinking):

Falsifying example: test_unit_is_blocked_if_no_config_provided(
    is_leader=False,
    num_units=1,
    self=<test_status_vs_config.TestStatusVsConfig testMethod=test_unit_is_blocked_if_no_config_provided>,
)

When using hypothesis with the Operator Framework harness, there is a caveat to keep in mind: hypothesis expects test methods to have no side effects, because it re-runs the same test method multiple times without calling tearDown() and setUp() in between. For this reason we need to do some cleanup on our own. Here’s a full example:

import unittest

import hypothesis.strategies as st
from hypothesis import given
from ops.model import BlockedStatus
from ops.testing import Harness

from charm import COSConfigCharm  # import path may differ in your charm repo


class TestStatusVsConfig(unittest.TestCase):
    """Feature: Charm's status should reflect the completeness of the config.

    Background: For the git-sync sidecar to run, a mandatory config option is needed: repo's URL.
    As long as it is missing, the charm should be "Blocked".
    """
    def setUp(self) -> None:
        self.app_name = "cos-configuration-k8s"

    @given(st.booleans(), st.integers(1, 5))
    def test_unit_is_blocked_if_no_config_provided(self, is_leader, num_units):
        """Scenario: Unit is deployed without any user-provided config."""
        # without the try-finally, if any assertion fails, then hypothesis would reenter without
        # the cleanup, carrying forward the units that were previously added
        self.harness = Harness(COSConfigCharm)
        self.peer_rel_id = self.harness.add_relation("replicas", self.app_name)

        try:
            self.assertEqual(self.harness.model.app.planned_units(), 1)

            # GIVEN any number of units present
            for i in range(1, num_units):
                self.harness.add_relation_unit(self.peer_rel_id, f"{self.app_name}/{i}")

            # AND the current unit could be either a leader or not
            self.harness.set_leader(is_leader)

            self.harness.begin_with_initial_hooks()

            # WHEN no config is provided

            # THEN the unit goes into blocked state
            self.assertIsInstance(self.harness.charm.unit.status, BlockedStatus)

            # AND pebble plan is empty
            plan = self.harness.get_container_pebble_plan(self.harness.charm._container_name)
            self.assertEqual(plan.to_dict(), {})

        finally:
            # cleanup added units to prep for reentry by hypothesis' strategy
            self.harness.cleanup()

But in the example above we used begin_with_initial_hooks(), in which the startup events fire in a fixed order, followed by a fixed sequence of add_relation_unit() calls. As mentioned earlier, the order of events is not guaranteed, and that ordering itself is worth testing, as the next example shows.

Example 2: randomize startup hooks

begin_with_initial_hooks() already does some shuffling to better simulate a real deployment, but with hypothesis we can take that a step further: after the initial deployment (using begin_with_initial_hooks()), we let the “user” deploy additional peer units and create some or all of the relations listed in metadata.yaml, in random order. For this we need to combine a few strategies:

# Sample a relation name from a list of relation names the 
# charm supports
st_relation_name = st.sampled_from(
    ["prometheus-config", "loki-config", "grafana-dashboards"]
)

# Each such relation will come with 1-4 units
st_num_relation_units = st.integers(1, 4)

# Generate tuples of relation name and number of
# corresponding units
st_relations = st.tuples(
    st_relation_name, st_num_relation_units
)

# Generate a list of relations to create, but make them 
# unique by relation name (only one app per relation name,
# for simplicity)
relations_strategy = st.lists(
    st_relations, min_size=1, max_size=3, unique_by=lambda x: x[0]
)

Now, to the test itself:

import random
import unittest
from typing import List, Tuple


class TestRandomHooks(unittest.TestCase):
    """Feature: Without config, charm's status should remain blocked regardless of relations joined.

    Background: For the git-sync sidecar to run, a mandatory config option is needed: repo's URL.
    As long as it is missing, the charm should be "Blocked".
    """

    def setUp(self) -> None:
        self.app_name = "cos-configuration-k8s"

    @given(
        st.booleans(),
        st.integers(1, 5),
        relations_strategy,
    )
    def test_user_adds_units_and_relations_a_while_after_deployment_without_setting_config(
        self, is_leader, num_peers, rel_list: List[Tuple[str, int]]
    ):
        """Scenario: Unit is deployed, and after a while the user adds more relations."""
        # without the try-finally, if any assertion fails, then hypothesis would reenter without
        # the cleanup, carrying forward the units that were previously added, etc.
        self.harness = Harness(COSConfigCharm)
        self.peer_rel_id = self.harness.add_relation("replicas", self.app_name)

        # GIVEN app starts with a single unit (which is the leader)
        self.harness.set_leader(True)

        # AND the usual startup hooks fire
        self.harness.begin_with_initial_hooks()
        
        # AND no config is provided

        try:
            self.assertEqual(self.harness.model.app.planned_units(), 1)

            # WHEN the user adds peer relations
            hooks_to_fire = []
            hooks_to_fire.extend(
                [
                    lambda i=i: self.harness.add_relation_unit(
                        self.peer_rel_id, f"{self.app_name}/{i}"
                    )
                    for i in range(1, num_peers)
                ]
            )

            # AND the unit may change leadership status
            hooks_to_fire.append(lambda: self.harness.set_leader(is_leader))

            # AND the user adds regular relations
            for rel_name, num_remote_units in rel_list:
                rel_id = self.harness.add_relation(rel_name, f"{self.app_name}-app")
                hooks_to_fire.extend(
                    [
                        lambda rel_id=rel_id, rel_name=rel_name, i=i: self.harness.add_relation_unit(
                            rel_id, f"{rel_name}/{i}"
                        )
                        for i in range(num_remote_units)
                    ]
                )

            # AND hooks fire in random order
            random.shuffle(hooks_to_fire)
            for hook in hooks_to_fire:
                hook()

            # THEN the unit stays in blocked state
            self.assertIsInstance(self.harness.charm.unit.status, BlockedStatus)

            # AND pebble plan is empty
            plan = self.harness.get_container_pebble_plan(self.harness.charm._container_name)
            self.assertEqual(plan.to_dict(), {})

        finally:
            # cleanup added units to prep for reentry by hypothesis' strategy
            self.harness.cleanup()

With the example above, hypothesis ran 100 (!) variations of the test, just like that.
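The 100 runs are hypothesis’s default; the per-test settings decorator can tune this, which is handy because each harness-backed case is relatively slow (the test name below is illustrative):

```python
import hypothesis.strategies as st
from hypothesis import given, settings


@settings(max_examples=25, deadline=None)
@given(st.booleans(), st.integers(1, 5))
def test_quick_variant(is_leader, num_units):
    # Fewer generated cases while iterating locally; deadline=None
    # avoids flaky timeout failures when each case does non-trivial
    # setup work, such as spinning up a fresh Harness.
    assert 1 <= num_units <= 5
```

Raising max_examples in CI and lowering it locally is a common way to balance coverage against test-suite runtime.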

Summary

Hypothesis makes us think harder and write better tests, and in return it generates a lot of variations for us that improve our code. Although hypothesis is intended for side-effect-free test methods, with careful cleanup it can be applied to charm unit tests as well.

Additional potential use cases for hypothesis strategies may include:

  Strategy                      Potential use-case
  booleans, decimals, text, …   Test config options (config.yaml)
  dictionaries, from_regex      Test charm’s response to relation data
  permutations                  Test charm’s handling of a random sequence of relation-changed events
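For instance, st.permutations feeds a test every possible ordering of a fixed list of events; in a real charm test, each name would map to a harness call (a sketch with hypothetical event names):

```python
import hypothesis.strategies as st
from hypothesis import given

EVENTS = ["relation-created", "relation-joined", "relation-changed"]


@given(st.permutations(EVENTS))
def test_any_event_order(order):
    # hypothesis draws shuffled orderings of EVENTS; a real test would
    # fire the corresponding harness hooks in this order and then
    # assert on the resulting charm status.
    assert sorted(order) == sorted(EVENTS)
```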

For more advanced testing, see Stateful testing.
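As a taste of what stateful testing looks like, hypothesis’s RuleBasedStateMachine lets the framework interleave “rules” in random order and shrink any failing sequence of steps (a toy machine, not charm code):

```python
import hypothesis.strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule


class CounterMachine(RuleBasedStateMachine):
    # Toy model: hypothesis calls add() and reset() in random order
    # and checks the property after arbitrary step sequences.
    def __init__(self):
        super().__init__()
        self.count = 0

    @rule(n=st.integers(1, 3))
    def add(self, n):
        self.count += n

    @rule()
    def reset(self):
        self.count = 0

    @rule()
    def check_nonnegative(self):
        assert self.count >= 0


# Expose the machine to unittest/pytest as a regular test case:
TestCounter = CounterMachine.TestCase
```

The same idea maps naturally onto charms: each rule could fire one harness hook (add a unit, change leadership, update relation data), with the charm’s status checked as the invariant.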
