A generic regression test for charm upgrades

Recently we added a very simple regression test to the prometheus charm:

  1. Deploy the edge/beta/candidate/stable charm from charmhub.
  2. Attempt to refresh it with a charm built from source.
  3. Wait for active/idle.

The test is short thanks to:

  • pytest.mark.parametrize
  • A fixture (prometheus_charm) that pre-builds the charm – the usual ops_test.build_charm(".")
  • Very narrow scope: a successful refresh, in which the new charm gets a fresh workload filesystem but inherits the previous StoredState and peer relation data
  • (To extend the context to regular relation data as well, you can add a few relations.)
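For completeness, the prometheus_charm fixture can be a minimal sketch along these lines (assuming pytest-operator provides the ops_test fixture; the module scope is an assumption, chosen so the charm is built only once per test module):

```python
import pytest


@pytest.fixture(scope="module")
async def prometheus_charm(ops_test):
    """Build the charm from the repo root once and return the path to the .charm file."""
    return await ops_test.build_charm(".")
```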

A slightly massaged version of the test:

import logging

import pytest

logger = logging.getLogger(__name__)


@pytest.mark.parametrize("channel", ("edge", "beta", "candidate", "stable"))
async def test_regression(ops_test, prometheus_charm, channel):
    logger.info("Deploy charm from %s channel", channel)
    app_name = f"prom-{channel}"
    await ops_test.model.deploy(
        "ch:prometheus-k8s",
        application_name=app_name,
        channel=channel,
        trust=True,
    )
    await ops_test.model.wait_for_idle(status="active")

    logger.info("Upgrade %s charm to current charm", channel)
    await ops_test.model.applications[app_name].refresh(
        path=prometheus_charm,
        resources=prometheus_resources,
    )
    await ops_test.model.wait_for_idle(status="active")
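The prometheus_resources mapping is not shown in the post; one way to build it (a sketch, where oci_resources is a hypothetical helper) is to pin each OCI resource declared in metadata.yaml to its upstream-source:

```python
def oci_resources(metadata: dict) -> dict:
    """Map each oci-image resource in the charm metadata to its upstream-source image."""
    return {
        name: spec["upstream-source"]
        for name, spec in metadata.get("resources", {}).items()
        if spec.get("type") == "oci-image"
    }


# e.g.:
# prometheus_resources = oci_resources(yaml.safe_load(Path("metadata.yaml").read_text()))
```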

Happy testing!

I like the intuition: simple and clean. But it makes me wonder: is it enough to check that the charm itself deploys without issues and goes to active/idle? What if some related charm errors out? Shouldn't that be part of the test too? For example, what if the charm being tested starts putting rubbish in the databags? I'd argue that for this to qualify as a regression test, all of the ‘outputs’ of the charm being tested should be verified for consistency and validity. Chiefly relation data.

What about:

  • before the upgrade, get a snapshot of all relation data on the charm’s side
  • after the upgrade, verify that the relation data is unchanged (or, at minimum, still valid)
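The comparison step could be sketched like this (the snapshots themselves could come from, e.g., parsing `juju show-unit <unit> --format yaml` before and after the refresh; databag_diff is a hypothetical helper):

```python
def databag_diff(before: dict, after: dict) -> dict:
    """Return {key: (old, new)} for every databag key that changed, appeared or vanished."""
    return {
        key: (before.get(key), after.get(key))
        for key in set(before) | set(after)
        if before.get(key) != after.get(key)
    }


# In the test, after the refresh:
# assert not databag_diff(snapshot_before, snapshot_after), "relation data changed across upgrade"
```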

However, we might also say that this kind of check should be done elsewhere: the upgraded charm's CI should be verifying databag integrity in a separate pipeline.

When we do not provide the apps argument to wait_for_idle, it waits for all apps in the model. And unless raise_on_error=False is passed, it will raise on any charm error, not just errors in the app under test.
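In code, scoping the wait to just the app under test while tolerating errors elsewhere in the model would look roughly like this (a sketch; settle is a hypothetical wrapper around libjuju's Model.wait_for_idle):

```python
async def settle(model, app_name: str) -> None:
    """Wait for only the given application to go active/idle."""
    # Don't let an unrelated charm's error state fail the test:
    # raise_on_error defaults to True in libjuju.
    await model.wait_for_idle(
        apps=[app_name],
        status="active",
        raise_on_error=False,
    )
```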

“all of the ‘outputs’ of the charm being tested should be verified for consistency and validity”

Agreed. And with prometheus it's slightly easier than with most charms, because it can be related to itself. But generally speaking, the test could be rewritten in the context of a bundle plus peripherals. When we go all-in on bundle tests, the bundle would be a natural place for this test.