Practicing "loose BDD" in charm unit tests

sed-i · 30 August 2022 03:24

Behavior-driven development (BDD) emerged from test-driven development (TDD) and is quite popular (Python: pytest-bdd, behave; JavaScript: cucumber, jest; Go: ginkgo; Rust: cucumber; C++: Catch2; C#: SpecFlow; …).

BDD means writing tests in the form of a grokable story:

GIVEN some initial state
WHEN something particular happens
THEN there is a concrete measurable outcome

When you phrase your tests as grokable stories, they tend to turn out short, readable and kind to your future self.

Typically, when you go all-in with BDD, you end up creating plain-text *.feature files in the Gherkin language that describe behavior, and then tightly coupling them to step functions with matching decorators. For example:

Feature: showing off behave

  Scenario: run a simple test
     Given we have behave installed
      When we implement a test
      Then behave will test it for us!

from behave import given, when, then

@given('we have behave installed')
def step_impl(context):
    pass

@when('we implement a test')
def step_impl(context):
    assert True is not False

@then('behave will test it for us!')
def step_impl(context):
    assert context.failed is False

This can become tedious for several reasons:

- Keeping the *.feature files in line with the test files is duplication of effort.
- Operator Framework (OF) tests that use the harness involve mutable state, which has implications for the structure of tests. This may conflict with a BDD framework’s constructs.
- The scenario - given - when - then hierarchy imposed by decorators is too strict and will not fit the needs of all tests.
- “Bending” our tests just to fit a framework’s constructs is probably bad practice.
- Tests are not transferable from one framework to another (e.g. migrating from behave to pytest-bdd would take real effort).
- Adding another tool for the entire team to learn and master means friction.

Introducing loose BDD

Going all-in with BDD means friction, so as an alternative I recently started practicing “loose BDD” using Gherkin comments. For example:

def test_config_option_overrides_fqdn(self):
    """The config option for external url must override all other external urls."""
    # GIVEN a charm with the fqdn as its external URL
    self.assertEqual(self.get_url_cli_arg(), self.fqdn_url)
    self.assertTrue(self.is_service_running())

    # WHEN the web_external_url config option is set
    external_url = "http://foo.bar:8080/path/to/alertmanager"
    self.harness.update_config({"web_external_url": external_url})

    # THEN it is used as the cli arg instead of the fqdn
    self.assertEqual(self.get_url_cli_arg(), external_url)
    self.assertTrue(self.is_service_running())

    # WHEN the web_external_url config option is cleared
    self.harness.update_config(unset=["web_external_url"])

    # THEN the cli arg is reverted to the fqdn
    self.assertEqual(self.get_url_cli_arg(), self.fqdn_url)
    self.assertTrue(self.is_service_running())

Advantages

This has several advantages:

- No new dependencies and no need to learn new tools.
- You retain maximum flexibility in structuring the tests. For example:
  - A “scenario” could be a module, a test class or even a test method.
  - You can nest when - then blocks if it fits your purpose.
- Reviewers can clearly see your intent. Mismatches between expected and actual behavior are easier to point out during code review or otherwise.
- All you need to do to understand the intent of the vast majority of tests is to read three lines of comments: given, when, then.
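As a minimal sketch of that flexibility, here is a test class acting as the “scenario”, with the shared GIVEN in setUp and a nested when - then inside one test method. To keep it runnable without a charm, it assumes a hypothetical FakeCharm stand-in; that class, its methods and the log_level option are made up for illustration and are not part of the ops library.

```python
import unittest


class FakeCharm:
    """Hypothetical stand-in for a charm under test; not part of the ops library."""

    def __init__(self, fqdn_url):
        self.fqdn_url = fqdn_url
        self.config = {}

    def update_config(self, key_values=None, unset=()):
        # Apply new values first, then drop any keys that were unset.
        self.config.update(key_values or {})
        for key in unset:
            self.config.pop(key, None)

    def url_cli_arg(self):
        # A configured external URL takes precedence over the FQDN-based one.
        return self.config.get("web_external_url", self.fqdn_url)


class TestCharmWithFqdnUrl(unittest.TestCase):
    """Scenario: every test in this class shares the GIVEN set up below."""

    def setUp(self):
        # GIVEN a charm with the fqdn as its external URL
        self.charm = FakeCharm("http://fqdn:9093")
        self.assertEqual(self.charm.url_cli_arg(), self.charm.fqdn_url)

    def test_config_option_round_trip(self):
        # WHEN the web_external_url config option is set
        external_url = "http://foo.bar:8080/path/to/alertmanager"
        self.charm.update_config({"web_external_url": external_url})

        # THEN it is used as the cli arg instead of the fqdn
        self.assertEqual(self.charm.url_cli_arg(), external_url)

        #   WHEN (nested) the option is subsequently cleared
        self.charm.update_config(unset=["web_external_url"])

        #   THEN the cli arg reverts to the fqdn
        self.assertEqual(self.charm.url_cli_arg(), self.charm.fqdn_url)

    def test_unrelated_option_is_ignored(self):
        # WHEN an unrelated (hypothetical) config option is set
        self.charm.update_config({"log_level": "debug"})

        # THEN the cli arg is still the fqdn
        self.assertEqual(self.charm.url_cli_arg(), self.charm.fqdn_url)
```

Saved as a test module, this runs with plain `python -m unittest`. No plugin or *.feature file is involved, which is the whole point.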

Disadvantages

However, loose Gherkin comments also have some disadvantages:

- The comments do not show up in error messages. But frankly, to debug a test failure you would open the test anyway, and IDEs let you jump to the failing line with a single click… where you will find the illuminating Gherkin comments.
- You need to foster a culture of tending to test comments. Ideally, tests are self-documenting, but charm tests are often not easy to understand. Gherkin comments help. A lot.
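If the missing error-message context ever becomes a real pain point, one possible mitigation (a sketch, not something I have adopted across the board) is to repeat the THEN text in the assertion's `msg` argument, which the standard unittest assertions accept. The `url_cli_arg` helper below is hypothetical and exists only to make the example self-contained.

```python
import unittest


def url_cli_arg(config, fqdn_url):
    """Hypothetical helper: pick the external URL the workload would be started with."""
    return config.get("web_external_url", fqdn_url)


class TestExternalUrl(unittest.TestCase):
    def test_config_option_overrides_fqdn(self):
        fqdn_url = "http://fqdn:9093"

        # GIVEN a config with no external URL override
        config = {}
        self.assertEqual(url_cli_arg(config, fqdn_url), fqdn_url)

        # WHEN the web_external_url config option is set
        config["web_external_url"] = "http://foo.bar:8080"

        # THEN it is used as the cli arg instead of the fqdn
        self.assertEqual(
            url_cli_arg(config, fqdn_url),
            "http://foo.bar:8080",
            # On failure, unittest appends this text to its default message.
            msg="THEN it is used as the cli arg instead of the fqdn",
        )
```

The cost is that the THEN text now lives in two places, so this is probably only worth it for assertions whose failures are otherwise hard to interpret.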

Conclusion

Considering the above, I currently believe that loose Gherkin comments are the better tradeoff.

pengale · 30 August 2022 17:22

I like the idea of doing loose BDD testing.

Purely selfishly, it matches the way that I already write tests, just with a little bit more formal syntax to the comments.

I’d be happy to adopt it in my projects. We could also add it to the example tests in our docs.

ppasotti · 2 September 2022 08:49

Interesting thought! I had never heard of BDD before.

I’m especially a fan of the ‘given’ part. I think it will work very well with something I have cooking, some sort of ‘state snapshotting’… I was thinking that it would go well with something I dubbed ‘scenario-based testing’, which is in fact very much like what you’re going for here:

given [State(relations, config, storedstate, leadership, <workload state>)]:
when [Event(params)]:
then [NewState(relations, config, storedstate, leadership, <workload state>)]

sed-i · 2 September 2022 13:53

This sounds very exciting @ppasotti!