Do your charm unit and integration tests cover the whole event space?
In unit testing we typically talk about ‘code coverage’ as an indicator of how good your tests are: are all code paths exercised by your test suite, or are you leaving untested blind spots that risk being executed in production and revealing some unexpected bug?
Operator framework charms use the observer pattern: incoming Juju events trigger registered charm methods, which execute workload operation logic. Juju events can come in at (almost) any time and in (almost) any order, which means it is very important to verify that the charm code can tolerate different sequences of events. So it’s not only important that tests exercise all code, but also that they exercise all the possible ways of executing the same pieces of code in different orders.
It’s quite the combinatorial puzzle to figure out how to do this efficiently. Before we can solve it formally we need a well-defined concept of state and state transition (I’m giving that a shot in scenario), so that we can write down the state machine and then solve it using logic programming tools.
That, or we can brute-force our way out of it, but where’s the fun in that?
As a start, to see where we stand, I propose the creation of an ‘event space coverage’ calculation tool that, not unlike the existing code coverage tools, would output some numbers to tell you how good your test suite is.
A basic metric would be: of all the events your charm could possibly receive, how many are actually emitted by the test suite?
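To make that first metric concrete, here is a minimal sketch. It assumes you can somehow obtain two lists of event names: the events the charm could receive, and the events your tests actually emitted. How to collect those is left open; the function name `event_coverage` and the event names in the usage lines are made up for illustration and are not part of any existing tool.

```python
def event_coverage(possible_events, emitted_events):
    """Return (ratio, missing) for a test suite.

    possible_events: names of all events the charm could receive (assumed known).
    emitted_events: names of the events the tests emitted, duplicates allowed.
    """
    possible = set(possible_events)
    # Only count emitted events that are in the declared event space.
    covered = possible & set(emitted_events)
    missing = sorted(possible - covered)
    # An empty event space is trivially fully covered.
    ratio = len(covered) / len(possible) if possible else 1.0
    return ratio, missing


# Hypothetical example data.
possible = ["install", "start", "config-changed", "update-status"]
emitted = ["install", "start", "start"]
ratio, missing = event_coverage(possible, emitted)
print(f"event coverage: {ratio:.0%}, never emitted: {missing}")
```

Much like a line-coverage report, the list of never-emitted events is probably more useful than the percentage itself.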
A second, more complex metric could be: of all possible (finite) event sequences your charm can see, which ones are triggered by your test code?
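One naive way to pin this down, purely as a starting point: fix a sequence length n and count which length-n event sequences show up as contiguous windows in the event traces of your tests. The sketch below assumes every ordering of events is possible, which overestimates the real event space, since Juju does constrain some orderings. The name `sequence_coverage` and the example traces are hypothetical.

```python
from itertools import product


def sequence_coverage(possible_events, test_traces, length=2):
    """Return (ratio, unseen) over all event sequences of a given length.

    possible_events: names of all events the charm could receive.
    test_traces: one list of event names per test, in emission order.
    length: the sequence length n to check (assumption: every ordering is legal).
    """
    universe = set(product(sorted(set(possible_events)), repeat=length))
    seen = set()
    for trace in test_traces:
        # Slide a window of size `length` over each trace.
        for i in range(len(trace) - length + 1):
            window = tuple(trace[i:i + length])
            if window in universe:
                seen.add(window)
    ratio = len(seen) / len(universe) if universe else 1.0
    return ratio, universe - seen


# Hypothetical example data: two tests, three possible events.
traces = [["install", "start", "config-changed"], ["start", "start"]]
ratio, unseen = sequence_coverage(["install", "start", "config-changed"], traces)
print(f"pair coverage: {ratio:.0%}, {len(unseen)} pairs never exercised")
```

The universe grows as the number of events to the power n, so this brute-force count only stays practical for small n.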
How to express this second metric and how to calculate it is what I’ll be keeping the back of my head busy with for the next couple of days. If anyone has thoughts on the matter, please drop a line!