Lately some of the bugs I’ve been fixing have been difficult to reproduce in a test controller locally, so I’ve been adding wrenches (github.com/juju/juju/wrench) to the code to simulate them. You can see some examples in https://github.com/juju/juju/pull/9691 - wrench.IsActive(“category”, “name”) checks for a file at /var/lib/juju/wrench/category containing name, and then you can return an error or sleep or whatever.
I’m not sure whether I should be leaving them in the code, and I’m wondering what other people think. My reasoning from the comment on the PR was:
I feel like they’re distracting from the actual code flow, and I haven’t been putting tests in for them (since I’m using them as a development tool) which seems a bit risky. Leaving them in commented was a bit of a compromise so a dev wouldn’t have to reinvent the wheel in the future if they want to get into that state, but it’s still clutter.
My initial thinking was to remove them, but that means that my QA steps aren’t repeatable. Is that a problem?
There are a few in the codebase already, but I think they’re ones that are used by tests to hit some particularly tricky error case.
I feel like it depends on an individual case basis. For things that are narrowly focused on a specific bug, then probably you can create that wrench, test it, and be done with it.
But if that wrench is more generally applicable, then I think we could leave it in.
As for ‘reproducing the QA steps’, one option is that rather than defining the wrench in commented out code, you can put the diff into the Pull Request as
apply this patch:
+++
---
+ // add a defined wrench
That would give a nice step for someone wanting to follow along/ if we run into the problem again in the future and want to investigate if the patch really did fix the bug.
But it also doesn’t directly clutter up the code.
Sorry, for the sake of those new to Juju–actually, I’m being purely self-interested and just want to know myself–what is a wrench?
I assume it’s a mechanism designed to intentionally corrupt state. But from looking briefly at the tests, it looks like that it’s just a plain text file that may or may not exist?
My view is that we not leave code intended for developer debugging of issues live in the code on production systems. We would never use a wrench on a production system as they are debugging tools for developers.
But, we know there are some key areas where i/o timeouts or other issues related to running at scale can cause problems and need to be debugged from time to time. So, I think it’s ok to leave a few commented out lines there so it’s trivial to diagnose issues as they arise without reinventing the wheel; the developer just has to uncomment a few lines. They would be adding other trace and debugging anyway as part of investigating any issue.
Actually, ‘wrench’ itself only exists to make it possible to poke a live system. But we recently introduced ‘juju controller-config’ which is way nicer than creating the right file in the filesystem, and is something you can update from the client.