Sometimes we’re faced with a charm code problem in production: we’re not sure how we got there, but something is wrong. For example:
- Forgot to observe a hook.
- An unexpected code path resulted in incomplete rendering of relation data.
- Had to `juju resolve --no-retry`, but as a result missed out on some key logic.
- Stored state got out of sync with relation data and with config files on disk.
In such cases we often do a variation of the following:
- Re-relate one or more relations.
- Refresh the charm from latest/edge.
- Manually fire events, e.g. `jhack fire config-changed`.
- Temporarily change the update-status interval to 10 seconds.
- SSH into the unit to manually delete or update something.
Instead, we could try a different practice:
- Have a Juju action, `reconcile`, which calls a `reconcile` function in the charm code.
- Have a reconciler function in your charm that is able to “recreate the world” holistically, at any point in time.
- Add observers where they improve the efficiency of “delta charming”, but have them call the same logic already present in the reconciler.
This way, idempotency would be more central to the development process, delta charming would be more mindful, and production issues would have a silver bullet.
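The practice described above can be sketched in plain Python. This is a minimal, framework-agnostic sketch under stated assumptions: `World`, `ReconcilingCharm`, the `app.conf` file and its `key=value` format are hypothetical names for illustration only, not part of `ops` or Juju. In a real `ops` charm, the handlers would be registered with `self.framework.observe(self.on.config_changed, ...)` and, for an action named `reconcile`, `self.on.reconcile_action`; the inputs would be read from `self.model` instead of being passed in, and the action handler would report back via `event.set_results(...)`. Here the inputs are passed explicitly so the sketch runs on its own.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict


@dataclass
class World:
    """Hypothetical snapshot of every input the charm depends on."""
    config: Dict[str, str]
    relation_data: Dict[str, Dict[str, str]]


class ReconcilingCharm:
    """Hypothetical charm sketch: every entry point funnels into reconcile()."""

    def __init__(self, config_path: Path):
        self.config_path = config_path
        self.writes = 0  # counts how often the file on disk actually changed

    def _render(self, world: World) -> str:
        # Derive the desired file purely from the current inputs, never from
        # accumulated deltas, so the result does not depend on event history.
        lines = [f"{k}={v}" for k, v in sorted(world.config.items())]
        for rel in sorted(world.relation_data):
            for k, v in sorted(world.relation_data[rel].items()):
                lines.append(f"{rel}.{k}={v}")
        return "\n".join(lines) + "\n"

    def reconcile(self, world: World) -> bool:
        """Recreate the world holistically; return True if anything changed."""
        desired = self._render(world)
        current = self.config_path.read_text() if self.config_path.exists() else None
        if current == desired:
            return False  # already converged: running again is a no-op
        self.config_path.write_text(desired)
        self.writes += 1
        return True

    # Delta observers: kept for efficiency, but they call the same logic.
    def on_config_changed(self, world: World) -> None:
        self.reconcile(world)

    def on_relation_changed(self, world: World) -> None:
        self.reconcile(world)

    # Handler for the hypothetical `reconcile` action: same code path,
    # returning a results dict like the one passed to event.set_results().
    def on_reconcile_action(self, world: World) -> Dict[str, str]:
        changed = self.reconcile(world)
        return {"changed": str(changed).lower()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "app.conf"
        charm = ReconcilingCharm(path)
        world = World(config={"port": "8080"}, relation_data={"db": {"host": "10.0.0.5"}})
        charm.on_config_changed(world)  # first run writes the file
        print(charm.on_reconcile_action(world))  # {'changed': 'false'}
        path.unlink()  # simulate drift, e.g. logic skipped after a no-retry resolve
        print(charm.on_reconcile_action(world))  # {'changed': 'true'}
```

Because `reconcile()` compares the desired file with what is on disk before writing, running it twice in a row changes nothing, and if the file is deleted or edited by hand, a single run of the action restores it. The observers stay cheap to reason about because they add no logic of their own.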