What would happen if a charm sleeps for a "long" time

The purpose of this document is to observe Juju's behaviour when a charm takes too long to finish handling an event and respond back to Juju.

This document will address the following:

  1. What is “too long”?
  2. What happens to a charm’s leadership?
  3. What happens to the charm’s hooks lifecycle?

Context

Oftentimes, a charm might be blocked on a synchronous operation, have retry logic around an API call, or, less often, simply sleep. That raises concerns, especially around Juju's leadership election, in which a leader unit is guaranteed to hold leadership for approximately 30s from the moment the leader-elected event is received.
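To make the concern concrete, here is a minimal, hypothetical sketch (the charm class, endpoint, and retry parameters are made up for illustration) of retry logic around an API call that can keep a hook busy well past 30s:

import logging
import time
import urllib.request

from ops.charm import CharmBase
from ops.main import main

logger = logging.getLogger(__name__)

class MyCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.config_changed, self._on_config_changed)

    def _on_config_changed(self, event):
        # Retry a (hypothetical) health endpoint for up to 5 minutes; the hook,
        # and therefore the unit's whole event queue, is blocked the entire time.
        deadline = time.monotonic() + 5 * 60
        while time.monotonic() < deadline:
            try:
                urllib.request.urlopen("http://10.0.0.1:8080/health", timeout=10)
                break
            except OSError:
                logger.debug("API not ready, retrying in 15s")
                time.sleep(15)

if __name__ == "__main__":
    main(MyCharm)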

The same holds with the current LEASE_RENEWAL_PERIOD: when the unit obtains its current leadership status through is_leader(), the value is cached for the duration of a lease, which is 30s in Juju. That, in turn, guarantees that the unit will keep leadership for the next 30s (i.e. calling is_leader() effectively renews the unit's leadership for another 30s).

After that time, Juju might elect a different leader. So it seems that "too long" could be a maximum of 30 seconds, and after that, bad things will happen. Right? The setup below tests that exact scenario.

Test

We'll use Loki as our test charm, which we've modified to sleep for 2 minutes at the end of every event if the unit is the leader.

import logging
import time

from ops.charm import CharmBase, CollectStatusEvent

logger = logging.getLogger(__name__)

class LokiOperatorCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        ...
        self.framework.observe(self.on.collect_unit_status, self._on_collect_unit_status)

    def _on_collect_unit_status(self, event: CollectStatusEvent):
        # Block the hook for 2 minutes, but only on the leader unit.
        if self.unit.is_leader():
            logger.debug("Sleeping for 2 mins")
            time.sleep(2 * 60)

We then deployed the charm with 2 units to see how leadership would change. By default, loki/0 is the leader unit, so it is the one that sleeps at the end of every event.

After some time, we observed that although the loki/0 unit exceeded the LEASE_RENEWAL_PERIOD, the charm container had not been killed and the unit remained leader for the entire lifecycle of events. The 2-minute delay between each fired event indicates that neither the leadership nor the blocking unit's queue of events was affected.

Conclusions

Although leadership can change at any time after the guaranteed 30s, even while a hook is running, Juju will only elect a new leader if it detects that the leader's unit agent is "down". In the case above, the agent is not down; the charm is just blocking a hook for some time.

A new leader election will, however, happen in cases where you kill the unit agent, for example with juju ssh loki/0 'kill -9 $(pgrep -f "/charm/bin/containeragent unit")'. If the agent remains down for some time (i.e. >30s), then Juju will move on to another leader.

thought: is it an acceptable pattern to, during long operations, periodically (say every 15 seconds) call ‘is_leader’ to constantly renew the lease?
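For reference, that pattern might look roughly like the sketch below (a hypothetical helper, assuming an ops-based charm):

import time

def run_leader_steps(unit, steps, check_interval=15):
    # Hypothetical helper: execute a long operation as a series of short
    # callables, re-checking leadership roughly every `check_interval`
    # seconds between them.
    last_check = time.monotonic()
    for step in steps:
        step()
        if time.monotonic() - last_check >= check_interval:
            if not unit.is_leader():
                # Leadership was lost at some point; stop doing leader-only work.
                raise RuntimeError("lost leadership during long operation")
            last_check = time.monotonic()

Inside a hook this would be called as run_leader_steps(self.unit, steps). Whether the periodic is_leader() calls actually extend the lease is exactly the question; a later reply suggests they do not trigger an extra ClaimLeadership call.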

Another consideration is that one unit on a machine sleeping during a hook, or taking a very long time to complete one, prevents other units on the same machine from executing hooks. There is a machine lock which allows only one hook on one unit to run at any time.

One thing to watch out for on Kubernetes: you only get 30 seconds (in total, for all events) to do a graceful shutdown.

(Details: comment #11 on Launchpad bug #2035102

Example (in-place upgrade): https://github.com/canonical/charm-refresh/blob/main/docs/requirements_and_user_experience.md#what-happens-after-a-juju-application-is-refreshed

Other examples: pod eviction)

So it's good to avoid building up an event queue, so that you have time to process all current events and gracefully shut down. Also, if any of your events run for longer than 30 seconds and are running when your pod gets a SIGTERM, you may not get a stop event and may not be able to attempt a graceful shutdown at all.
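As a sketch of what "having time to gracefully shut down" might look like (assuming an ops-based charm; MyK8sCharm and _flush_state are placeholders for whatever quick cleanup the charm needs):

import logging

from ops.charm import CharmBase

logger = logging.getLogger(__name__)

class MyK8sCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.stop, self._on_stop)

    def _on_stop(self, event):
        # The pod's termination grace period is shared by all remaining queued
        # events, so keep shutdown work short and bounded.
        try:
            self._flush_state()
        except Exception:
            logger.exception("best-effort cleanup failed during stop")

    def _flush_state(self):
        """Placeholder for quick, bounded cleanup work."""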

Whether this timeout should be only 30 seconds is another discussion: Launchpad bug #2035102, "Hardcoded kubernetes stateful set `terminationGrac...".


To provide an example of an approach our team takes to avoid long sleeps, and of the tradeoffs we're considering, see "Alternatives considered" here: https://github.com/canonical/mysql-router-k8s-operator/pull/190. Example of the unit status we set: https://github.com/canonical/mysql-router-k8s-operator/pull/190/files#diff-64b9a9c714212841406e56a77357c9d4b4ef9b19acc4f7a09a001db160df2870R22

As you noted, the actual logic is that the Unit Agent keeps the leadership alive independently of whether the charm itself is executing. Partly this is because the charm won't be 'active' 100% of the time. For example, the common case is that the charm only wakes up once every 5 minutes to respond to an update-status hook, and we don't want leadership to change between those hooks. (In fact, if we applied the 30s logic strictly, every update-status on each unit could easily see that nobody else had spoken in the last minute and become the leader.)

We have talked about making it more of an expression from the charm that it is alive, but that would also lead to a lot of async code in the charm, firing off a background task to ping an "I'm still alive" beacon, and even then it wouldn't mean that the main loop trying to make forward progress is actually still progressing.

The statement that a charm is guaranteed at least 30 more seconds of leadership when it checks is_leader does not mean that is_leader actually triggers a check/update. I didn't nail down the exact code path, but if you run juju debug-hooks and then call is-leader, you can see that it doesn't trigger an additional ClaimLeadership call outside of the normal 30s update interval.
