Context
At present, when performing an upgrade of Ubuntu underneath Juju agents, it is possible for application leadership to change while unit agents are stopped for preparation.
There is a requirement from the field that this does not happen - effectively a request to “freeze” application leadership for the duration of such upgrades.
Approach
It has been suggested that we need only act in this capacity when there are leader units on the particular machine being upgraded. However, an operator may wish to work outside the scope of a single machine upgrade; for example, preventing leadership elections while all non-leaders are upgraded, so that leadership is not handed between units running different versions.
With this in mind it has been suggested that we expose such a facility as a separate client command in order to ensure that operators can exercise the control they need over such changes.
During discussion of this requirement, consensus was reached that leadership logic should not be modified with code particular to series upgrade concerns. Instead, a general lease “pinning” feature should be introduced, which series upgrade and future features can use.
Design
The immediate focus will be on the general lease pinning facility, not on a new client command.
The leadership layer above the lease implementations does not present an API that gives enough control to do pinning. It is in the lease layer itself that this logic needs to reside. There are currently two lease implementations:
- State-based (legacy)
- Raft
In the develop branch of Juju, Raft is used by default for leases. A feature flag allows falling back to the legacy state-based lease management.
Possible Method: Pin by Invoking a Long Lease Extension
Under this method, the call to pin the current leader for an application would involve extending the current leader’s lease by some long period, say 10 years. Unpinning would involve setting the expiry back to a standard duration.
Possible Method: Pin by Storing Information in State
This method would involve creating a new collection into which pinned leader units are written and from which they are later removed. The expiry workflow would consult this collection and skip any lease whose holder is pinned. This presents challenges in the Raft implementation, which does not use a connection to MongoDB.
Open Questions
- Should we introduce the pinning capability to the legacy lease logic in addition to the Raft implementation?
- What will be the mechanism for pinning?