Splitting up N things across K instances

juliank · 8 October 2021 16:11

Hi,

For autopkgtest-cloud we want to generate N systemd units overall, and to split them up in the reactive charm so that they are evenly distributed across all instances.

The problem is dealing with overlap, i.e. the remainder when N does not divide evenly: if there are 3 things and 2 instances, one will have to get 2 and the other 1. Should we use the leader for this? That does not seem like the best choice, since the leader can change between invocations, which would be suboptimal.
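
To make the arithmetic concrete, here is a minimal sketch of an even split in plain Python. The function name `share_sizes` is hypothetical and not part of any charm library; it only shows how the remainder can be spread over the first few instances instead of landing on a single one.

```python
from typing import List


def share_sizes(n_items: int, n_instances: int) -> List[int]:
    """Return how many items each instance gets, as evenly as possible."""
    if n_instances <= 0:
        raise ValueError("need at least one instance")
    base, extra = divmod(n_items, n_instances)
    # The first `extra` instances take one additional item each,
    # so no two instances differ by more than one item.
    return [base + 1 if i < extra else base for i in range(n_instances)]


print(share_sizes(3, 2))
```

With 3 things and 2 instances this gives one instance 2 items and the other 1, as described above.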

Is there another way to figure out which instance we are, for example whether we are the most recently added instance? Then we could just say that the last instance deals with any overlap.

Now actually, N is not a fixed number, but rather we have one N per architecture that Ubuntu supports. So let’s say we have 3 units each for amd64 and ppc64el; it might make sense to split them up such that instance W1 gets 2 amd64 and 1 ppc64el, and instance W2 gets 1 amd64 and 2 ppc64el; but that might be harder.
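
One possible way to get that kind of per-architecture balance is to rotate which instance receives the remainder from one architecture to the next. The sketch below is only an illustration under that assumption; `split_per_arch` and the instance names `W1`/`W2` are placeholders, not anything provided by Juju or the charm.

```python
from typing import Dict, List


def split_per_arch(counts: Dict[str, int],
                   instances: List[str]) -> Dict[str, Dict[str, int]]:
    """Map instance -> {arch: number of units}, rotating the remainder."""
    k = len(instances)
    if k == 0:
        raise ValueError("need at least one instance")
    plan: Dict[str, Dict[str, int]] = {inst: {} for inst in instances}
    offset = 0  # position of the next instance that should take an extra unit
    for arch in sorted(counts):
        base, extra = divmod(counts[arch], k)
        for i, inst in enumerate(instances):
            # Instances at positions offset .. offset+extra-1 (wrapping
            # around) take one extra unit for this architecture.
            gets_extra = (i - offset) % k < extra
            plan[inst][arch] = base + (1 if gets_extra else 0)
        offset = (offset + extra) % k
    return plan


print(split_per_arch({"amd64": 3, "ppc64el": 3}, ["W1", "W2"]))
```

For the example above, W1 ends up with 2 amd64 and 1 ppc64el, and W2 with 1 amd64 and 2 ppc64el.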

jameinel · 13 October 2021 14:25

You could use the leader to coordinate the overlap rather than putting the extra units on the leader itself. There is also very much an open question as to what you want to happen as you scale the number of units up or down. (If someone does ‘juju add-unit’, does that rebalance the N systemd units, or does it wait for you to change N?) If you want a system that doesn’t require coordination, I would just use something like your unit number and mod. You can use either a peer relation to find out how many units are currently active, or tools to find the expected number of units (goal-state exposes a bit too much, but you can use it to at least count the number of expected units of your application).
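
A rough sketch of the coordination-free "unit number and mod" idea follows. Because Juju unit numbers can have gaps once units have been removed, this version takes the modulus over the unit's position in the sorted list of units rather than over the raw number. The helper names and the `worker/N` unit names are made up for illustration, and `all_units` is assumed to include the local unit as well as its peers.

```python
from typing import List


def unit_number(unit_name: str) -> int:
    """Extract the numeric suffix from a unit name such as 'worker/3'."""
    return int(unit_name.rsplit("/", 1)[1])


def my_items(items: List[str], my_unit: str, all_units: List[str]) -> List[str]:
    """Return the items this unit should run, decided without coordination."""
    ordered = sorted(all_units, key=unit_number)
    rank = ordered.index(my_unit)  # 0-based position among the units
    # Item i goes to the unit whose rank equals i modulo the unit count.
    return [item for i, item in enumerate(items) if i % len(ordered) == rank]


units = ["worker/7", "worker/0", "worker/3"]
print(my_items(["a", "b", "c", "d"], "worker/3", units))
```

Every unit computes the same ordering from the same inputs, so the results do not conflict as long as all units see the same list of units and items. Adding or removing a unit changes the ranks, and therefore reshuffles the assignment.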

I don’t quite understand the part about one instance getting both amd64 and ppc64el units. How is 1 instance able to do both amd64 and ppc64el? Isn’t the architecture something that is part of the machine where systemd is running? (So if you have 1 amd64 machine and 1 ppc64el machine and you want 3 units of amd64 and 3 units of ppc64el, doesn’t that have to be broken down as 3 amd64 on the amd64 machine and 3 ppc64el on the ppc64el machine?)

I think it is still cleanest if you use an is_leader check and have that unit look in a peer relation and then tell all the units what they should run. If you need to know their architecture, etc., they can share that on the relation, and the individual units just look at the buckets and say “ok, I’m unit-5, and the leader has told me to run X and Y”. That centralizes the logic, which lets you do simple things like random() and iterating lists without them needing to be deterministic across instances.
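
The leader-driven variant could look roughly like the sketch below. It only covers the bucketing logic: how the payload gets published and read (for example through leader settings or the peer relation) is left out, and the names `build_assignments` and `jobs_for` are hypothetical, not charm APIs.

```python
import json
import random
from typing import List


def build_assignments(jobs: List[str], units: List[str], rng=random) -> str:
    """Run on the leader only: deal jobs into per-unit buckets, return JSON."""
    if not units:
        raise ValueError("need at least one unit")
    shuffled = list(jobs)
    rng.shuffle(shuffled)  # fine here, since only the leader decides
    buckets = {unit: [] for unit in sorted(units)}
    names = list(buckets)
    for i, job in enumerate(shuffled):
        buckets[names[i % len(names)]].append(job)
    return json.dumps(buckets, sort_keys=True)


def jobs_for(unit: str, payload: str) -> List[str]:
    """Run on every unit: look up the bucket the leader assigned to us."""
    return json.loads(payload).get(unit, [])


payload = build_assignments(["job-a", "job-b", "job-c"], ["worker/0", "worker/1"])
print(jobs_for("worker/0", payload), jobs_for("worker/1", payload))
```

Since the result is stored rather than recomputed on each unit, a leader change does not move anything until the new leader decides to publish a new assignment.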