[Feedback] juju resources and upgrade-charm hook

Juju resources and the upgrade-charm event have become a point of contention for us as the number of units in our models has increased.

Problem
The upgrade-charm hook event is dumb in the sense that it is not contextually aware of what is being upgraded. A few issues stem from this that make things really clunky when trying to manage resources from a charm author's perspective.

In the case of a charm with one or more resources, the charm has no way of knowing which resource is being upgraded, or if it is the charm code itself that is being upgraded.

Imagine a case where a charm wants to perform some operation on the resource only if it has changed, and not blindly whenever the upgrade event/hook fires (possibly only the charm code is being upgraded and we don’t want to do anything to the resources of the charm).

Per this conversation on upgrading charm vs resources it seems there is a path forward where users are storing and checking the sha of the resource to be able to determine if it has changed.

This makes good sense logically, with the exception that resource-get has to run for each resource, for each unit, in order to get the resource local to the unit to be able to check and compare the sha, possibly needlessly if only the charm code has changed.
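For reference, the store-and-compare approach looks roughly like the sketch below. It assumes the resource has already been fetched to a local path (which is exactly the expensive step), and the state file location and the choice of SHA-256 are my own placeholders, not anything Juju prescribes.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resource_changed(resource_path: Path, state_file: Path) -> bool:
    """Compare the resource's digest with the stored one.

    Records the new digest and returns True when it differs (or when
    nothing was stored yet); returns False when the digest is unchanged.
    """
    current = file_sha256(resource_path)
    previous = state_file.read_text().strip() if state_file.exists() else None
    if current == previous:
        return False
    state_file.write_text(current)
    return True
```

Note that `resource_changed` can only run after the unit already holds the file locally, so it avoids redundant work on the unit but does nothing to reduce the load on the controllers.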

Using this method, considering a model of 50 units, the juju controllers get entirely stomped handing out a resource to a bunch of asking units. The next thing that happens after the controllers basically go down is that the environment gets stuck in an unusable and unstable state for quite some time while units wait for their resource from the controllers and for the controllers to resume/recover from the load.

With anywhere from 50-200 primary charm units per application in each of our models, this hiccup has been biting us badly, causing us to entirely re-provision clusters in order to upgrade them. We can’t use the upgrade hook combined with charm resources because it takes everything down.

Ideas for a solution

  • Provide a way to check the sha of a resource without pulling it down to the unit. I think this could be accomplished via:

    • Add context to the upgrade-charm hook that would expose the sha of each resource
    • Create a resource_info() function that could return the info (sha) of the desired resource(s)
  • resource-<resource-name>-changed hook

    • Create resource centric hook events that fire when a resource has changed
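To make the first idea concrete, here is a rough sketch of how a charm could use such a lookup if it existed. Everything here is hypothetical: `remote_sha` stands in for the proposed resource_info()-style call, and `fetch` stands in for whatever wraps resource-get today. Both are passed in as plain callables so the sketch does not pretend to be a real Juju or ops API.

```python
from typing import Callable, Dict, Optional


def fetch_if_changed(
    name: str,
    remote_sha: Callable[[str], str],
    fetch: Callable[[str], str],
    known_shas: Dict[str, str],
) -> Optional[str]:
    """Fetch resource `name` only when its remote sha differs from the recorded one.

    Returns the local path reported by `fetch`, or None when nothing changed.
    """
    sha = remote_sha(name)  # hypothetical cheap metadata lookup, no download
    if known_shas.get(name) == sha:
        return None  # unchanged: skip the expensive download entirely
    path = fetch(name)  # expensive step, e.g. a wrapper around resource-get
    known_shas[name] = sha  # remember what we now hold locally
    return path
```

With something like this, a charm-code-only upgrade would cost one metadata lookup per resource per unit instead of a full download.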

Thoughts?

Thank you

  • Ways to manage how the resource is retrieved to avoid controller congestion (or if that can be managed by juju)

I think your proposal for a solution seems legit.

Being able to ask for the current hash of a resource is a great idea.

I think it would also be nice to be able to check the current revision of the resource, as seen when using juju resources app_name. That would be a nice relatively lightweight item to track in app state for comparison.

ops event classes like ResourceCreatedEvent, ResourceUpdatedEvent and ResourceRemovedEvent would definitely be handy too, especially if they passed in a ResourceMeta with the new revision and hash, or similar properties, included.
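As a rough illustration of what those three events would distinguish, here is a sketch that classifies a change from a stored snapshot and a new one. `ResourceInfo` and `classify_change` are made-up names for this example; they are not part of ops, and the revision and sha fields are the properties being wished for above rather than anything ops exposes today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceInfo:
    """Hypothetical snapshot of a resource as a charm might keep it in state."""
    name: str
    revision: int
    sha: str


def classify_change(
    old: Optional[ResourceInfo], new: Optional[ResourceInfo]
) -> Optional[str]:
    """Return "created", "updated", "removed", or None when nothing changed."""
    if old is None and new is None:
        return None
    if old is None:
        return "created"
    if new is None:
        return "removed"
    if (old.revision, old.sha) != (new.revision, new.sha):
        return "updated"
    return None
```

The comparison is on small metadata only, which is why tracking the revision or hash in app state is so much lighter than pulling the resource down to check it.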

@ianmjones @erik-lonroth thanks for the feedback

I left out my final and most important feedback point above :man_facepalming:, I’ll state it here.

Problem
The greatest issue that we face with how resources are handled is how upgrading a resource forces all units of an application to execute the upgrade-charm hook at the same time.

Once the clunky bits above get smoothed out with access to the sha/rev, and charm authors can be more nimble about how they check for and provision resources, we are still left with the issue of all units upgrading at the same time.

Suggestion
Can we choose a point at which juju decides to stagger the delivery of the resources to the units?

  • Could juju do some simple math, something like …

    on resource_attach
      stagger_upgrade = False
      if the number of units is greater than some reasonable number
          stagger_upgrade = True
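Spelled out a little further, the pseudocode above could turn into something like this sketch. The threshold and batch size are arbitrary placeholder numbers, and `plan_batches` is a made-up name; how juju would actually schedule delivery between batches is left open.

```python
from typing import List, Sequence


def plan_batches(
    units: Sequence[str], threshold: int = 20, batch_size: int = 10
) -> List[List[str]]:
    """Group units into delivery batches.

    At or below `threshold` units everything goes out in one batch
    (no staggering); above it, units are split into chunks of `batch_size`.
    """
    if not units:
        return []
    if len(units) <= threshold:
        return [list(units)]
    return [
        list(units[i:i + batch_size])
        for i in range(0, len(units), batch_size)
    ]
```

For a 50 unit application with these defaults, that would mean five batches of ten units asking the controllers for the resource, instead of all fifty at once.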

Hi All,

Thank you for the feedback. We’re aware of the issue, and fixing it is a priority for us, probably in 2.9.1/2.8.8, early in 2021. We’ll post updates in the bug tracking it here: Bug #1905703 “juju 2.8.6 - resource download very slow” : Bugs : juju

~ PeteVG
