Initial ideas
The main driver behind creating the model cache was to reduce read load on mongo. Since the models really aren’t that big, keeping each model cached in memory and serving information from that cache seemed appealing.
Secondly, there is a desire to create a business logic tier between the apiserver and the state layers. Business logic would move down from the apiserver, so that the apiserver is only about exposing information over the API, and up out of the state package, so that the state package becomes purely a persistence mechanism. We could then provide a fake persistence layer to test the business logic and have the tests run exceedingly fast.
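To make the layering concrete, here is a minimal Go sketch of business logic written against a persistence interface, with an in-memory fake standing in for the state package. The names `Persistence`, `fakePersistence`, and `ConfigService` are hypothetical and chosen for illustration only; they are not existing Juju types.

```go
package main

import (
	"errors"
	"fmt"
)

// Persistence is a hypothetical interface describing what the business
// logic needs from the state layer. It is not an existing Juju interface.
type Persistence interface {
	ModelConfig(modelUUID string) (map[string]string, error)
	SetModelConfig(modelUUID string, cfg map[string]string) error
}

// fakePersistence keeps everything in memory, so tests never touch mongo.
type fakePersistence struct {
	configs map[string]map[string]string
}

func newFakePersistence() *fakePersistence {
	return &fakePersistence{configs: make(map[string]map[string]string)}
}

func copyConfig(cfg map[string]string) map[string]string {
	out := make(map[string]string, len(cfg))
	for k, v := range cfg {
		out[k] = v
	}
	return out
}

func (f *fakePersistence) ModelConfig(modelUUID string) (map[string]string, error) {
	cfg, ok := f.configs[modelUUID]
	if !ok {
		return nil, errors.New("model not found")
	}
	// Return a copy so callers cannot mutate stored state.
	return copyConfig(cfg), nil
}

func (f *fakePersistence) SetModelConfig(modelUUID string, cfg map[string]string) error {
	f.configs[modelUUID] = copyConfig(cfg)
	return nil
}

// ConfigService is the hypothetical business logic tier. It validates
// input and delegates storage to whatever Persistence it is given.
type ConfigService struct {
	store Persistence
}

// UpdateKey validates and applies a single config change.
func (s *ConfigService) UpdateKey(modelUUID, key, value string) error {
	if key == "" {
		return errors.New("empty config key")
	}
	cfg, err := s.store.ModelConfig(modelUUID)
	if err != nil {
		return err
	}
	cfg[key] = value
	return s.store.SetModelConfig(modelUUID, cfg)
}

func main() {
	store := newFakePersistence()
	_ = store.SetModelConfig("model-1", map[string]string{"logging-config": "INFO"})

	svc := &ConfigService{store: store}
	if err := svc.UpdateKey("model-1", "logging-config", "DEBUG"); err != nil {
		fmt.Println("error:", err)
		return
	}
	cfg, _ := store.ModelConfig("model-1")
	fmt.Println(cfg["logging-config"])
}
```

Because `ConfigService` only sees the interface, the same logic could run against a database-backed implementation in production and against the fake in tests.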
However getting there from here is a bit of a problem.
First trial (success) - model config
One of the problems with the coarse-grained model-config watchers that the agents use is that any change to any part of the model config causes the workers to wake up. The workers then ask for the model-config, only to find that the bit they cared about wasn’t changed, so they go back to sleep. However, many workers depend on model-config, and in a deployment with, say, 1000 units, changing a single configuration value can cause on the order of 10k wake-ups and requests for model-config. Every one of those read requests hits the database.
We created a model config watcher where the caller specifies the keys they are interested in. Workers that use this watcher can also safely read the configuration from the cache.
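The key-filtered watcher can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: `ConfigCache`, `KeyWatcher`, and the hash-based comparison are hypothetical, and the sketch is single-goroutine with no locking.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// ConfigCache is a hypothetical in-memory copy of the model config.
// It is not safe for concurrent use; a real cache would need locking.
type ConfigCache struct {
	config   map[string]string
	watchers []*KeyWatcher
}

// KeyWatcher notifies only when one of the keys it was created with changes.
type KeyWatcher struct {
	keys    []string
	hash    [sha256.Size]byte
	changes chan struct{}
}

func NewConfigCache(initial map[string]string) *ConfigCache {
	c := &ConfigCache{config: make(map[string]string)}
	for k, v := range initial {
		c.config[k] = v
	}
	return c
}

// hashKeys digests the values of the given keys in sorted key order, so
// two configs hash the same exactly when those keys have the same values.
func hashKeys(cfg map[string]string, keys []string) [sha256.Size]byte {
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	h := sha256.New()
	for _, k := range sorted {
		v, ok := cfg[k]
		fmt.Fprintf(h, "%q=%q,%t;", k, v, ok)
	}
	var out [sha256.Size]byte
	copy(out[:], h.Sum(nil))
	return out
}

// Watch registers a watcher for the given keys.
func (c *ConfigCache) Watch(keys ...string) *KeyWatcher {
	w := &KeyWatcher{
		keys:    keys,
		hash:    hashKeys(c.config, keys),
		changes: make(chan struct{}, 1),
	}
	c.watchers = append(c.watchers, w)
	return w
}

// Changes returns the notification channel.
func (w *KeyWatcher) Changes() <-chan struct{} { return w.changes }

// Get reads a value from the cache rather than the database.
func (c *ConfigCache) Get(key string) string { return c.config[key] }

// Update applies a config change and wakes only the affected watchers.
func (c *ConfigCache) Update(key, value string) {
	c.config[key] = value
	for _, w := range c.watchers {
		h := hashKeys(c.config, w.keys)
		if h == w.hash {
			continue // none of this watcher's keys changed
		}
		w.hash = h
		select {
		case w.changes <- struct{}{}:
		default: // a notification is already pending
		}
	}
}

// notified reports whether a notification is pending, without blocking.
func notified(w *KeyWatcher) bool {
	select {
	case <-w.Changes():
		return true
	default:
		return false
	}
}

func main() {
	cache := NewConfigCache(map[string]string{"logging-config": "INFO", "http-proxy": ""})
	w := cache.Watch("logging-config")

	cache.Update("http-proxy", "http://proxy.example:3128")
	fmt.Println("woken by unrelated change:", notified(w))

	cache.Update("logging-config", "DEBUG")
	fmt.Println("woken by watched change:", notified(w), cache.Get("logging-config"))
}
```

In this sketch the notification comes from the cache itself, so by the time a worker wakes up the cache already holds the new value for the watched keys. That is one way to read why such workers can take their configuration from the cache.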
Second trial (issues) - charm config
The aim here was to provide charm configuration from the cache, with the cache handling the logic for branch configuration (née generations).
The problem arose when the unit agent called set charm URL and then immediately requested the charm configuration. The charm configuration depends on the charm version, so if the cache hasn’t yet caught up with the database change, you may get config for an old or missing charm.
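The ordering problem can be reproduced in a few lines. The following Go sketch is a deliberately simplified, deterministic model: `database`, `cache`, `SetCharm`, and `Sync` are hypothetical names, the `local:app-N` charm URLs are made up, and an explicit `Sync` call stands in for the delay before the cache sees a database change.

```go
package main

import "fmt"

// snapshot pairs a charm URL with the config that belongs to that charm.
type snapshot struct {
	charmURL string
	config   map[string]string
}

// database stands in for mongo: the source of truth.
type database struct {
	current snapshot
}

// SetCharm models the unit agent's set charm URL call writing to the database.
func (d *database) SetCharm(url string, cfg map[string]string) {
	d.current = snapshot{charmURL: url, config: cfg}
}

// cache is a lagging copy that only catches up when Sync is called.
type cache struct {
	current snapshot
}

// Sync models the cache eventually processing the database change.
func (c *cache) Sync(d *database) { c.current = d.current }

// CharmConfig returns whatever the cache currently believes.
func (c *cache) CharmConfig() (string, map[string]string) {
	return c.current.charmURL, c.current.config
}

func main() {
	db := &database{}
	c := &cache{}

	db.SetCharm("local:app-1", map[string]string{"port": "80"})
	c.Sync(db)

	// The unit upgrades: the write lands in the database first.
	db.SetCharm("local:app-2", map[string]string{"port": "8080"})

	// Reading from the cache before it has synced returns the old charm's config.
	url, cfg := c.CharmConfig()
	fmt.Println("before sync:", url, cfg["port"])

	c.Sync(db)
	url, cfg = c.CharmConfig()
	fmt.Println("after sync:", url, cfg["port"])
}
```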
Next trial target - status
juju status is one of the areas where the user expects slight latency. The operator knows that the agents are busy bringing the world into a state that matches the planned model.
If we have everything we need for status available in the cache, we should be able to respond to status calls even on the biggest models in less than 50ms. If we can make it seem immediate for the operator, this is a big deal.
Rules for cache usage
- Watchers can be added to the cache and used by agents or clients.
- Use the database as the source of data in the general case.
There will be exceptions to the second rule, but they remain exceptions at this stage. If we use watchers from the cache and data from the database, the results will always be good enough.
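As an illustration of the two rules working together, here is a minimal Go sketch of a worker loop that takes its wake-up signal from a cache watcher but reads the actual value from the database. The `Database` interface, `memDB`, and `handleChanges` are hypothetical names for this sketch. It assumes the cache is fed from database changes, so a notification implies the database already holds at least that change.

```go
package main

import "fmt"

// Database is a hypothetical read interface onto the persistence layer.
type Database interface {
	ModelConfig() map[string]string
}

// memDB is an in-memory stand-in for the real database.
type memDB struct {
	cfg map[string]string
}

func (d *memDB) ModelConfig() map[string]string {
	out := make(map[string]string, len(d.cfg))
	for k, v := range d.cfg {
		out[k] = v
	}
	return out
}

// handleChanges treats each notification purely as a signal that something
// changed, then reads the current value from the database, not the cache.
// It returns when the changes channel is closed.
func handleChanges(changes <-chan struct{}, db Database, key string, apply func(string)) {
	for range changes {
		apply(db.ModelConfig()[key])
	}
}

func main() {
	db := &memDB{cfg: map[string]string{"logging-config": "INFO"}}

	// In a real system this channel would come from a cache watcher.
	changes := make(chan struct{}, 1)

	// The database is written first; the cache notification follows.
	db.cfg["logging-config"] = "DEBUG"
	changes <- struct{}{}
	close(changes)

	handleChanges(changes, db, "logging-config", func(v string) {
		fmt.Println("applying logging-config:", v)
	})
}
```

The trade-off is that each wake-up still costs a database read, but filtered watchers should mean far fewer wake-ups, and the value read is never older than the change that triggered it.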