Hey guys, we’re hitting some python-libjuju/Juju issues while testing our automated OpenStack charms deployment and upgrade.
python-libjuju receives deltas from the controller to keep up with the actual state of the model. The first one is received during connection. Since a delta is sent the same way as every command, as an RPC message, it is subject to the RPC message size limit.
The problem appears with heavier use of Juju actions, especially when the output of these actions is long and the number of targeted units is large (e.g. an OpenStack deploy). The output of actions is stored in the delta for all of these units and is sent with every delta. Once the delta outgrows the limit, libjuju either fails to connect or, if already connected, continues with an inconsistent state because its attempts to receive deltas fail.
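To give a rough feel for how quickly this adds up, here is a back-of-the-envelope sketch in plain Python. It does not use python-libjuju at all. The JSON shape, the `delta_size` helper and the unit/action counts are made up for illustration and are not the real wire format; only the 4MB default comes from our setup described in this post.

```python
import json

# Default limit mentioned in this post (4MB); everything else is illustrative.
FRAME_LIMIT = 4 * 1024 * 1024


def delta_size(units, actions_per_unit, output_chars):
    """Return the byte size of a JSON message carrying every action's output.

    The entry layout is hypothetical; it only models "one record per action,
    each record embedding its full output".
    """
    entries = [
        {"unit": f"app/{u}", "action": a, "output": "x" * output_chars}
        for u in range(units)
        for a in range(actions_per_unit)
    ]
    return len(json.dumps(entries).encode("utf-8"))


# A small model with a few short actions stays far below the limit.
small = delta_size(units=10, actions_per_unit=2, output_chars=1_000)

# Many units, many actions, longer output: the payload scales with the product.
large = delta_size(units=200, actions_per_unit=20, output_chars=2_000)

print(small, small > FRAME_LIMIT)
print(large, large > FRAME_LIMIT)
```

The payload grows with units × actions × output length, so raising the limit only postpones the failure.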
In our case we are using a 64MB limit (compared to the default 4MB) and, starting from a clean model, we are still able to hit it within a week. We suggest that at least the output of actions be lazy-loaded on demand, to preserve the reliability of the client. This may require changes in both python-libjuju and Juju itself.
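A minimal sketch of what we mean by lazy loading, assuming the delta would carry only action metadata and the client would fetch the output per action when asked. `LazyAction`, `fake_fetch` and `STORE` are hypothetical names for illustration only; none of this is existing python-libjuju or Juju API, and `fake_fetch` merely stands in for whatever per-action RPC call would be needed.

```python
from typing import Callable, List, Optional


class LazyAction:
    """Hypothetical client-side record: metadata up front, output on demand."""

    def __init__(self, action_id: str, status: str,
                 fetch_output: Callable[[str], str]) -> None:
        self.action_id = action_id
        self.status = status
        self._fetch_output = fetch_output
        self._output: Optional[str] = None  # not loaded yet

    @property
    def output(self) -> str:
        # Fetch the (possibly large) output only on first access, then cache it.
        if self._output is None:
            self._output = self._fetch_output(self.action_id)
        return self._output


# Stand-in for the controller side: large outputs keyed by action id.
STORE = {"1": "a" * 100_000, "2": "b" * 100_000}
calls: List[str] = []


def fake_fetch(action_id: str) -> str:
    """Stand-in for a per-action RPC request (assumption, not a real call)."""
    calls.append(action_id)
    return STORE[action_id]


# Building the model state touches no output at all.
actions = [LazyAction(i, "completed", fake_fetch) for i in STORE]
assert calls == []

# Only the action we actually inspect triggers a fetch, and only once.
first_len = len(actions[0].output)
second_len = len(actions[0].output)
print(first_len, calls)
```

With something like this, the size of a delta would depend on the number of actions rather than on how much output they produced.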
One of the errors we observed when hitting this issue during cleanup of the model has already been reported, but not yet followed up on: https://github.com/juju/python-libjuju/issues/179
We have noticed this idea of using lazy loading and more fine-grained watchers, which could maybe help with this: https://github.com/juju/python-libjuju/issues/181