Juju/MongoDB memory issue

Hello, we have deployed a single-machine Juju controller (Ubuntu 20.04, Juju 2.9.37) with one model (other than controller and default) containing 21 machines and 11 apps, running for about ~200 days. Currently the Juju controller process takes ~4.2GB and the MongoDB process 1.9GB of memory. Our virtual machine dedicated to the Juju controller has 8GB of RAM, so it is on the edge; for example, a backup action of the controller could not be completed due to lack of memory. Restarting the Juju controller machine did not help. If I understand correctly, even adding additional nodes and switching to a Juju HA controller configuration would not help with this issue.

I would like to ask if there is any option to “clean up” or perform some maintenance, to execute some kind of “garbage collection” and reduce the memory footprint of MongoDB and/or Juju itself. We can add more memory to the controller, but I think that is just a workaround, not a solution.

Just to compare the sizes: we also have another single-node Juju controller on the same OS and Juju version, with a model of comparable size, but the deployment is only ~40 days old. There, the Juju process takes about ~420MB and the MongoDB process about 280MB of RAM.

Thank you


Is the memory in use growing over time? Those figures seem “reasonable”, but if you’ve been monitoring the usage and it’s increasing even with the deployed model(s) in steady state, then that’s something to investigate further. If there is memory growth, what we’d normally do is ssh to the controller (juju switch controller; juju ssh 0), capture a heap profile (juju_heap_profile) and a goroutine dump (juju_goroutines), and then do it again a few days or a week later (depending on the rate of growth). The two sets can then be compared.
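For example, a minimal sketch of the capture step (the output filenames are just placeholders):

juju switch controller
juju ssh 0

# On the controller machine; repeat a few days later with new filenames:
juju_heap_profile > heap-1.txt
juju_goroutines > goroutines-1.txt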

Thank you for your reply. I tried to execute juju_heap_profile on the controller machine, but I could not find such a binary/script, not even on the other machines within the Juju model. I also checked the docs here (Juju | Agent introspection), but there is no mention of where to get the listed tools. I can see the following tools directly on the Juju controller:

juju-db.mongo juju-db.mongod juju-db.mongodump juju-db.mongorestore juju-db.mongostat juju-db.mongotop juju-dumplogs juju-introspect juju-run

I tried some of them, for example juju-db.mongostat and juju-db.mongotop, but I could not find any relevant data there, just some stats. Then I tried “juju-introspect /debug/pprof/heap?debug=1 > out.test1” and, after some time, created a second dump with “juju-introspect /debug/pprof/heap?debug=1 > out.test2”. I installed graphviz (apt install graphviz) and was then able to run “go tool pprof -http 192.168.100.54:8100 -base out.test1 /var/lib/juju/tools/machine-0/jujud out.test2” (the client is 2.9.38 but the controller is still 2.9.37). Now I can see some differences, but I’m stuck on what to do with them (I’m sorry, I’m not a developer).

My aim is to lower the RAM consumption of Juju and/or MongoDB. What more can I do? Can I, for example, delete the contents of the “actions” or “operations” collections in the juju database in MongoDB to lower the overall size of the DB? Thank you.
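A text-mode comparison should also be possible without the web UI, something like (using the same dump files as above):

# Print the functions with the largest growth between the two heap snapshots:
go tool pprof -top -base out.test1 /var/lib/juju/tools/machine-0/jujud out.test2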

juju_heap_profile etc. are bash functions sourced from /etc/profile.d/juju-introspection.sh when you juju ssh into a machine. There’s a whole bunch of functions in there starting with juju_. If for whatever reason the script is not sourced automatically, you can source /etc/profile.d/juju-introspection.sh manually and that should do the trick. Sounds like you’ve done a good job of discovering the underlying mechanism used by these functions.
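For example:

# If the helpers are missing from your shell, source them manually:
source /etc/profile.d/juju-introspection.sh

# List the available helper functions (names vary by Juju version):
declare -F | grep juju_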

Juju will automatically prune completed actions / operations and also unit status history to try and keep db size in check. This is done per model, but you can set up model defaults so that newly created models use those values.

juju model-config
...
max-action-results-age             default  336h
max-action-results-size            default  5G
max-status-history-age             default  336h
max-status-history-size            default  5G
...

The above is per model, so if there are lots of busy models, things can add up. Pruning is based on both age and size, whichever limit is reached first.

To set things up for new models, use the model-defaults command; for existing models, you’ll need to change the values manually, as in the sketch below.
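A sketch (the model name mymodel and the values are just examples):

# Defaults for newly created models:
juju model-defaults max-action-results-age=168h max-action-results-size=2G
juju model-defaults max-status-history-age=168h max-status-history-size=2G

# An existing model must be updated directly:
juju model-config -m mymodel max-action-results-age=168h max-action-results-size=2G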

Pruning is done once a day (from memory), so any changes may not become apparent for a while.

For RAM usage, Juju uses what Juju uses so there’s not a lot there that you can (easily) do. If there’s a leak we can plug it but as far as I’m aware, the last known leak was fixed in an earlier 2.9 release.

Hope that helps a bit. Try tuning the pruning parameters and see whether they make a difference. They will only have a real effect, though, if your models are running lots of actions or making lots of other changes which result in many charm hook invocations.


Thank you for your reply and advice. Two weeks ago (on the same day as your reply) I set the following on the model:

max-action-results-age=168h
max-action-results-size=2G
max-status-history-age=168h
max-status-history-size=2G

Now, after two weeks, the memory sizes of jujud and mongod are as follows:

3753.43M  /var/lib/juju/tools/machine-0/jujud
2234.61M  /snap/juju-db/160/bin/mongod
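(These are resident set sizes; for reference, one way to obtain such figures:

# Print RSS in MB plus the binary path for the jujud and mongod processes:
ps -C jujud,mongod -o rss=,args= | awk '{printf "%.2fM  %s\n", $1/1024, $2}'
)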

The result is that jujud is slightly smaller and mongod is slightly bigger. Not so bad; my primary aim was to stop the total size from growing, and from that point of view it seems to be done.

If there is anything else I can do, please let me know.

Thank you again