When I first got started with charms, I noticed that reactive charms, and the PostgreSQL charm in particular, seemed to spend a lot of time doing nothing while it logged information about the flags that the framework was setting. I wrote a post about some of my findings:
After writing a new charming framework called Lucky, though, I noticed a similar issue. Charms seemed to spend a whole lot of time doing things that should have taken no time at all. There may still be other factors to consider but I believe that Iāve found the single largest cause of the performance hit: logging.
Iāve had this suspicion for about a month and today I finally did a quick test with a very basic charm. The script looks like this:
# First step: echo 1000 times to a file
to_file_time=$(/usr/bin/time -po /dev/stdout bash -c 'for i in $(seq 1000); do echo "Item $i" >> /tmp/charm-logging1.log; done')
# Next step: echo 1000 times to juju-log
juju_log_time=$(/usr/bin/time -po /dev/stdout bash -c 'for i in $(seq 1000); do juju-log "Item $i"; done')
juju-log "============================================================================================"
juju-log "To file time ( 1000 records ): $to_file_time"
juju-log "Juju log time ( 1000 records ): $juju_log_time"
All I do is log 1000 lines to a file on the host, and compare that to logging 1000 lines to juju-log
. The result:
To file time ( 1000 records ): real 0.08 user 0.03 sys 0.04
Juju log time ( 1000 records ): real 149.08 user 111.32 sys 71.33
In other words:
Total time ( 1000 records ) | Average time per log entry | |
---|---|---|
juju-log |
149 seconds | 0.15 seconds |
Local file | 0.08 seconds | 0.00008 seconds |
When it comes to just logging, juju-log
is 1,863 times slower than just writing to a file!
Obviously some amount of being slower is understandable, but that is extremely slow. That was the cause of PostgreSQL just sitting there spitting stuff into the log when it should have been doing something. It wasnāt the fault of the reactive framework, it was the fault of juju-log
.
In my eyes, juju-log
should not block the execution of the charm like it does today. Iām not sure exactly how you might design it differently, but one idea is to have a local service to handle logging. juju-log
would push messages to this local service, something that would finish very quickly, and then exit immediately to avoid blocking the charm any longer than necessary. The local service would be responsible for queuing and sending the log messages to the Juju Server.
I would really like to get some feedback from Juju developers on this because it is having a large, noticeable impact on real-life charms in the wild. I had a noticeable increase in the speed of the Letās Encrypt Proxy charm just by removing the verbose logging that I had configured to help with debugging.
With the reactive framework enabling trace logging by default, this will effect almost every major charm in the charm store in a way that directly effects user experience. One of the first things that me and my partner noticed about Juju when we started using it was that it took a long time to spin up a Kubernetes cluster. Yes, it did it and it was amazing, but it took about an hour or more. If we sped up logging it could really help these kinds of situations. I mean, it even seems to have an impact on a simple CodiMD+PostgreSQL charm deployment.