Performance Enhancements in Juju 2.9.31


Juju 2.9.31 was released on 31 May 2022. In addition to the fixes you’ll find on the milestone page, we made a couple of small but significant performance enhancements. Examples of their impact follow.

The graphs below are from one of Canonical’s internal beta controllers. At the time of writing, it runs approximately:

  • 54 models
  • 488 applications
  • ~1790 units
  • 367 machines

Memory Consumption

One of the issues we fixed was a goroutine leak that occurred upon unit agent restarts. This was an artefact of unifying machine and unit agents into a single process as part of the 2.9 series. Over time, the leaked goroutines caused memory consumption and file descriptor usage to grow.
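
To illustrate the general shape of this kind of leak, here is a minimal sketch in Go. It is not Juju's code: the `worker` type and the `startWorker` and `goroutineGrowth` functions are hypothetical names invented for this example. It assumes a worker that runs a background loop, and compares restarting it with and without stopping the previous instance first.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"sync"
	"time"
)

// worker is a hypothetical stand-in for a long-running agent worker.
// It is an assumption for illustration, not Juju's worker API.
type worker struct {
	cancel context.CancelFunc
	done   sync.WaitGroup
}

// startWorker launches a background loop that exits when its context is cancelled.
func startWorker(parent context.Context) *worker {
	ctx, cancel := context.WithCancel(parent)
	w := &worker{cancel: cancel}
	w.done.Add(1)
	go func() {
		defer w.done.Done()
		ticker := time.NewTicker(10 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				// If nobody ever cancels the context, this loop never returns.
				return
			case <-ticker.C:
				// Periodic work would go here.
			}
		}
	}()
	return w
}

// stop cancels the loop and waits for it to finish. Calling it twice is harmless.
func (w *worker) stop() {
	w.cancel()
	w.done.Wait()
}

// goroutineGrowth simulates a number of restarts and reports how many extra
// goroutines exist afterwards. With stopOld set to false, every restart
// abandons the previous worker, which is the leak pattern.
func goroutineGrowth(restarts int, stopOld bool) int {
	before := runtime.NumGoroutine()
	var started []*worker
	for i := 0; i < restarts; i++ {
		if stopOld && len(started) > 0 {
			started[len(started)-1].stop()
		}
		started = append(started, startWorker(context.Background()))
	}
	growth := runtime.NumGoroutine() - before
	// Tidy up so the demonstration itself does not leak.
	for _, w := range started {
		w.stop()
	}
	return growth
}

func main() {
	fmt.Println("growth without stopping old workers:", goroutineGrowth(50, false))
	fmt.Println("growth when stopping old workers:", goroutineGrowth(50, true))
}
```

In the first case the goroutine count grows with every restart, and anything each goroutine holds (memory, open files, connections) grows with it. In the second case the count stays roughly flat.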

Here we can see the drop in controller memory usage after the upgrade; usage no longer exhibits the gradual increases we previously saw over time. The step-ups you can see correspond with new application deployments on this controller.

Object Persistence and Retrieval Enhancements

Another feature first appearing in the 2.9 series was support for Charmhub. With it came changes to how we encode unit documents when persisting them to state and decode them on retrieval, particularly around charm URLs.

Unit document access is a fairly hot path in Juju. By being more selective about when we encode and decode parts of unit documents, we’ve unlocked some significant improvements in performance.
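
As a rough illustration of the idea of decoding selectively, the Go sketch below keeps a URL field as the raw string it was stored as and parses it only when a caller actually asks for it. This is an assumption about the general technique, not a description of Juju's implementation: the `unitDoc` type, its fields, the `parseCount` counter and the sample URL are all hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// unitDoc is a hypothetical, heavily simplified document.
// It is not Juju's actual unit schema.
type unitDoc struct {
	Name     string
	CharmURL string // kept exactly as stored; not parsed when the document is loaded

	parsed *url.URL // cache, filled on first call to Charm
}

// parseCount records how often a URL was actually parsed, for demonstration only.
var parseCount int

// Charm parses the stored URL on first use and reuses the result afterwards.
func (d *unitDoc) Charm() (*url.URL, error) {
	if d.parsed != nil {
		return d.parsed, nil
	}
	parseCount++
	u, err := url.Parse(d.CharmURL)
	if err != nil {
		return nil, err
	}
	d.parsed = u
	return u, nil
}

// unitNames reads only the Name field, so it never pays for URL parsing.
func unitNames(docs []unitDoc) []string {
	names := make([]string, 0, len(docs))
	for i := range docs {
		names = append(names, docs[i].Name)
	}
	return names
}

func main() {
	docs := []unitDoc{
		{Name: "app/0", CharmURL: "ch:amd64/focal/app-1"}, // illustrative value
		{Name: "app/1", CharmURL: "ch:amd64/focal/app-1"},
	}
	fmt.Println(unitNames(docs), "parses so far:", parseCount)

	if u, err := docs[0].Charm(); err == nil {
		fmt.Println("scheme:", u.Scheme, "parses so far:", parseCount)
	}
}
```

The trade-off in a design like this is that callers that only need cheap fields skip the parsing cost entirely, at the price of carrying a small cache on each document.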

Slow API Requests

These graphs show our slow API request samples. The different colours are various percentiles, but the noteworthy aspect is the scale.

Before the upgrade, with spikes up toward 30 seconds and plenty of samples over 10 seconds.

After upgrade. Not one of our slowest requests is over 6.5 seconds, and most of the action is under 2 seconds. Note that these are our slowest requests, not all of them. There’s far less variation too.
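
For readers less familiar with percentile views of latency like the ones described above, here is a small Go sketch of how percentiles can be derived from a set of request durations using the nearest-rank method. It is only an illustration: the `percentile` and `exampleSamples` functions are hypothetical, the sample values are made up, and this is not how the graphs in this post were produced.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// percentile returns the p-th percentile (1 to 100) of samples using the
// nearest-rank method. It returns 0 when there are no samples.
func percentile(samples []time.Duration, p int) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := make([]time.Duration, len(samples))
	copy(sorted, samples)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	// Nearest rank is ceil(p/100 * n), computed with integer arithmetic.
	rank := (p*len(sorted) + 99) / 100
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// exampleSamples returns made-up request durations for demonstration.
func exampleSamples() []time.Duration {
	millis := []int{700, 100, 1000, 300, 900, 200, 600, 400, 800, 500}
	out := make([]time.Duration, 0, len(millis))
	for _, ms := range millis {
		out = append(out, time.Duration(ms)*time.Millisecond)
	}
	return out
}

func main() {
	samples := exampleSamples()
	for _, p := range []int{50, 90, 99} {
		fmt.Printf("p%d = %v\n", p, percentile(samples, p))
	}
}
```

High percentiles are dominated by the slowest requests, which is why a change in the scale of such a graph says more than the shape of any single line.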

Overall API Request Time

Here we can see our request times sampled over five minutes. The tallest line represents all requests that are not calls to watcher Next methods.

Before:

After:

Leadership Operations

In addition to the direct effects, the load taken off state, and thereby off the API generally, means that leadership API requests show a remarkable improvement, even though they do not deal directly with unit documents.

Before:

After:

We’re very excited about how this is developing. If you’ve been indifferent about staying on the latest release, or deliberately holding back, now is the time to upgrade your Juju controllers.
