These are my experiences from running a production-grade Juju installation for about a year, 2019-2020. I expect a lot of good things for this year.
== Some general experiences from 2019-2020 ==
PROS (our experiences)
- Juju has great stability.
- Upgrades of controller infrastructure have been super.
- Good support from Canonical.
- Fantastic community support.
- Fast moving development.
- Brings in “Infrastructure as Code” in a perfect way.
- Adds tremendous efficiency in deploying complex stacks, for example SLURM, which is our largest use case at the moment.
CONS (our experiences)
- Error messages from juju are confusing for end users and seldom give any help on how to deal with them. It's a problem, since it affects the impression of juju in general. For now, we only introduce juju to hard-core devops who can manage this situation.
- Poor support for CentOS (or any distro other than Ubuntu), which we need.
- Problematic to use vsphere in a multi-user setup, since tenant isolation is not fully understood by us. Perhaps it's possible with later versions…
- Very fragmented documentation.
- The charm store and GUI are not working well yet - I know there are large changes coming in.
- Writing charms is difficult and very few “best practices” exist, i.e. you need experienced developers to develop complex charms.
- There is no quality assurance process for charms, which makes it difficult to know if it's you, or the charm, that is the problem when your “juju deploy foobar” fails.
== Our core setup of Juju infrastructure ==
- Model: controller
This is the “core” infra controller service, with no integrations. We keep this controller sacred, backed up, secured and with very restricted access.
3x HA juju controller.
1x prometheus2 for monitoring.
1x ntp subordinate for all units.
This core controller provides the foundation for the rest of the juju infrastructure, which lives in this controller as models:
- Model: candid
Provides authentication for the other controllers.
3x HA with ActiveDirectory backend
2x haproxy letsencrypt
1x prometheus2.
1x ntp subordinate for all units.
- Model: jimm
Provides the juju client endpoint to jimm.foobar.com. Very much like jaas, but private to us.
Clients do “juju login jimm.foobar.com”, use their ActiveDirectory password and are off to the races.
3x jimm units
2x haproxy letsencrypt
1x prometheus2 for monitoring.
1x ntp subordinate for all units.
- Cloud controllers
We run three models with separate juju controllers for cloud substrates: MAAS, vsphere and AWS.
They all run as:
3x HA juju controllers
2x haproxy with letsencrypt.
1x prometheus2 for monitoring.
1x ntp subordinate for all units.
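To get a feel for the footprint of the setup described above, here is a minimal sketch that records the unit counts per model as listed and totals them. The application names used as dictionary keys are illustrative labels, not necessarily the exact charm names, and the ntp subordinate is left out on the assumption that a subordinate rides along on existing units rather than adding new ones.

```python
# Unit counts per model, copied from the lists above.
# Assumption: the ntp subordinate is not counted, since it is placed on
# every existing unit instead of adding units of its own.
CLOUD_CONTROLLER = {"juju-controller": 3, "haproxy": 2, "prometheus2": 1}

TOPOLOGY = {
    "controller": {"juju-controller": 3, "prometheus2": 1},
    "candid": {"candid": 3, "haproxy": 2, "prometheus2": 1},
    "jimm": {"jimm": 3, "haproxy": 2, "prometheus2": 1},
    # One model per cloud substrate, all with the same layout.
    "maas": dict(CLOUD_CONTROLLER),
    "vsphere": dict(CLOUD_CONTROLLER),
    "aws": dict(CLOUD_CONTROLLER),
}


def units_per_model(topology):
    """Return a mapping of model name to its number of principal units."""
    return {model: sum(apps.values()) for model, apps in topology.items()}


def total_units(topology):
    """Return the number of principal units across all models."""
    return sum(units_per_model(topology).values())


if __name__ == "__main__":
    for model, count in units_per_model(TOPOLOGY).items():
        print(f"{model}: {count} units")
    print(f"total: {total_units(TOPOLOGY)} units")
```

Counting this way makes it clear that most of the footprint comes from running every service in HA, which is worth knowing before planning a similar setup.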
=== Availability ===
So far, we have had 100% uptime on the service at large.
=== Performance ===
So far, we have not experienced any performance problems. We scaled our controllers fairly high, which is probably a good idea. The mongodb consumes a lot of RAM (16?) so I expect performance to be something we need to work on. Lately, juju add-model has started to take some time (20+ seconds) on the maas controller. But I think this is just normal everyday work that has to be done.
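One way to keep an eye on a slowdown like the add-model one is to time the command now and then and note when it crosses a threshold. The sketch below is a generic wall-clock timer around any command; the 20-second default threshold simply mirrors the figure mentioned above, and the helper names are hypothetical.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed seconds, return code)."""
    start = time.monotonic()
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.monotonic() - start, result.returncode


def is_slow(seconds, threshold=20.0):
    """Return True when a run took longer than the threshold in seconds."""
    return seconds > threshold
```

With a logged-in juju client, calling `time_command(["juju", "add-model", "timing-test"])` would time a model creation, where `timing-test` is a made-up model name that should be removed afterwards. The same helper works with any other command, so it can be tried out without a controller at hand.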
=== Complexity ===
Running juju is non-trivial. An “operator handbook” is much needed, since building up experience in all these new technologies is difficult. If you intend to bring up an enterprise-grade installation of juju, make sure to bring in some help.
=== Value ===
We are starting to derive value from juju through a few very hard-to-get properties in our infrastructure:
- Increased efficiency for each juju-capable engineer (devops) by orders of magnitude. Yes, this should not be underestimated and makes the effort worthwhile.
- We are able to equip our “operational staff” with “actions” which produce deterministic outcomes from the execution of operational tasks. That lowers the error rate and increases precision, and the actions can also be added to automation workflows later on.
- We have benefited from improved collaboration between teams with related, but separate challenges. Juju gives us a way to speak the same language around different topics, without having to change much of the time already invested in existing tools. It's not destructive to value already created, so to speak.
If you need any advice, just feel free to reach out.