Let's Encrypt certificates in the Juju ecosystem

Let’s Encrypt is a well-known free and automated Certificate Authority, making it simpler to secure workloads with TLS encryption. However, until recently, there was no easy way to use it with Juju charms.

In 2022, the Telco charming team introduced the tls-certificates interface for charms, allowing them to request TLS certificates through that relation. It was first supported by the provider charm tls-certificates-operator, which can supply workloads with either self-signed certificates or certificates that have been manually provided by the operator.

Today, we are announcing the release of two charms that enable workloads to get certificates from Let's Encrypt or any other ACME-compliant provider: route53-acme-operator and namecheap-acme-operator. We are also releasing an easy way to develop new charms supporting other DNS providers. The two charms leverage the tls-certificates interface and can be used by simply relating charms that require a certificate to them. This has the advantage of providing certificates that are already trusted by all major browsers, since Let's Encrypt is a trusted Certificate Authority.

In this blog post, we will start with a quick tutorial on how this can be used, then explain the ACME challenges and how to write your own operator for other DNS providers.

Tutorial

We will go through the process of deploying a web application and securing it with Let’s Encrypt certificates. For this tutorial, we will use alertmanager-k8s as the web application behind a traefik Kubernetes ingress. We will then use route53-acme-operator to provide a TLS certificate to Traefik.

To follow along, you will need an EKS Kubernetes cluster managed by Juju and a domain managed by Route53.

Step 1 - Deploy Traefik and Alertmanager

The first step is to create a model on Kubernetes and deploy Traefik with a valid external hostname under the managed domain. As an example, we will use aws.gbourgeois.com as the domain.

juju add-model tls-tutorial

juju deploy traefik-k8s --trust --channel edge --config external_hostname=traefik.<DOMAIN>

We then deploy a workload behind Traefik for testing. We use alertmanager-k8s for this task, and relate it to traefik-k8s.

juju deploy alertmanager-k8s --channel edge --trust
juju relate alertmanager-k8s:ingress traefik-k8s:ingress

We can now wait for all units to be in the active/idle state.

juju status --watch 5s
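If you would rather script the wait than watch the output, the check can be sketched in Python. This is only a sketch: it assumes the JSON printed by `juju status --format json` nests units under `applications`, each with `workload-status` and `juju-status` entries, and the helper names are made up for this example.

```python
import json
import subprocess
import time


def all_units_settled(status: dict) -> bool:
    """Return True when every unit reports workload 'active' and agent 'idle'."""
    units = [
        unit
        for app in status.get("applications", {}).values()
        for unit in app.get("units", {}).values()
    ]
    # An empty model is not considered settled: there is nothing to wait for yet.
    return bool(units) and all(
        unit.get("workload-status", {}).get("current") == "active"
        and unit.get("juju-status", {}).get("current") == "idle"
        for unit in units
    )


def wait_for_model(model: str, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll `juju status` until the model settles or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        raw = subprocess.run(
            ["juju", "status", "--model", model, "--format", "json"],
            check=True,
            capture_output=True,
            text=True,
        ).stdout
        if all_units_settled(json.loads(raw)):
            return True
        time.sleep(interval)
    return False
```

Calling `wait_for_model("tls-tutorial")` would then return once every unit is active/idle, or `False` after ten minutes.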

Step 2 - Configure hostname in Route53

We now need to configure Route53 to point the external hostname we gave Traefik to the load-balancer address it got. We first retrieve the load-balancer address:

kubectl get svc traefik-k8s -n tls-tutorial -o=jsonpath='{.status.loadBalancer.ingress[0].hostname}{"\n"}'

Then, we go to the Route53 console, to the hosted zone for our domain, and click Create record. We enter traefik for the subdomain, select CNAME from the Record type drop-down list and paste the address of the load-balancer in the value field. We then click Create records.
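The same record can also be created without the console. The sketch below builds the change request as a plain dictionary and submits it with boto3's `change_resource_record_sets` call. The function names, the 300 second TTL and the use of UPSERT are choices made for this example, and it assumes boto3 is installed and AWS credentials are already configured.

```python
def build_cname_change_batch(record_name: str, target: str, ttl: int = 300) -> dict:
    """Build a Route53 change batch that creates or updates one CNAME record."""
    return {
        "Comment": "Point the Traefik external hostname at its load balancer",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or overwrite it if it exists
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target}],
                },
            }
        ],
    }


def upsert_cname(hosted_zone_id: str, record_name: str, target: str) -> None:
    """Submit the change batch to Route53."""
    import boto3  # third-party; imported here so the builder above has no dependencies

    client = boto3.client("route53")
    client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=build_cname_change_batch(record_name, target),
    )
```

Here `record_name` would be traefik.<DOMAIN> and `target` the load-balancer hostname printed by the kubectl command above.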

We should now be able to navigate to https://traefik.<DOMAIN>/tls-tutorial-alertmanager-k8s/. At this point, the browser will complain that the connection is not secure.

Step 3 - Deploy route53-acme-operator

Next, we write a configuration file for route53-acme-operator. Add this content to a file named route53.yaml, replacing all placeholders for your specific configuration:

route53-acme-operator:
    email: <YOUR_EMAIL>
    server: https://acme-v02.api.letsencrypt.org/directory  # For testing, use the Let's Encrypt staging URL instead: https://acme-staging-v02.api.letsencrypt.org/directory
    aws_access_key_id: <AWS Access Key ID>
    aws_secret_access_key: <AWS Secret Access Key>
    aws_region: <AWS Region>
    aws_hosted_zone_id: <AWS Route53 Hosted Zone ID>

We are now ready to deploy route53-acme-operator and relate it to traefik-k8s:

juju deploy route53-acme-operator --config route53.yaml
juju relate traefik-k8s:certificates route53-acme-operator:certificates

Again, after a few minutes, juju status will show the route53-acme-operator/0 unit status as active/idle. This will take a bit longer as the operator will request a certificate for traefik.

Now, in our browser we navigate to https://traefik.<DOMAIN>/tls-tutorial-alertmanager-k8s/. As expected, the certificate is trusted and verified by Let's Encrypt. This certificate is valid only for 3 months, but it will be automatically renewed before expiration.
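The same check can be done from a script. The sketch below uses only the Python standard library: it connects with the default trust store, so the handshake fails if the certificate is not trusted, and then reads the issuer and the expiry date. The helper names are made up for this example.

```python
import socket
import ssl


def fetch_certificate(hostname: str, port: int = 443) -> dict:
    """Connect with default trust settings and return the peer certificate."""
    context = ssl.create_default_context()  # raises on untrusted certificates
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


def issuer_organization(cert: dict) -> str:
    """Extract the issuer's organizationName from a getpeercert() dictionary."""
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return ""


def days_until_expiry(cert: dict, now: float) -> float:
    """Days between `now` (seconds since the epoch) and the notAfter date."""
    return (ssl.cert_time_to_seconds(cert["notAfter"]) - now) / 86400
```

For example, `issuer_organization(fetch_certificate("traefik.<DOMAIN>"))` should name Let's Encrypt once the certificate has been issued, and `days_until_expiry(cert, time.time())` is a convenient value to monitor if you want to confirm that renewal happens.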

ACME and challenges

The ACME protocol uses “challenges” to ensure that the domain name for which a certificate is requested is really controlled by the requester. Two challenges are in common use: HTTP-01 and DNS-01. The acme-client-base, an abstract base charm that is extended by the other ACME operators, supports the latter.

The DNS-01 Challenge

The DNS-01 challenge works by ensuring that the requester controls the DNS records for the requested domain. Concretely, it requires the requester to place a unique value in a TXT record named _acme-challenge.<DOMAIN>, where the ACME server will be able to retrieve it, validating domain control.

Many DNS providers provide an API to automate the management of DNS entries. That is where the concrete charms enter, each adding support for a specific DNS provider.

Supporting the DNS-01 challenge has two clear advantages. The first advantage is that it is possible to automate it with a charmed operator. The second advantage is that this challenge enables requesting wildcard certificates, since the control of the whole domain is proven.
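To make the "unique value" concrete, here is a small sketch of how the TXT record is derived according to RFC 8555: the value is the unpadded base64url encoding of the SHA-256 digest of the key authorization, which is the challenge token joined to the account key thumbprint with a dot. The function name is made up for this example, and in practice the ACME client (LEGO in our case) does this for you.

```python
import base64
import hashlib
from typing import Tuple


def dns01_txt_record(domain: str, token: str, account_thumbprint: str) -> Tuple[str, str]:
    """Return the (record name, record value) pair for a DNS-01 challenge."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without the trailing '=' padding, as ACME requires
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    # A wildcard request for *.example.com is validated on example.com itself
    base_domain = domain[2:] if domain.startswith("*.") else domain
    return "_acme-challenge." + base_domain, value
```

Note that a wildcard request uses the same record name as the base domain, which is why control of that one record is enough to cover every subdomain.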

The HTTP-01 Challenge

The HTTP-01 challenge requires the server at the requested domain to listen on port 80 and serve a unique file under the path /.well-known/acme-challenge/. This challenge is simple to implement, but the requirer charm would need to implement it, since it controls the workload at the requested domain. A single Juju operator could not solve it on behalf of many different charms.
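For comparison, this is roughly what a workload would have to do to answer an HTTP-01 challenge. The sketch uses Python's standard http.server module, and the names are made up for this example. It maps each token to its key authorization (the token joined to the account key thumbprint with a dot) and serves it at the well-known path.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, Optional

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def challenge_body(path: str, key_authorizations: Dict[str, str]) -> Optional[str]:
    """Return the key authorization for the token in `path`, or None if unknown."""
    if not path.startswith(CHALLENGE_PREFIX):
        return None
    token = path[len(CHALLENGE_PREFIX):]
    return key_authorizations.get(token)


def make_handler(key_authorizations: Dict[str, str]) -> type:
    """Build a request handler class that answers pending challenges."""

    class ChallengeHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            body = challenge_body(self.path, key_authorizations)
            if body is None:
                self.send_error(404)
                return
            payload = body.encode("ascii")
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return ChallengeHandler
```

The handler class would be passed to `http.server.HTTPServer(("", 80), ...)`. Binding to port 80 usually needs elevated privileges, and every workload exposing a hostname would have to run something like this, which is the per-charm cost mentioned above.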

Writing our own ACME charm

One of our primary design goals was to make it easy to support many different DNS providers without ending up with a single enormous charm that would be difficult to maintain. For that reason, we decided to create a library that provides a base charm to inherit from. This is unusual for charm libraries, but in this case it makes creating a new DNS provider charm simple.

A DNS provider charm based on AcmeClient needs to implement two things:

  1. Handle the ConfigChanged event, validating its specific configuration items, and passing the global configuration items to the base charm.
  2. Implement the _plugin_config property method to return a dictionary of its specific configuration items.

Under the hood, the AcmeClient base charm currently uses the LEGO client. It supports a long list of DNS providers, which can be found on its website.
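LEGO reads DNS provider credentials from environment variables, which is why each provider charm only has to supply a dictionary of settings. The sketch below shows one way such a dictionary could be combined with the generic options into a LEGO command line. It is a hypothetical helper written for this post, not the library's actual code, and it assumes the certificate is requested from a CSR file.

```python
from typing import Dict, List, Tuple


def build_lego_invocation(
    email: str,
    server: str,
    plugin: str,
    csr_path: str,
    plugin_config: Dict[str, str],
) -> Tuple[List[str], Dict[str, str]]:
    """Return the lego command line and the environment carrying provider settings."""
    command = [
        "lego",
        "--email", email,
        "--accept-tos",
        "--csr", csr_path,  # path to the certificate signing request file
        "--server", server,
        "--dns", plugin,  # name of the LEGO DNS provider, e.g. "route53"
        "run",
    ]
    # Provider credentials travel as environment variables, not as flags.
    environment = dict(plugin_config)
    return command, environment
```

The returned pair could then be handed to whatever runs the process, for example `subprocess.run(command, env=environment)` on a machine where lego is installed.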

Taking Digital Ocean as an example, we can see that it has only a single required parameter: DO_AUTH_TOKEN. The bare minimum charm to implement it would look like this:

from typing import Dict

from charms.acme_client_operator.v0.acme_client import AcmeClient
from ops.main import main
from ops.model import ActiveStatus, BlockedStatus


class DigitalOceanAcmeOperatorCharm(AcmeClient):  # Note that we inherit from AcmeClient
    def __init__(self, *args) -> None:
        super().__init__(*args, plugin="digitalocean")
        self.framework.observe(self.on.config_changed, self._on_config_changed)

    def _on_config_changed(self, _) -> None:
        if not self.model.config.get("do_auth_token"):
            self.unit.status = BlockedStatus("do_auth_token is required.")
            return
        try:
            self.validate_generic_acme_config()  # Lets the base charm validate the generic parameters (email and server)
        except ValueError as e:
            self.unit.status = BlockedStatus(str(e))
            return
        self.unit.status = ActiveStatus()

    @property
    def _plugin_config(self) -> Dict[str, str]:
        return {"DO_AUTH_TOKEN": self.model.config.get("do_auth_token")}


if __name__ == "__main__":
    main(DigitalOceanAcmeOperatorCharm)

The config.yaml file would look like this:

options:
  email:
    type: string
    description: Account email address
  server:
    type: string
    description: Certificate authority server
    default: "https://acme-v02.api.letsencrypt.org/directory"
  do_auth_token:
    type: string
    description: Digital Ocean Auth Token

The metadata.yaml file would look like this:

name: digitalocean-acme-operator

display-name: Digital Ocean ACME Operator

description: |
  ACME operator implementing the provider side of the `tls-certificates`
  interface to get signed certificates from the `Let's Encrypt` ACME server using Digital Ocean DNS.
summary: |
  ACME operator implementing the provider side of the `tls-certificates`
  interface to get signed certificates from the `Let's Encrypt` ACME server using Digital Ocean DNS.

provides:
  certificates:
    interface: tls-certificates

containers:
  lego:
    resource: lego-image

resources:
  lego-image:
    type: oci-image
    description: Distroless OCI image for lego built with rockcraft.
    upstream-source: ghcr.io/canonical/lego:4.9.1

Now it is your turn to write an ACME charm for your DNS provider!

Nice work @ghibourg! Thanks! :rocket:

This will be fun to try, thank you! I think it will make it much more straightforward for people to convert play into product, using charms to spin things up that are interesting and then moving them into production with your cert management tooling.

I’m rather allergic to wildcard certs, or perhaps just old fashioned :slight_smile: I’m interested in whether an interface could be created on the workload charm to handle HTTP-01 challenges? As you say, it would be the workload that would need to respond, but if the workload itself was integrated with the cert management tooling, then it could do so.

There’s a small typo on one of your interface names near the top of the post, “certitificates”.

Thank you for the kind words and highlighting the typo, I fixed it.

Regarding wildcard certificates, they are of course completely optional and I expect most workloads will not use them. They are, however, required for some workloads, like Magma, although this could probably be changed upstream.

We could create a library to support the HTTP-01 challenge directly with the workload, but it would definitely not be as easy to use as this interface, as it would require more development on each charm using it. I think it is worth it to look more into this, as it would cover some cases where the DNS-01 challenge is less appropriate.

I think this could almost certainly be done, and as Ghislain mentioned, we could probably write a library for this.

My concern is that the burden for implementation on each workload that wanted to make use of the interface would be quite high, leading to a situation where our “collective code” for performing this function becomes a bit disparate and disjoint.

One of the properties that I like about our current approach is that anyone making use of these charms gets a consistent and predictable experience, even if it comes with the downside of needing to put more things in the model.

Wouldn’t these workloads commonly be behind something like Nginx or Istio? Could that handle the HTTP-01 challenge response?

I think targeting reverse proxies like Nginx, Istio and Traefik makes more sense for that challenge. IIRC, Traefik has Let’s Encrypt support built-in, but I am not sure the operator exposes the feature today.

OK. Avoiding wildcard certs as a requirement would be a win, happy to be educated about more recent practices if I’m out of date.

They are definitely a bit less secure than certificates that list all subjects explicitly. They do provide some benefits around management of certificates in highly dynamic deployments.

In the case of Magma, the NMS UI currently uses a wildcard certificate. The reason is that each time you create a new organization inside Magma, it creates a new subdomain for it (org1.mymagma.com, org2.mymagma.com, etc.). In this case, we would need Magma itself to handle the certificate request each time the set of organizations changes. It might be possible in the future to do it at the operator level, if Juju gains support for workload-initiated events.

It would be possible to fake it with a workload that monitors Magma for changes and then triggers actions in Juju, but that would be a poor design.

“Dynamic” is just how hackers like it :slight_smile: I would think it’s reasonable, if an application by design creates and removes domains, that it should also be expected to get certificates for those!
