Local dns server for juju units (proof of concept)

In development environments and during integration tests it could be handy to be able to curl workloads deployed by juju, via their topology. For example:

curl leader.prometheus.cos-lite.juju.internal:9090/api/v1/targets | jq '...'

One way of achieving this:

  1. Periodically update a hosts-like file with all the addresses from juju status.
  2. Local dns server that reads the custom hosts file.
  3. Plug the local dns server into systemd-resolved.

A hosts-like file with all the addresses from juju status

This can be accomplished by:

  1. Get all combinations of controller-name and model-name from ~/.local/share/juju/models.yaml.
  2. Run juju status for each pair, extracting the app address and unit address.
  3. Render it in a /etc/hosts-like format. Also add a leader. subdomain for leader units.

Code for this can be found here: https://gist.github.com/sed-i/63f92c5f3e55d6688db79276e852e736.

Sample output:

$ ./juju-network.py | sudo tee /etc/hosts.juju
# --- THIS STDERR ---
{('microk8s', 'admin/controller'), ('lxd', 'admin/welcome-lxd'), ('lxd', 'admin/controller'), ('j34', 'admin/controller'), ('microk8s', 'admin/pebnote')}
Obtaining status for microk8s:admin/controller
No addresses for app 'controller'
Obtaining status for lxd:admin/welcome-lxd
Failed
Obtaining status for lxd:admin/controller
Failed
Obtaining status for j34:admin/controller
No addresses for app 'controller'
Obtaining status for microk8s:admin/pebnote
# --- THIS IS STDOUT ---
10.1.166.91 unit-0.loki2.pebnote.juju.internal
10.1.166.91 leader.loki2.pebnote.juju.internal
10.1.166.90 leader.prom.pebnote.juju.internal
10.1.166.93 leader.trfk.pebnote.juju.internal
10.1.166.90 unit-0.prom.pebnote.juju.internal
10.1.166.84 unit-0.loki.pebnote.juju.internal
10.152.183.140 prom.pebnote.juju.internal
10.43.8.188 trfk.pebnote.juju.internal
10.1.166.84 leader.loki.pebnote.juju.internal
10.152.183.241 loki.pebnote.juju.internal
10.1.166.93 unit-0.trfk.pebnote.juju.internal
10.152.183.83 loki2.pebnote.juju.internal

Now we have a hosts-like file, /etc/hosts.juju, that we can use with a lightweight dns server.

Local dns server that reads the custom hosts file

dnsmasq is lightweight and good enough for a first try.

sudo apt install dnsmasq

Then update the config file:

$ cat /etc/dnsmasq.conf | grep '^[^# ]'
port=5353
no-resolv
listen-address=::1,127.0.0.1
no-hosts
addn-hosts=/etc/hosts.juju

Note that changes made to the hosts.juju file won’t be picked up by dnsmasq automatically, and I haven’t found a config option to do so. A documented method to re-read the file is to sudo killall -SIGHUP dnsmasq.

Plug the local dns server into systemd-resolved

Add a .network file like this:

$ cat /etc/systemd/network/juju.network 
[Match]
Name=*

[Network]
DNS=127.0.0.1:5353
Domains=~juju.internal

and reload services:

sudo systemctl daemon-reload
sudo systemctl restart systemd-networkd
sudo systemctl restart systemd-resolved

Shortcomings

  1. Need to manually (or scheduled periodically) update the juju hosts file AND force dnsmasq to reload (sudo killall -SIGHUP dnsmasq). In integration tests this can be done once active/idle after deploy/upgrade.
  2. Ever since I systemctl restart, the juju status command hangs for lxd model. Haven’t figured out yet why.

A simpler, one-liner alternative

We can use jq to render the hosts file for the current model only:

$ juju status --format json 
  | jq -r '.model.name as $model | .applications[].units | to_entries | map("\(.value.address) \(.key | gsub("/"; "-")).\($model).juju.internal") | join("\n")'
10.1.166.84 loki-0.pebnote.juju.internal
10.1.166.91 loki2-0.pebnote.juju.internal
10.1.166.90 prom-0.pebnote.juju.internal
10.1.166.93 trfk-0.pebnote.juju.internal

To generalize to all models,

#!/usr/bin/env bash


for ctrl_mdl in $(cat ~/.local/share/juju/models.yaml | yq -o json | jq -r '.controllers | to_entries[] | .key as $controller | .value.models | to_entries[] | "\($controller):\(.key)"')
do
  timeout 5s bash <<EOT
juju status -m "$ctrl_mdl" --format json \
  | jq -r '.model.name as \$model | .applications[].units | to_entries | map("\(.value.address) \(.key | gsub("/"; "-")).\(\$model).juju.internal") | join("\n")'
EOT
done

Looking forward

  1. Perhaps use some lib to have a super simple all-in-one juju dns server app that runs in the background, so we won’t need dnsmasq and won’t need to schedule periodic tasks.
  2. Have this functionality integrated in juju itself.

References

4 Likes

this is so nice it has to make it in jhack

3 Likes

Super nice work Leon! How about we go even further and have the Juju controller resolve .juju.internal or similar?

2 Likes

Leon, Thx for a very interesting concept!

It feels like parsing of juju status output can be enough to make this idea work with 3-party (against juju) apps for testing, like jhack (@ppasotti mentioned).

If we decided to implement such kind of functionality (use DNS names instead of IPs in charms communication) in Juju, then here, I agree with @sabdfl , we need more profound integration with controller because we risk breaking some Juju communication rules.

Just from the top of top of my head, there are several decision points:

  • How to implement a DNS server? Is it a regular charm or special charm that represents some sort of networking service to Juju, by translation it’s state to controller.
  • Should this DNS charm be model-centric, or better network space-centric not to ruin juju network visibility?
  • How will other charms get updates about changing DNS charm state? That should be a separate event, that changes their host-like files? Or should it be an extension to already existing events for juju charms, like: adding a unit, removing a unit, or changing relation data?
  • Also definitely need to think, will the situation of deploying DNS charms will differ in various providers supported by Juju? Maybe in some case we don’t need a separate DNS charm, and need only to take DNS service provided by pubclic cloud provider, or by K8s controller?
  • Potential pain point of how this kind of DNS service will feel itself in CMR. Coz in that case, we must sync the state among models/controllers.

And again, this concept is worth further discussion, It would be nice if we found to do it (for example, in Madrid)

1 Like

Love this. I do really think that we should move toward Juju being a DNS resolver itself.

So long as Juju keeps the zones up to date when models changing, getting the deployed units to resolve should be trivial:

  • On machines, systemd-resolved can be configured with a .juju search domain or similar (like recommended in the Consul docs).
  • On Kubernetes, CoreDNS can be configured to forward any queries for the .juju domain to the controller pod

This get’s around @anvial’s concerns about how pod’s get updates - the answer is that they don’t, and we just set a low TTL on the DNS responses served up by Juju :slight_smile:

3 Likes

Just saw this and I’m happy to see the discussion. I have at least of two immediate use cases, being:

  • Remove hacks from charm code which are writing /etc/hosts due the workload being sensible to address changes (cof cof mysql cof cof)
  • Properly address unit by name on machine charms on cross model/controllers/clouds relations, which is a struggle right now.

@anvial are you intending to spawn a discussion in Madrid?

1 Like

Yep, we plan to have a session about it! I’ll be sure to invite you.

3 Likes