Well, my idea was to take the three separate LXD hosts you have (currently three different clouds), form a LXD cluster out of them, and then enable controller HA. HA itself worked, but everything else didn’t, so it was not a good idea after all. This is what happened:
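For reference, the membership table below is the output of lxc cluster list once all three hosts have joined:

$ lxc cluster list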
+-------+------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| NAME | URL | ROLES | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE | MESSAGE |
+-------+------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| lxd01 | https://lxd01.lxd:9443 | database-leader | x86_64 | default | | ONLINE | Fully operational |
| | | database | | | | | |
+-------+------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| lxd02 | https://lxd02.lxd:9443 | database | x86_64 | default | | ONLINE | Fully operational |
+-------+------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| lxd03 | https://lxd03.lxd:9443 | database | x86_64 | default | | ONLINE | Fully operational |
+-------+------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
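For completeness: the controller model in this test was bootstrapped against the lxd01 cloud, roughly like this (the cloud and controller names are inferred from the status output below, so adjust to taste):

$ juju bootstrap lxd01 lxd01-controller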
With the initial controller running on lxd01, it would be possible to enable controller HA like this:
$ juju status
Model       Controller        Cloud/Region   Version  SLA          Timestamp
controller  lxd01-controller  lxd01/default  2.9.29   unsupported  14:13:10Z

Machine  State    DNS          Inst id        Series  AZ  Message
0        started  10.70.41.18  juju-b135a3-0  focal       Running
$ juju enable-ha -n 3 --to lxd02,lxd03
maintaining machines: 0
adding machines: 1, 2
Verify the result:
$ juju controllers --refresh
Controller         Model       User   Access     Cloud/Region   Models  Nodes  HA  Version
lxd01-controller*  controller  admin  superuser  lxd01/default       8     15   3  2.9.29
$ juju status
Model       Controller        Cloud/Region   Version  SLA          Timestamp
controller  lxd01-controller  lxd01/default  2.9.29   unsupported  14:33:38Z

Machine  State    DNS           Inst id        Series  AZ  Message
0        started  10.70.41.18   juju-b135a3-0  focal       Running
1        started  10.70.41.60   juju-b135a3-1  focal       Running
2        started  10.70.41.167  juju-b135a3-2  focal       Running
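Besides juju status, juju show-controller also reports the per-machine HA status (ha-status: ha-enabled) once the new controller machines have joined:

$ juju show-controller lxd01-controller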
The controllers would end up on the correct hosts; see the juju-b135a3-N containers:
ubuntu@lxd03:~$ lxc list
+---------------+---------+---------------------+------------------------------------------------+-----------+-----------+----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS | LOCATION |
+---------------+---------+---------------------+------------------------------------------------+-----------+-----------+----------+
| juju-5f4089-0 | RUNNING | 10.70.41.217 (eth0) | fd42:eb1c:1006:9cab:216:3eff:fecc:fbaf (eth0) | CONTAINER | 0 | lxd01 |
+---------------+---------+---------------------+------------------------------------------------+-----------+-----------+----------+
| juju-6a6602-0 | RUNNING | 10.70.41.59 (eth0) | fd42:eb1c:1006:9cab:216:3eff:fe73:3c8b (eth0) | CONTAINER | 0 | lxd01 |
+---------------+---------+---------------------+------------------------------------------------+-----------+-----------+----------+
| juju-6a6602-1 | STOPPED | | | CONTAINER | 0 | lxd02 |
+---------------+---------+---------------------+------------------------------------------------+-----------+-----------+----------+
| juju-6ecb41-0 | RUNNING | 10.70.41.82 (eth0) | fd42:eb1c:1006:9cab:7c7f:a2ff:fe20:b7f2 (eth0) | CONTAINER | 0 | lxd02 |
+---------------+---------+---------------------+------------------------------------------------+-----------+-----------+----------+
| juju-188846-0 | RUNNING | 10.70.41.109 (eth0) | fd42:eb1c:1006:9cab:216:3eff:fedb:d963 (eth0) | CONTAINER | 0 | lxd03 |
+---------------+---------+---------------------+------------------------------------------------+-----------+-----------+----------+
| juju-b135a3-0 | RUNNING | 10.70.41.18 (eth0) | fd42:eb1c:1006:9cab:216:3eff:fe48:ec7 (eth0) | CONTAINER | 0 | lxd01 |
+---------------+---------+---------------------+------------------------------------------------+-----------+-----------+----------+
| juju-b135a3-1 | RUNNING | 10.70.41.60 (eth0) | fd42:eb1c:1006:9cab:216:3eff:fefa:df37 (eth0) | CONTAINER | 0 | lxd02 |
+---------------+---------+---------------------+------------------------------------------------+-----------+-----------+----------+
| juju-b135a3-2 | RUNNING | 10.70.41.167 (eth0) | fd42:eb1c:1006:9cab:216:3eff:fe19:6709 (eth0) | CONTAINER | 0 | lxd03 |
+---------------+---------+---------------------+------------------------------------------------+-----------+-----------+----------+
This was my idea, but when I tested it I found it was bad for two reasons, and I wouldn’t do this on a production system:
- When joining a LXD cluster, the new member “forgets” its running containers: they are not added to the cluster and are not visible anywhere. They keep running and can be located on disk, but with bad luck you might destroy your storage pool. I don’t know whether manually “importing” the containers back is possible (see the recovery sketch after this list). All units will show cannot upgrade machine’s lxd profile: 0: Instance not found.
- For some reason, models already in use (except the controller model) stop working, and I haven’t tried very hard to fix that. When adding units, the error will be something like retrieving environ: creating environ for model “lxd03mod” (64e1ec58-b1bc-446a-8856-47d065cfbeb8): Get “https://10.70.41.228:8443/1.0”: x509: certificate is valid for 127.0.0.1, ::1, not 10.70.41.228. A quick way to inspect which addresses the certificate actually covers is sketched below.
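On the first point: newer LXD releases ship an interactive recovery tool that can scan a storage pool and re-import the instances it finds there, which might bring the “forgotten” containers back into the database. I haven’t verified this in the clustering scenario, so treat it as a sketch:

# Run on the member whose containers disappeared (LXD 4.17 or later;
# older releases used `lxd import <instance>` instead)
$ sudo lxd recover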
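On the second point: the x509 error suggests the server certificate was generated before that address existed, so it carries no SAN for 10.70.41.228. To see which names and addresses the certificate served on an endpoint actually covers, something like this works:

$ openssl s_client -connect 10.70.41.228:8443 </dev/null 2>/dev/null \
    | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"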
So now my old models are broken and the containers are lost in LXD, but hey, my controllers are in HA.