I’m having issues with a juju controller that suddenly has problems connecting to 2 separate LXD hosts.
It complains a lot about certificates (/var/log/juju/logsink.log):
11ca14fd-f66a-43aa-8644-06f7f70de9a0: machine-0 2022-03-16 22:27:18 ERROR juju.provider.lxd environ_instance.go:35 failed to get instances from LXD: Get "https://192.168.111.4:8443/1.0/containers?project=default&recursion=1": x509: certificate is valid for 127.0.0.1, ::1, not 192.168.111.4
11ca14fd-f66a-43aa-8644-06f7f70de9a0: machine-0 2022-03-16 22:27:18 ERROR juju.worker.dependency engine.go:693 "instance-poller" manifold worker returned unexpected error: Get "https://192.168.111.4:8443/1.0/containers?project=default&recursion=1": x509: certificate is valid for 127.0.0.1, ::1, not 192.168.111.4
11ca14fd-f66a-43aa-8644-06f7f70de9a0: machine-0 2022-03-16 22:27:20 ERROR juju.provider.lxd environ_instance.go:35 failed to get instances from LXD: Get "https://192.168.111.4:8443/1.0/containers?project=default&recursion=1": x509: certificate is valid for 127.0.0.1, ::1, not 192.168.111.4
11ca14fd-f66a-43aa-8644-06f7f70de9a0: machine-0 2022-03-16 22:27:20 ERROR juju.worker.dependency engine.go:693 "instance-poller" manifold worker returned unexpected error: Get "https://192.168.111.4:8443/1.0/containers?project=default&recursion=1": x509: certificate is valid for 127.0.0.1, ::1, not 192.168.111.4
409902da-6d5f-4649-83ea-61bc27b7ad5b: machine-0 2022-03-16 22:27:24 ERROR juju.worker.dependency engine.go:693 "broker-tracker" manifold worker returned unexpected error: cannot load machine machine-0 from state: not authorized
The machine-0 above is the controller machine.
192.168.111.4 is the first LXD host and 192.168.111.2 is the second. They are 2 separate clouds on the same controller.
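To see at a glance which endpoints the controller rejects, and which names each certificate claims to cover, the x509 errors in the log can be counted. This is only a rough sketch: cert_mismatches is a helper name I made up, and the pattern is based solely on the error text shown above.

```shell
# Hypothetical helper (not part of juju): count distinct x509 name-mismatch
# errors in one or more log files, or on stdin when no file is given.
cert_mismatches() {
  # Keep only the "valid for ..., not <host>" part of each matching line,
  # then count how often each distinct mismatch occurs, most frequent first.
  grep -o 'x509: certificate is valid for .*, not [^ )]*' "$@" | sort | uniq -c | sort -rn
}

# Example, run on the controller machine (reading the log may need sudo):
#   cert_mismatches /var/log/juju/logsink.log
```

If both hosts show up with "valid for 127.0.0.1, ::1", that would suggest both are serving a certificate without their LAN address in it.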
I can add models, but when I add machines, this error shows up in juju status:
failed to start machine 0 (Get "https://192.168.111.4:8443/1.0/images/aliases/juju%2Ffocal%2Famd64?project=default": x509: certificate is valid for 127.0.0.1, ::1, not 192.168.111.4), retrying in 10s (10 more attempts)
I fear that the LXD snap on the LXD hosts has replaced its server certificate, but I really don’t know.
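One way to check this suspicion would be to look at the certificate each LXD host actually serves and compare its fingerprint and Subject Alternative Names over time. The sketch below assumes openssl is installed; cert_summary is a name I made up, and the server.crt path in the comment is my assumption about where the snap keeps the certificate.

```shell
# Hypothetical helper: print the SHA-256 fingerprint and the Subject
# Alternative Names of a PEM certificate read from stdin.
cert_summary() {
  pem=$(cat)
  printf '%s\n' "$pem" | openssl x509 -noout -fingerprint -sha256
  printf '%s\n' "$pem" | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
}

# Example: certificate as seen over the network (host and port from the errors above):
#   openssl s_client -connect 192.168.111.4:8443 </dev/null 2>/dev/null | cert_summary
# Example: certificate on disk on the LXD host (path is an assumption for the snap):
#   sudo cat /var/snap/lxd/common/lxd/server.crt | cert_summary
```

If the fingerprint differs from an earlier run, or from the on-disk certificate, the server certificate has changed.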
Could this also be related to credential issues from running “juju update-credential” at some stage?
Really need some help here.
Here are some more details:
$ juju update-credential dwellir5
This operation can be applied to both a copy on this client and to the one on a controller.
Do you want to update credential "" on cloud "dwellir5" on:
1. client only (--client)
2. controller "dwellir-sodertalje" only (--controller dwellir-sodertalje)
3. both (--client --controller dwellir-sodertalje)
Enter your choice, or type Q|q to quit: 2
Credential valid for:
foo1
Credential invalid for:
rpc-5:
Get "https://192.168.111.4:8443/1.0": x509: certificate is valid for 127.0.0.1, ::1, not 192.168.111.4
Failed models may require a different credential.
Use 'juju set-credential' to change credential for these models before repeating this update.
… so I ran “juju update-credential” a few more times and got rid of the above message (for a while?).
I managed to do “juju add-model foo3 dwellir3/sodertalje” and even managed to deploy tiny-bash, but now I’m back at this after trying to “juju add-machine”.
So I then re-ran “juju update-credential”, at which point the model came alive.
… and then the credential became invalid again. (Suspended since cloud credential is not valid)
It’s just as if the controller “forgets” about my credential.