When a machine's IP address changes because of DHCP and a restart, there should be a way to quickly inform Juju about the change, or to make Juju refresh the information it holds about the machine.
Hey John, could you provide some more information on the specific setup where you encountered this issue? In your setup, does Juju eventually rediscover the machine, or does it stay in an error state?
Hello @aflynn, it stays in the error state persistently. In fact, it has been in that state since the day I posted the question. That prompted me to go down the rabbit hole of debugging the cause. The codebase is quite complex, and although I’m making some progress grokking it, this feels like an unnecessary detour.
After debugging for a while, I was able to narrow it down to the fact that the new IP address is not in the list of known hosts. Does anyone know how to fix this?
To be clear, the new IP address does show up in the list of IP addresses for the machine, contrary to my previous assumption.
The --no-host-key-checks option lets me log in to the machine even while it is in an error state, and I see a message about permanently adding the host to the list of known hosts. I expected the second attempt to go through without the option, but it seems the "permanently added" entry was not permanent after all: the machine stays in the error state and I still cannot log in without --no-host-key-checks.
Thanks for doing the extra digging. Which cloud are your controller/machines on? This sounds like it could be an issue with the way Juju interacts with it.
Also, what is the specific error? Is the error coming from the charm or from Juju itself? There may be more context in the model logs (juju debug-log -m <model>) and the controller logs (juju debug-log -m controller).
@aflynn it’s an LXD cloud
johnbendi@generalbendi:~$ juju status
Model Controller Cloud/Region Version SLA Timestamp
k8s-001 overlord-lxd localhost/localhost 3.6.0 unsupported 06:27:39+01:00
App Version Status Scale Charm Channel Rev Exposed Message
calico v3.27.4 error 5 calico latest/stable 123 no hook failed: "config-changed"
containerd 1.7.12 active 5 containerd latest/stable 83 no Container runtime available
easyrsa 3.0.1 active 1/2 easyrsa latest/stable 66 no Certificate Authority connected.
etcd 3.4.22 blocked 3/4 etcd latest/stable 774 no UnHealthy with 3 known peers
kubeapi-load-balancer 1.18.0 active 1 kubeapi-load-balancer latest/stable 169 yes Ready
kubernetes-control-plane 1.31.4 waiting 2 kubernetes-control-plane latest/stable 558 no Waiting for 1 kube-system pod to start
kubernetes-worker 1.31.4 blocked 3 kubernetes-worker latest/stable 265 yes Not all snaps are available on channel=1.31/stable
Unit Workload Agent Machine Public address Ports Message
easyrsa/0 unknown lost 0 fd42:c24d:2827:4f4c:216:3eff:fe6e:93d7 agent lost, see 'juju show-status-log easyrsa/0'
easyrsa/1* active idle 13 fd42:c24d:2827:4f4c:216:3eff:fe25:3c19 Certificate Authority connected.
etcd/0* active idle 1 10.198.60.51 2379/tcp Healthy with 4 known peers
etcd/1 unknown lost 2 10.198.60.87 2379/tcp agent lost, see 'juju show-status-log etcd/1'
etcd/2 active idle 3 10.198.60.134 2379/tcp Healthy with 4 known peers
etcd/3 active idle 12 10.198.60.102 2379/tcp Healthy with 4 known peers
kubeapi-load-balancer/0* active idle 4 10.198.60.26 443,6443/tcp Ready
kubernetes-control-plane/0 waiting idle 5 10.198.60.118 6443/tcp Waiting for 1 kube-system pod to start
calico/3 error idle 10.198.60.118 hook failed: "config-changed"
containerd/3 active idle 10.198.60.118 Container runtime available
kubernetes-control-plane/1* waiting idle 6 10.198.60.195 6443/tcp Waiting for 1 kube-system pod to start
calico/4 error idle 10.198.60.195 hook failed: "config-changed"
containerd/4 active idle 10.198.60.195 Container runtime available
kubernetes-worker/0 blocked idle 7 10.198.60.47 80,443/tcp Not all snaps are available on channel=1.31/stable
calico/0 waiting idle 10.198.60.47 calico: Deployment/kube-system/calico-kube-controllers is not Available, calico: PodDisruptionBudget/kube-system/cali...
containerd/0 active idle 10.198.60.47 Container runtime available
kubernetes-worker/2 blocked idle 9 10.198.60.157 80,443/tcp Not all snaps are available on channel=1.31/stable
calico/1 waiting idle 10.198.60.157 calico: Deployment/kube-system/calico-kube-controllers is not Available, calico: PodDisruptionBudget/kube-system/cali...
containerd/1 active idle 10.198.60.157 Container runtime available
kubernetes-worker/3* waiting idle 11 10.198.60.88 80,443/tcp Waiting for certificate authority
calico/5* waiting idle 10.198.60.88 calico: Deployment/kube-system/calico-kube-controllers is not Available, calico: PodDisruptionBudget/kube-system/cali...
containerd/5* active idle 10.198.60.88 Container runtime available
Machine State Address Inst id Base AZ Message
0 down fd42:c24d:2827:4f4c:216:3eff:fe6e:93d7 juju-9317c7-0 ubuntu@22.04 Running
1 started 10.198.60.51 juju-9317c7-1 ubuntu@22.04 Running
2 down 10.198.60.87 juju-9317c7-2 ubuntu@22.04 Running
3 started 10.198.60.134 juju-9317c7-3 ubuntu@22.04 Running
4 started 10.198.60.26 juju-9317c7-4 ubuntu@22.04 Running
5 started 10.198.60.118 juju-9317c7-5 ubuntu@22.04 Running
6 started 10.198.60.195 juju-9317c7-6 ubuntu@22.04 Running
7 started 10.198.60.47 juju-9317c7-7 ubuntu@22.04 Running
9 started 10.198.60.157 juju-9317c7-9 ubuntu@22.04 Running
10 started fd42:c24d:2827:4f4c:216:3eff:fecc:7563 juju-9317c7-10 ubuntu@24.04 Running
11 started 10.198.60.88 juju-9317c7-11 ubuntu@22.04 Running
12 started 10.198.60.102 juju-9317c7-12 ubuntu@22.04 Running
13 started fd42:c24d:2827:4f4c:216:3eff:fe25:3c19 juju-9317c7-13 ubuntu@22.04 Running
Notice that machine-0 and machine-2 are down, and I had to add machine-12 and machine-13 to work around the down machines. Machine 12 wouldn’t join etcd without easyRSA on machine-0, so machine-13 came to the rescue.
On machine-0, I have the relevant part of the machine agent log below:
2025-01-11 04:53:56 INFO juju.cmd supercommand.go:56 running jujud [3.6.0 6dec77a01480916689430d38ef3e032cb1e06b78 gc go1.23.3]
2025-01-11 04:53:56 DEBUG juju.cmd supercommand.go:57 args: []string{"/var/lib/juju/tools/machine-0/jujud", "machine", "--data-dir", "/var/lib/juju", "--machine-id", "0", "--debug"}
2025-01-11 04:53:56 DEBUG juju.utils gomaxprocs.go:24 setting GOMAXPROCS to 1
2025-01-11 04:53:56 DEBUG juju.agent agent.go:600 read agent config, format "2.0"
2025-01-11 04:53:56 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 3.6.0 have already been run.
2025-01-11 04:53:56 DEBUG juju.cmd.jujud runner.go:416 start "engine"
2025-01-11 04:53:56 INFO juju.cmd.jujud runner.go:592 start "engine"
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:580 "termination-signal-handler" manifold worker started at 2025-01-11 04:53:56.627307144 +0000 UTC
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:580 "charmhub-http-client" manifold worker started at 2025-01-11 04:53:56.627630912 +0000 UTC
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:580 "agent" manifold worker started at 2025-01-11 04:53:56.627658294 +0000 UTC
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:580 "clock" manifold worker started at 2025-01-11 04:53:56.62793581 +0000 UTC
2025-01-11 04:53:56 DEBUG juju.worker.apicaller connect.go:129 connecting with old password
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:580 "upgrade-steps-gate" manifold worker started at 2025-01-11 04:53:56.6293256 +0000 UTC
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:580 "syslog" manifold worker started at 2025-01-11 04:53:56.632331038 +0000 UTC
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:580 "state-config-watcher" manifold worker started at 2025-01-11 04:53:56.632482745 +0000 UTC
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:580 "upgrade-steps-flag" manifold worker started at 2025-01-11 04:53:56.632676917 +0000 UTC
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:580 "upgrade-check-gate" manifold worker started at 2025-01-11 04:53:56.633939272 +0000 UTC
2025-01-11 04:53:56 DEBUG juju.worker.introspection worker.go:125 introspection worker listening on "/var/lib/juju/agents/machine-0/introspection.socket"
2025-01-11 04:53:56 DEBUG juju.cmd.jujud runner.go:424 "engine" started
2025-01-11 04:53:56 DEBUG juju.worker.introspection worker.go:150 stats worker now serving
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:580 "api-config-watcher" manifold worker started at 2025-01-11 04:53:56.637954723 +0000 UTC
2025-01-11 04:53:56 DEBUG juju.api apiclient.go:1036 successfully dialed "wss://10.198.60.170:17070/model/f7411ed0-3b7a-44c6-899d-b8ed029317c7/api"
2025-01-11 04:53:56 INFO juju.api apiclient.go:571 connection established to "wss://10.198.60.170:17070/model/f7411ed0-3b7a-44c6-899d-b8ed029317c7/api"
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:580 "is-controller-flag" manifold worker started at 2025-01-11 04:53:56.642722767 +0000 UTC
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:580 "is-not-controller-flag" manifold worker started at 2025-01-11 04:53:56.642954896 +0000 UTC
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:580 "upgrade-check-flag" manifold worker started at 2025-01-11 04:53:56.645049596 +0000 UTC
2025-01-11 04:53:56 DEBUG juju.worker.apicaller connect.go:160 [f7411e] failed to connect
2025-01-11 04:53:56 ERROR juju.worker.apicaller connect.go:209 Failed to connect to controller: invalid entity name or password (unauthorized access)
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:618 "api-caller" manifold worker stopped: [f7411e] "machine-0" cannot open api: connection permanently impossible
stack trace:
github.com/juju/juju/worker/apicaller.init:42: connection permanently impossible
github.com/juju/juju/cmd/jujud/agent/machine.commonManifolds.Manifold.ManifoldConfig.startFunc.func35:97: [f7411e] "machine-0" cannot open api
2025-01-11 04:53:56 INFO juju.worker.stateconfigwatcher manifold.go:120 tomb dying
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:603 "state-config-watcher" manifold worker completed successfully
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:603 "termination-signal-handler" manifold worker completed successfully
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:603 "upgrade-steps-gate" manifold worker completed successfully
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:603 "is-controller-flag" manifold worker completed successfully
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:603 "upgrade-steps-flag" manifold worker completed successfully
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:603 "upgrade-check-flag" manifold worker completed successfully
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:603 "charmhub-http-client" manifold worker completed successfully
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:603 "agent" manifold worker completed successfully
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:603 "upgrade-check-gate" manifold worker completed successfully
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:603 "api-config-watcher" manifold worker completed successfully
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:603 "clock" manifold worker completed successfully
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:603 "is-not-controller-flag" manifold worker completed successfully
2025-01-11 04:53:56 DEBUG juju.worker.dependency engine.go:603 "syslog" manifold worker completed successfully
2025-01-11 04:53:56 INFO juju.cmd.jujud runner.go:623 stopped "engine", err: agent should be terminated
2025-01-11 04:53:56 DEBUG juju.cmd.jujud runner.go:430 "engine" done: agent should be terminated
2025-01-11 04:53:56 DEBUG juju.cmd.jujud runner.go:485 error "engine": agent should be terminated
2025-01-11 04:53:56 ERROR juju.cmd.jujud runner.go:487 fatal error "engine": agent should be terminated
2025-01-11 04:53:56 INFO cmd supercommand.go:556 command finished
2025-01-11 04:53:56 DEBUG juju.cmd.jujud main.go:286 jujud complete, code 0, err <nil>
This must somehow be due to a machine IP change, but I can’t really confirm that because, naively, I didn’t record the addresses at the time. Another surprising thing is that juju ssh 0 works but juju ssh 2 doesn’t.
johnbendi@generalbendi:~$ juju --debug ssh 0
07:17:47 INFO juju.cmd supercommand.go:56 running juju [3.6.1 cdb5fe45b78a4701a8bc8369c5a50432358afbd3 gc go1.23.3]
07:17:47 DEBUG juju.cmd supercommand.go:57 args: []string{"/snap/juju/29241/bin/juju", "--debug", "ssh", "0"}
07:17:47 INFO juju.juju api.go:86 connecting to API addresses: [10.198.60.170:17070 [fd42:c24d:2827:4f4c:216:3eff:fe98:9af0]:17070]
07:17:47 DEBUG juju.api apiclient.go:1035 successfully dialed "wss://[fd42:c24d:2827:4f4c:216:3eff:fe98:9af0]:17070/model/f7411ed0-3b7a-44c6-899d-b8ed029317c7/api"
07:17:47 INFO juju.api apiclient.go:570 connection established to "wss://[fd42:c24d:2827:4f4c:216:3eff:fe98:9af0]:17070/model/f7411ed0-3b7a-44c6-899d-b8ed029317c7/api"
07:17:47 DEBUG juju.cmd.juju.ssh ssh_machine.go:345 proxy-ssh is false
07:17:47 DEBUG juju.network.ssh reachable.go:156 dialing [fd42:c24d:2827:4f4c:216:3eff:fe6e:93d7]:22 to check host keys
07:17:47 DEBUG juju.network.ssh reachable.go:156 dialing 10.198.60.53:22 to check host keys
07:17:47 DEBUG juju.network.ssh reachable.go:169 connected to 10.198.60.53:22, initiating ssh handshake
07:17:47 DEBUG juju.network.ssh reachable.go:169 connected to [fd42:c24d:2827:4f4c:216:3eff:fe6e:93d7]:22, initiating ssh handshake
07:17:47 DEBUG juju.network.ssh reachable.go:98 accepted host key for: [fd42:c24d:2827:4f4c:216:3eff:fe6e:93d7]:22
07:17:47 INFO juju.network.ssh reachable.go:223 found [fd42:c24d:2827:4f4c:216:3eff:fe6e:93d7]:22 has an acceptable ssh key
07:17:47 DEBUG juju.cmd.juju.ssh ssh_machine.go:495 using target "0" address "fd42:c24d:2827:4f4c:216:3eff:fe6e:93d7"
07:17:47 DEBUG juju.network.ssh reachable.go:98 accepted host key for: 10.198.60.53:22
07:17:47 DEBUG juju.network.ssh reachable.go:181 ssh: handshake failed: host key was accepted, but search was stopped
07:17:47 DEBUG juju.utils.ssh ssh.go:305 using OpenSSH ssh client
Authenticated to fd42:c24d:2827:4f4c:216:3eff:fe6e:93d7 ([fd42:c24d:2827:4f4c:216:3eff:fe6e:93d7]:22) using "publickey".
Welcome to Ubuntu 22.04.5 LTS (GNU/Linux 6.8.0-48-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
System information disabled due to load higher than 1.0
Expanded Security Maintenance for Applications is not enabled.
0 updates can be applied immediately.
Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status
New release '24.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Sat Jan 11 05:24:45 2025 from 10.198.60.1
For machine 2, juju --debug ssh 2:
johnbendi@generalbendi:~$ juju --debug ssh 2
06:37:43 INFO juju.cmd supercommand.go:56 running juju [3.6.1 cdb5fe45b78a4701a8bc8369c5a50432358afbd3 gc go1.23.3]
06:37:43 DEBUG juju.cmd supercommand.go:57 args: []string{"/snap/juju/29241/bin/juju", "--debug", "ssh", "2"}
06:37:43 INFO juju.juju api.go:86 connecting to API addresses: [[fd42:c24d:2827:4f4c:216:3eff:fe98:9af0]:17070 10.198.60.170:17070]
06:37:43 DEBUG juju.api apiclient.go:1035 successfully dialed "wss://10.198.60.170:17070/model/f7411ed0-3b7a-44c6-899d-b8ed029317c7/api"
06:37:43 INFO juju.api apiclient.go:570 connection established to "wss://10.198.60.170:17070/model/f7411ed0-3b7a-44c6-899d-b8ed029317c7/api"
06:37:43 DEBUG juju.cmd.juju.ssh ssh_machine.go:345 proxy-ssh is false
06:37:43 DEBUG juju.network.ssh reachable.go:156 dialing [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22 to check host keys
06:37:43 DEBUG juju.network.ssh reachable.go:156 dialing 10.198.60.87:22 to check host keys
06:37:43 DEBUG juju.network.ssh reachable.go:156 dialing 10.198.60.86:22 to check host keys
06:37:43 DEBUG juju.network.ssh reachable.go:169 connected to 10.198.60.86:22, initiating ssh handshake
06:37:43 DEBUG juju.network.ssh reachable.go:169 connected to [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22, initiating ssh handshake
06:37:43 DEBUG juju.network.ssh reachable.go:110 host key for 10.198.60.86:22 not in our accepted set: log at TRACE to see raw keys
06:37:43 DEBUG juju.network.ssh reachable.go:110 host key for [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22 not in our accepted set: log at TRACE to see raw keys
06:37:46 DEBUG juju.network.ssh reachable.go:159 dial 10.198.60.87:22 failed with: dial tcp 10.198.60.87:22: connect: no route to host
06:37:46 DEBUG juju.cmd.juju.ssh ssh_machine.go:491 getting target "2" address(es) failed: cannot connect to any address: [10.198.60.86:22 10.198.60.87:22 [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22] (retrying)
06:37:48 DEBUG juju.network.ssh reachable.go:156 dialing [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22 to check host keys
06:37:48 DEBUG juju.network.ssh reachable.go:156 dialing 10.198.60.86:22 to check host keys
06:37:48 DEBUG juju.network.ssh reachable.go:156 dialing 10.198.60.87:22 to check host keys
06:37:48 DEBUG juju.network.ssh reachable.go:169 connected to [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22, initiating ssh handshake
06:37:48 DEBUG juju.network.ssh reachable.go:169 connected to 10.198.60.86:22, initiating ssh handshake
06:37:48 DEBUG juju.network.ssh reachable.go:110 host key for 10.198.60.86:22 not in our accepted set: log at TRACE to see raw keys
06:37:48 DEBUG juju.network.ssh reachable.go:110 host key for [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22 not in our accepted set: log at TRACE to see raw keys
06:37:51 DEBUG juju.network.ssh reachable.go:159 dial 10.198.60.87:22 failed with: dial tcp 10.198.60.87:22: connect: no route to host
06:37:51 DEBUG juju.cmd.juju.ssh ssh_machine.go:491 getting target "2" address(es) failed: cannot connect to any address: [10.198.60.86:22 10.198.60.87:22 [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22] (retrying)
06:37:53 DEBUG juju.network.ssh reachable.go:156 dialing [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22 to check host keys
06:37:53 DEBUG juju.network.ssh reachable.go:156 dialing 10.198.60.87:22 to check host keys
06:37:53 DEBUG juju.network.ssh reachable.go:156 dialing 10.198.60.86:22 to check host keys
06:37:53 DEBUG juju.network.ssh reachable.go:169 connected to 10.198.60.86:22, initiating ssh handshake
06:37:53 DEBUG juju.network.ssh reachable.go:169 connected to [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22, initiating ssh handshake
06:37:53 DEBUG juju.network.ssh reachable.go:110 host key for [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22 not in our accepted set: log at TRACE to see raw keys
06:37:53 DEBUG juju.network.ssh reachable.go:110 host key for 10.198.60.86:22 not in our accepted set: log at TRACE to see raw keys
But juju --debug ssh --no-host-key-checks 2 works:
johnbendi@generalbendi:~$ juju --debug ssh --no-host-key-checks 2
06:41:38 INFO juju.cmd supercommand.go:56 running juju [3.6.1 cdb5fe45b78a4701a8bc8369c5a50432358afbd3 gc go1.23.3]
06:41:38 DEBUG juju.cmd supercommand.go:57 args: []string{"/snap/juju/29241/bin/juju", "--debug", "ssh", "--no-host-key-checks", "2"}
06:41:38 INFO juju.juju api.go:86 connecting to API addresses: [10.198.60.170:17070 [fd42:c24d:2827:4f4c:216:3eff:fe98:9af0]:17070]
06:41:38 DEBUG juju.api apiclient.go:1035 successfully dialed "wss://10.198.60.170:17070/model/f7411ed0-3b7a-44c6-899d-b8ed029317c7/api"
06:41:38 INFO juju.api apiclient.go:570 connection established to "wss://10.198.60.170:17070/model/f7411ed0-3b7a-44c6-899d-b8ed029317c7/api"
06:41:38 DEBUG juju.cmd.juju.ssh ssh_machine.go:345 proxy-ssh is false
06:41:38 DEBUG juju.network.ssh reachable.go:156 dialing [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22 to check host keys
06:41:38 DEBUG juju.network.ssh reachable.go:156 dialing 10.198.60.87:22 to check host keys
06:41:38 DEBUG juju.network.ssh reachable.go:156 dialing 10.198.60.86:22 to check host keys
06:41:38 DEBUG juju.network.ssh reachable.go:169 connected to [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22, initiating ssh handshake
06:41:38 DEBUG juju.network.ssh reachable.go:169 connected to 10.198.60.86:22, initiating ssh handshake
06:41:38 DEBUG juju.network.ssh reachable.go:98 accepted host key for: [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22
06:41:38 INFO juju.network.ssh reachable.go:223 found [fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22 has an acceptable ssh key
06:41:38 DEBUG juju.cmd.juju.ssh ssh_machine.go:495 using target "2" address "fd42:c24d:2827:4f4c:216:3eff:fe13:6951"
06:41:38 DEBUG juju.utils.ssh ssh.go:305 using OpenSSH ssh client
06:41:38 DEBUG juju.network.ssh reachable.go:98 accepted host key for: 10.198.60.86:22
06:41:38 DEBUG juju.network.ssh reachable.go:181 ssh: handshake failed: host key was accepted, but search was stopped
Warning: Permanently added 'fd42:c24d:2827:4f4c:216:3eff:fe13:6951' (ED25519) to the list of known hosts.
Authenticated to fd42:c24d:2827:4f4c:216:3eff:fe13:6951 ([fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22) using "publickey".
Last login: Fri Jan 10 09:19:12 2025 from 10.198.60.1
ubuntu@juju-9317c7-2:~$
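The two juju --debug ssh 2 runs above differ only in how each candidate address fared during the host key check. As a rough aid for reading such output, here is a minimal Python sketch that groups the reachable.go messages by address. The message patterns are copied from the debug lines above; the function name summarise and the outcome labels are made up for illustration and are not part of Juju.

```python
import re
from collections import defaultdict

# Patterns copied from the reachable.go DEBUG messages shown in the output above.
# The outcome labels on the left are illustrative names, not Juju terminology.
PATTERNS = {
    "rejected": re.compile(r"host key for (\S+) not in our accepted set"),
    "accepted": re.compile(r"accepted host key for: (\S+)"),
    "unreachable": re.compile(r"dial (\S+) failed with:"),
}


def summarise(lines):
    """Map each address to the set of outcomes seen for it."""
    outcomes = defaultdict(set)
    for line in lines:
        for outcome, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                outcomes[match.group(1)].add(outcome)
    return dict(outcomes)


sample = [
    "06:37:43 DEBUG juju.network.ssh reachable.go:110 host key for 10.198.60.86:22 "
    "not in our accepted set: log at TRACE to see raw keys",
    "06:37:46 DEBUG juju.network.ssh reachable.go:159 dial 10.198.60.87:22 failed with: "
    "dial tcp 10.198.60.87:22: connect: no route to host",
    "06:41:38 DEBUG juju.network.ssh reachable.go:98 accepted host key for: "
    "[fd42:c24d:2827:4f4c:216:3eff:fe13:6951]:22",
]

for address, seen in sorted(summarise(sample).items()):
    print(address, sorted(seen))
```

Fed the full output of a failing run, this would show at a glance which addresses are unreachable (here the stale 10.198.60.87) and which are reachable but present a host key that Juju does not have on record.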
The machine-2 agent log looks similar to machine-0’s:
2025-01-11 05:43:23 INFO juju.cmd supercommand.go:56 running jujud [3.6.0 6dec77a01480916689430d38ef3e032cb1e06b78 gc go1.23.3]
2025-01-11 05:43:23 DEBUG juju.cmd supercommand.go:57 args: []string{"/var/lib/juju/tools/machine-2/jujud", "machine", "--data-dir", "/var/lib/juju", "--machine-id", "2", "--debug"}
2025-01-11 05:43:23 DEBUG juju.utils gomaxprocs.go:24 setting GOMAXPROCS to 2
2025-01-11 05:43:23 DEBUG juju.agent agent.go:600 read agent config, format "2.0"
2025-01-11 05:43:23 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 3.6.0 have already been run.
2025-01-11 05:43:23 DEBUG juju.cmd.jujud runner.go:416 start "engine"
2025-01-11 05:43:23 INFO juju.cmd.jujud runner.go:592 start "engine"
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:580 "syslog" manifold worker started at 2025-01-11 05:43:23.109597003 +0000 UTC
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:580 "upgrade-check-gate" manifold worker started at 2025-01-11 05:43:23.109872965 +0000 UTC
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:580 "agent" manifold worker started at 2025-01-11 05:43:23.110321085 +0000 UTC
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:580 "charmhub-http-client" manifold worker started at 2025-01-11 05:43:23.110764418 +0000 UTC
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:580 "clock" manifold worker started at 2025-01-11 05:43:23.110947983 +0000 UTC
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:580 "api-config-watcher" manifold worker started at 2025-01-11 05:43:23.111502164 +0000 UTC
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:580 "upgrade-steps-gate" manifold worker started at 2025-01-11 05:43:23.111610909 +0000 UTC
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:580 "termination-signal-handler" manifold worker started at 2025-01-11 05:43:23.112374698 +0000 UTC
2025-01-11 05:43:23 DEBUG juju.worker.introspection worker.go:125 introspection worker listening on "/var/lib/juju/agents/machine-2/introspection.socket"
2025-01-11 05:43:23 DEBUG juju.cmd.jujud runner.go:424 "engine" started
2025-01-11 05:43:23 DEBUG juju.worker.introspection worker.go:150 stats worker now serving
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:580 "upgrade-check-flag" manifold worker started at 2025-01-11 05:43:23.119100163 +0000 UTC
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:580 "state-config-watcher" manifold worker started at 2025-01-11 05:43:23.120178505 +0000 UTC
2025-01-11 05:43:23 DEBUG juju.worker.apicaller connect.go:129 connecting with old password
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:580 "upgrade-steps-flag" manifold worker started at 2025-01-11 05:43:23.12356202 +0000 UTC
2025-01-11 05:43:23 DEBUG juju.api apiclient.go:1036 successfully dialed "wss://[fd42:c24d:2827:4f4c:216:3eff:fe98:9af0]:17070/model/f7411ed0-3b7a-44c6-899d-b8ed029317c7/api"
2025-01-11 05:43:23 INFO juju.api apiclient.go:571 connection established to "wss://[fd42:c24d:2827:4f4c:216:3eff:fe98:9af0]:17070/model/f7411ed0-3b7a-44c6-899d-b8ed029317c7/api"
2025-01-11 05:43:23 DEBUG juju.worker.apicaller connect.go:160 [f7411e] failed to connect
2025-01-11 05:43:23 ERROR juju.worker.apicaller connect.go:209 Failed to connect to controller: invalid entity name or password (unauthorized access)
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:618 "api-caller" manifold worker stopped: [f7411e] "machine-2" cannot open api: connection permanently impossible
stack trace:
github.com/juju/juju/worker/apicaller.init:42: connection permanently impossible
github.com/juju/juju/cmd/jujud/agent/machine.commonManifolds.Manifold.ManifoldConfig.startFunc.func35:97: [f7411e] "machine-2" cannot open api
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:603 "upgrade-steps-flag" manifold worker completed successfully
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:603 "upgrade-steps-gate" manifold worker completed successfully
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:603 "syslog" manifold worker completed successfully
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:603 "termination-signal-handler" manifold worker completed successfully
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:603 "upgrade-check-flag" manifold worker completed successfully
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:603 "agent" manifold worker completed successfully
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:603 "charmhub-http-client" manifold worker completed successfully
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:603 "clock" manifold worker completed successfully
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:603 "upgrade-check-gate" manifold worker completed successfully
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:603 "api-config-watcher" manifold worker completed successfully
2025-01-11 05:43:23 INFO juju.worker.stateconfigwatcher manifold.go:120 tomb dying
2025-01-11 05:43:23 DEBUG juju.worker.dependency engine.go:603 "state-config-watcher" manifold worker completed successfully
2025-01-11 05:43:23 INFO juju.cmd.jujud runner.go:623 stopped "engine", err: agent should be terminated
2025-01-11 05:43:23 DEBUG juju.cmd.jujud runner.go:430 "engine" done: agent should be terminated
2025-01-11 05:43:23 DEBUG juju.cmd.jujud runner.go:485 error "engine": agent should be terminated
2025-01-11 05:43:23 DEBUG juju.cmd.jujud.agent.addons addons.go:77 engine stopped, stopping introspection
2025-01-11 05:43:23 ERROR juju.cmd.jujud runner.go:487 fatal error "engine": agent should be terminated
2025-01-11 05:43:23 DEBUG juju.worker.introspection worker.go:159 stats worker closing listener
2025-01-11 05:43:23 INFO cmd supercommand.go:556 command finished
2025-01-11 05:43:23 DEBUG juju.worker.introspection worker.go:163 stats worker finished
2025-01-11 05:43:23 DEBUG juju.cmd.jujud.agent.addons addons.go:80 introspection stopped
2025-01-11 05:43:23 DEBUG juju.cmd.jujud main.go:286 jujud complete, code 0, err <nil>
Hmm, it looks like the failing machines are connecting correctly to the controller’s API but are being denied access, either because the controller has no record of the machine (identified by its number) or because the machine is trying to access the controller with a bad password.
According to the status, the controller does know about the machines, so I can only assume that they have somehow ended up with a bad password.
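That reading of the logs (the dial succeeds, then the login is refused) can be checked mechanically. The following is only a sketch in Python: the two substrings are taken from the agent logs posted above, while the function name diagnose and the three labels it returns are hypothetical.

```python
def diagnose(log_lines):
    """Classify an agent log using two messages seen in the logs above.

    The returned labels are illustrative only; Juju does not emit them.
    """
    dialed = any("successfully dialed" in line for line in log_lines)
    unauthorized = any(
        "invalid entity name or password" in line for line in log_lines
    )
    if dialed and unauthorized:
        # Network path is fine; the controller rejected the credentials.
        return "reachable-but-unauthorized"
    if dialed:
        return "dialed-no-auth-error"
    return "no-successful-dial"


machine_0_excerpt = [
    '2025-01-11 04:53:56 DEBUG juju.api apiclient.go:1036 successfully dialed '
    '"wss://10.198.60.170:17070/model/f7411ed0-3b7a-44c6-899d-b8ed029317c7/api"',
    "2025-01-11 04:53:56 ERROR juju.worker.apicaller connect.go:209 Failed to connect "
    "to controller: invalid entity name or password (unauthorized access)",
]

print(diagnose(machine_0_excerpt))
```

A result of reachable-but-unauthorized points away from a networking problem and towards the credentials stored on the machine.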
To confirm this, would you be able to check whether the apipassword and oldpassword in the file /var/lib/juju/agents/unit-<appname>-<unit-num>/agent.conf for the broken units differ from the passwords in the working units? (Though don’t post them, because they could be sensitive info!)
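One way to compare the values without ever displaying them is to compare short digests instead. This is a minimal sketch in Python, assuming that agent.conf stores each of the two fields on a single line of the form "apipassword: <value>"; that layout is an assumption about the file, and the function name password_fingerprints is hypothetical. The digests are meant for local comparison only and are best kept off the forum as well.

```python
import hashlib
import re

# Assumption: each password sits on one line, e.g. "apipassword: <value>".
FIELD = re.compile(r"^(apipassword|oldpassword):\s*(.*)$")


def password_fingerprints(conf_text):
    """Return {field: short SHA-256 digest} for the two password fields."""
    prints = {}
    for line in conf_text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            # Drop optional surrounding quotes before hashing.
            value = match.group(2).strip().strip("\"'")
            digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
            prints[match.group(1)] = digest[:12]
    return prints


# Made-up content standing in for a real agent.conf.
demo = "tag: machine-0\napipassword: example-secret\noldpassword: other-secret\n"
prints = password_fingerprints(demo)
print(sorted(prints))
```

On a real machine the text would come from reading the agent.conf file (root access is likely needed), and the resulting digests from a broken agent and a working one can then be compared side by side.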
@aflynn thanks. I have actually replaced the failing machines, but the next time it happens I’ll come back here and test whether the password could be the issue.
Fair, that is certainly the best workaround.