Another semi-failed Kubernetes environment. The environment itself seems to function, but Juju unit agents report lost connectivity, and which ones do varies from one agent to another. There is no pattern: on some hosts the ntp subordinate unit failed, on others nrpe, and some have had a connectivity failure on the main unit agent.
Did we stumble upon some bug?
All of them report the same:
agent lost, see 'juju show-status-log AGENT/NUMBER'
When checking the agent.conf file, I see that the apiaddresses entry is missing, and in some cases the passwords too:
controller: controller-2fa9c671-d95b-4ad5-8843-5e314f3f7706
model: model-89848811-0bdb-4db8-8e31-501824f7bd23
apipassword: ****
oldpassword: ****
loggingconfig: <root>=INFO;unit=DEBUG
values:
  CONTAINER_TYPE: ""
  NAMESPACE: ""
mongoversion: "0.0"
On a working unit, the Juju API endpoint is present as expected:
controller: controller-2fa9c671-d95b-4ad5-8843-5e314f3f7706
model: model-89848811-0bdb-4db8-8e31-501824f7bd23
apiaddresses:
- a.b.c.d:17070
apipassword: ***
oldpassword: ***
loggingconfig: <root>=INFO;unit=DEBUG
values:
  CONTAINER_TYPE: ""
  NAMESPACE: ""
mongoversion: "0.0"
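To find every affected agent on a host, the comparison between the broken and the working config can be automated. This is only a minimal sketch: it assumes the files live under /var/lib/juju/agents/*/agent.conf (the usual location, not confirmed in this thread), that the keys of interest are top-level entries as in the excerpts above, and that reading the files requires root. `REQUIRED` and `missing_keys` are names made up for this example.

```python
import glob

# Top-level keys an agent needs to reach the controller. Assumed from the
# working config shown above; this is not an authoritative list.
REQUIRED = ("apiaddresses", "apipassword")


def missing_keys(conf_text, required=REQUIRED):
    """Return the required top-level keys that are absent from conf_text."""
    present = set()
    for line in conf_text.splitlines():
        # Skip blank lines, comments, indented (nested) lines and list items.
        if not line or line[0] in " \t#-":
            continue
        key, sep, _ = line.partition(":")
        if sep:
            present.add(key.strip())
    return [k for k in required if k not in present]


if __name__ == "__main__":
    # Assumed path layout; adjust if the agents live elsewhere.
    for path in sorted(glob.glob("/var/lib/juju/agents/*/agent.conf")):
        with open(path) as fh:
            missing = missing_keys(fh.read())
        print(path, "OK" if not missing else "missing: " + ", ".join(missing))
```

Note that this only checks whether a key exists; an apiaddresses key with an empty list would still be reported as OK.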
They have one thing in common. Looking at the agent log files, I see that on April 16 there was a connectivity failure and a Juju upgrade(?):
2020-04-16 12:43:21 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [898488] "unit-calico-8" cannot open api: unable to connect to API: dial tcp a.b.c.d:17070: connect: network is unreachable
2020-04-16 12:45:28 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [898488] "unit-calico-8" cannot open api: unable to connect to API: dial tcp a.b.c.d:17070: connect: network is unreachable
2020-04-16 12:47:26 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [898488] "unit-calico-8" cannot open api: unable to connect to API: dial tcp a.b.c.d:17070: connect: network is unreachable
2020-04-16 12:49:21 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [898488] "unit-calico-8" cannot open api: unable to connect to API: dial tcp a.b.c.d:17070: connect: network is unreachable
2020-04-16 12:51:20 INFO juju.api apiclient.go:624 connection established to "wss://a.b.c.d:17070/model/89848811-0bdb-4db8-8e31-501824f7bd23/api"
2020-04-16 12:51:20 INFO juju.worker.apicaller connect.go:158 [898488] "unit-calico-8" successfully connected to "a.b.c.d:17070"
2020-04-16 12:51:20 INFO juju.worker.migrationminion worker.go:139 migration phase is now: NONE
2020-04-16 12:51:20 INFO juju.worker.logger logger.go:118 logger worker started
2020-04-16 12:51:20 INFO juju.worker.upgrader upgrader.go:155 abort check blocked until version event received
2020-04-16 12:51:20 INFO juju.worker.upgrader upgrader.go:161 unblocking abort check
2020-04-16 12:51:20 INFO juju.worker.upgrader upgrader.go:194 desired agent binary version: 2.7.5
2020-04-16 12:51:20 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-calico-8
2020-04-16 12:51:20 INFO juju.agent.tools symlinks.go:40 was a symlink, now looking at /var/lib/juju/tools/2.7.5-bionic-amd64
2020-04-16 12:51:20 INFO juju.worker.uniter.relation relations.go:553 joining relation "calico:etcd etcd:db"
2020-04-16 12:51:20 INFO juju.worker.leadership tracker.go:194 calico/8 promoted to leadership of calico
2020-04-16 12:51:20 INFO juju.worker.uniter.relation relations.go:589 joined relation "calico:etcd etcd:db"
2020-04-16 12:51:20 INFO juju.worker.uniter.relation relations.go:553 joining relation "calico:cni kubernetes-master:cni"
2020-04-16 12:51:20 INFO juju.worker.logger logger.go:134 logger worker stopped
2020-04-16 12:51:20 ERROR juju.worker.uniter.relation relations.go:568 while stopping unit watcher: connection is shut down
2020-04-16 12:51:20 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [898488] "unit-calico-8" cannot open api: validating info for opening an API connection: missing addresses not valid
2020-04-16 12:51:20 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [898488] "unit-calico-8" cannot open api: validating info for opening an API connection: missing addresses not valid
2020-04-16 12:51:24 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [898488] "unit-calico-8" cannot open api: validating info for opening an API connection: missing addresses not valid
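To compare agents, it can help to reduce each log to the moments where the agent's state changed. The following is a rough sketch, not a Juju tool: the marker strings are copied from the excerpt above, the labels and the `transitions` function are invented for this example, and the log path is passed in by hand (typically something like /var/log/juju/unit-calico-8.log, which is an assumption here).

```python
import re
import sys

# Timestamp, log level, rest of the line (format as seen in the excerpt).
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")

# Substrings taken from the log excerpt, mapped to short made-up labels.
MARKERS = (
    ("network is unreachable", "unreachable"),
    ("successfully connected", "connected"),
    ("desired agent binary version", "version-check"),
    ("missing addresses not valid", "no-addresses"),
)


def transitions(lines):
    """Return [(timestamp, label)] for each change of state in the log."""
    result = []
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        timestamp, _level, message = match.groups()
        for needle, label in MARKERS:
            if needle in message:
                # Record only changes, so repeated errors collapse to one row.
                if not result or result[-1][1] != label:
                    result.append((timestamp, label))
                break
    return result


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for timestamp, label in transitions(fh):
                print(path, timestamp, label)
```

Run against the log of each failed agent, this shows whether the "no-addresses" state always follows directly after a reconnect and version check, as it does for calico/8.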
Similar upgrade(?) lines appear on April 02, but there were no errors and no connectivity issues.
Where should I poke next? And what are the possible ways to fix the missing apiaddresses entry?
Just adding the API address does seem to help, but at least one unit's file was missing passwords too, and copying passwords from a working host obviously failed. Still, it is not a nice thing to lose API connectivity.
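For the units where only the address is missing, the manual workaround of re-adding the apiaddresses entry could be scripted so it is applied the same way everywhere. This is a sketch under assumptions, not a supported repair procedure: `add_api_addresses` is a hypothetical helper, it places the entry after the model line to match the working agent.conf shown earlier, it only returns new file contents, and it does nothing about missing passwords.

```python
def add_api_addresses(conf_text, addresses):
    """Return conf_text with an apiaddresses list inserted if it has none.

    The block is placed right after the top-level 'model:' line, matching
    the layout of the working agent.conf. Text that already has an
    apiaddresses key is returned unchanged.
    """
    lines = conf_text.splitlines()
    if any(line.startswith("apiaddresses:") for line in lines):
        return conf_text
    block = ["apiaddresses:"] + ["- " + addr for addr in addresses]
    for i, line in enumerate(lines):
        if line.startswith("model:"):
            # Insert the new block directly below the model line.
            lines[i + 1:i + 1] = block
            break
    else:
        # No model line found: append at the end rather than guess a position.
        lines.extend(block)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values copied from the excerpts in this post.
    broken = "controller: controller-2fa9c671\nmodel: model-89848811\napipassword: x\n"
    print(add_api_addresses(broken, ["a.b.c.d:17070"]), end="")
```

Keep a backup of the original agent.conf before writing any result back.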