Hi, I’m having some issues with Juju 2.8.10: sometimes it doesn’t get the LB ingress-address. I want to understand the basics of this feature so I can troubleshoot it and work out whether it’s a bug in my charms, in Juju, or in ops.
The only thing that may be unusual is that I define the service in the pod spec. This is the metadata for the service:
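Roughly, the definition looks like this — a minimal sketch assuming pod spec version 3 with a kubernetesResources.services entry; the name, port and selector values here are illustrative:

version: 3
kubernetesResources:
  services:
    - name: db-lb
      spec:
        type: LoadBalancer
        selector:
          juju-app: db
        ports:
          - name: db
            port: 5432
            targetPort: 5432
            protocol: TCP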
OK, I figured out how to do it with loadbalancer in the metadata, but I still don’t know why it was only working some of the time with the method mentioned above…
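For anyone hitting the same thing, the metadata.yaml approach is just the deployment stanza; a minimal sketch (the rest of metadata.yaml is unchanged):

deployment:
  service: loadbalancer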
Thanks for the awesome question about Juju. I am going to dig into what happens a little bit more and get back to you with a more in-depth answer. My initial thoughts are:
The service defined outside of Juju doesn’t contain the selectors needed to match against the workload pods. It is also missing the selectors Juju requires to associate the service with the app deployment.
Our preferred way would be to expose the workload using Juju primitives as mentioned above.
I am a little surprised that the out-of-band service is providing information back to Juju at all, even if irregularly. If you have the time, would you mind sharing a screenshot or similar of how you are seeing this information, so I can make sure my thinking is on the same page?
It seems that it’s not happening only when the service is set to omit; the following capture is from a charm with service: loadbalancer in the metadata.yaml.
In the following example, one application has collected the address successfully and the other hasn’t.
juju status:
Model      Controller  Cloud/Region  Version  SLA          Timestamp
testcharm  k3s         k3s           2.8.10   unsupported  16:20:18Z

App    Version  Status  Scale  Charm  Store  Channel  Rev  OS          Address        Message
db     db       active  1      db     local           13   kubernetes  None
scscf  scscf    active  1      scscf  local           3    kubernetes  192.168.19.20

Unit      Workload  Agent  Address      Ports     Message
db/7*     active    idle   10.42.0.121  5432/TCP  ready
scscf/2*  active    idle   10.42.0.152  6060/UDP  ready
The description of the db service in Kubernetes:
Name: db
Namespace: testcharm
Labels: juju-app=db
Annotations: field.cattle.io/publicEndpoints:
[{"addresses":["192.168.19.12"],"port":5432,"protocol":"TCP","serviceName":"testcharm:db","allNodes":false}]
juju.io/controller: 6007d0d0-3b4c-45e7-8a77-50e4fb372d5c
juju.io/model: 02845b9c-f27e-42ae-8582-6639cc7f92b0
Selector: juju-app=db
Type: LoadBalancer
IP: 10.43.249.152
LoadBalancer Ingress: 192.168.19.12
Port: db 5432/TCP
TargetPort: 5432/TCP
NodePort: db 31828/TCP
Endpoints: 10.42.0.121:5432
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal IPAllocated 23m metallb-controller Assigned IP "192.168.19.12"
Normal nodeAssigned 80s (x29 over 23m) metallb-speaker announcing from node "k3s"
The description of the scscf service in Kubernetes:
Name: scscf
Namespace: testcharm
Labels: juju-app=scscf
Annotations: field.cattle.io/publicEndpoints:
[{"addresses":["192.168.19.20"],"port":6060,"protocol":"UDP","serviceName":"testcharm:scscf","allNodes":false}]
juju.io/controller: 6007d0d0-3b4c-45e7-8a77-50e4fb372d5c
juju.io/model: 02845b9c-f27e-42ae-8582-6639cc7f92b0
Selector: juju-app=scscf
Type: LoadBalancer
IP: 10.43.136.65
LoadBalancer Ingress: 192.168.19.20
Port: sip 6060/UDP
TargetPort: 6060/UDP
NodePort: sip 30930/UDP
Endpoints: 10.42.0.152:6060
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal IPAllocated 22m metallb-controller Assigned IP "192.168.19.20"
Normal nodeAssigned 100s (x37 over 21m) metallb-speaker announcing from node "k3s"
*There are more applications with None in the address and only scscf reports the address; I just filtered the output to show it more clearly.
Hi, I created a bundle that reproduces this behaviour with 15 applications, one load balancer per application, if someone wants to take a look.
https://gitlab.com/endikap100/charmbugloadbalancernone
In this example, 4 applications report the LoadBalancer IP as None, 1 is reported as a blank space, and the other 10 report the correct IP.
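The bundle is essentially this shape, repeated up to nginx15 (a sketch, not the exact file from the repository):

bundle: kubernetes
applications:
  nginx1:
    charm: ./nginx
    scale: 1
  nginx2:
    charm: ./nginx
    scale: 1
  nginx3:
    charm: ./nginx
    scale: 1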
juju status:
Model   Controller                 Cloud/Region               Version  SLA          Timestamp
testlb  microk8scluster-localhost  microk8scluster/localhost  2.9.0    unsupported  12:13:17Z

App      Version  Status  Scale  Charm  Store  Channel  Rev  OS          Address        Message
nginx1   nginx    active  1      nginx  local           3    kubernetes  None
nginx2   nginx    active  1      nginx  local           3    kubernetes  192.168.19.53
nginx3   nginx    active  1      nginx  local           3    kubernetes
nginx4   nginx    active  1      nginx  local           3    kubernetes  None
nginx5   nginx    active  1      nginx  local           3    kubernetes  192.168.19.58
nginx6   nginx    active  1      nginx  local           3    kubernetes  192.168.19.59
nginx7   nginx    active  1      nginx  local           3    kubernetes  192.168.19.60
nginx8   nginx    active  1      nginx  local           3    kubernetes  192.168.19.56
nginx9   nginx    active  1      nginx  local           3    kubernetes  192.168.19.57
nginx10  nginx    active  1      nginx  local           3    kubernetes  None
nginx11  nginx    active  1      nginx  local           3    kubernetes  192.168.19.64
nginx12  nginx    active  1      nginx  local           3    kubernetes  192.168.19.63
nginx13  nginx    active  1      nginx  local           3    kubernetes  None
nginx14  nginx    active  1      nginx  local           3    kubernetes  192.168.19.65
nginx15  nginx    active  1      nginx  local           3    kubernetes  192.168.19.66

Unit        Workload  Agent  Address       Ports   Message
nginx1/3*   active    idle   10.1.128.249  80/TCP  ready
nginx2/3*   active    idle   10.1.128.195  80/TCP  ready
nginx3/3*   active    idle   10.1.128.232  80/TCP  ready
nginx4/3*   active    idle   10.1.128.193  80/TCP  ready
nginx5/3*   active    idle   10.1.128.208  80/TCP  ready
nginx6/3*   active    idle   10.1.128.240  80/TCP  ready
nginx7/3*   active    idle   10.1.128.247  80/TCP  ready
nginx8/3*   active    idle   10.1.128.203  80/TCP  ready
nginx9/3*   active    idle   10.1.128.236  80/TCP  ready
nginx10/3*  active    idle   10.1.128.254  80/TCP  ready
nginx11/3*  active    idle   10.1.129.0    80/TCP  ready
nginx12/3*  active    idle   10.1.128.239  80/TCP  ready
nginx13/3*  active    idle   10.1.128.235  80/TCP  ready
nginx14/3*  active    idle   10.1.129.1    80/TCP  ready
nginx15/3*  active    idle   10.1.129.2    80/TCP  ready
The way Juju collects the application address shown in juju status is that it sets up a k8s watcher that triggers whenever any of the services associated with the deployed applications are updated. Whenever such an update occurs in the k8s cluster, Juju reads the IP address associated with the k8s service and stores it in the model, so that it is reported by juju status.
For load balancer services, Juju looks at either the service spec LoadBalancerIP value or the service status load balancer ingress value, whichever one is set.
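You can inspect those two fields directly; for example (using the scscf service and namespace from the status output above):

kubectl get service/scscf -n testcharm -o jsonpath='{.spec.loadBalancerIP}{"\n"}{.status.loadBalancer.ingress[*].ip}{"\n"}'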
Can you kubectl describe one of the services where Juju doesn’t report the address properly?
e.g. kubectl get service/nginx3 -o json | jq ".status, .spec"
and see what the load balancer values are? Maybe there’s something Juju is missing.
Also look in the Juju logs to see if there are any errors reporting and/or setting the service addresses in the model.
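One rough way to scan for those (the grep pattern is just a guess at the relevant messages):

juju debug-log -m controller --replay | grep -iE 'cloud service|loadbalancer|load balancer'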
We have seen before that the watcher set up to monitor the state of the k8s cluster can get disconnected, but using the k8s shared informer implementation as we now do should fix that.
So as an experiment I ran up the example bundle on k3s without configuring any additional ingress/loadbalancer infrastructure, and the service info as reported by kubectl is like this:
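A representative shape of that output, e.g. from kubectl get service/nginx1 -o json | jq ".status, .spec" (the values here are illustrative, not the exact output):

{
  "loadBalancer": {}
}
{
  "clusterIP": "10.43.201.17",
  "externalTrafficPolicy": "Cluster",
  "ports": [
    {
      "port": 80,
      "protocol": "TCP",
      "targetPort": 80
    }
  ],
  "selector": {
    "juju-app": "nginx1"
  },
  "sessionAffinity": "None",
  "type": "LoadBalancer"
}

The important part is that status.loadBalancer is empty while spec.clusterIP is set.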
Because the service is of type LoadBalancer, Juju only looks for load balancer addresses. It currently won’t notice the clusterIP ones.
Could you check whether the few services which do not have an IP address in Juju are the same as the above? If so, that explains why Juju isn’t seeing the addresses, but not why some k8s services are getting addresses and others aren’t.
In the controller logs I found this, but it is from before the charm’s spec was applied:
INFO juju.apiserver.connection request_notifier.go:125 agent disconnected: application-nginx1 for ade85e6b-e0b2-44e9-8022-cf6d4ce5d2fe
and this one:
INFO juju.worker.dependency engine.go:671 “caas-unit-provisioner” manifold worker returned unexpected error: cannot add cloud service “ea5221a4-982f-4e06-8ae2-193fc05ee238”: failed to save cloud service: state changing too quickly; try again soon
Maybe Juju is trying to get the IP of the LoadBalancer when MetalLB hasn’t set one yet. I’m not sure whether the clusterIP and the LoadBalancer IP are set at the same time, or whether the clusterIP is set first.
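One way to watch the ordering yourself (a sketch; the namespace and service name are taken from the bundle above):

kubectl get service/nginx1 -n testlb -w \
  -o 'custom-columns=NAME:.metadata.name,CLUSTER-IP:.spec.clusterIP,LB-IP:.status.loadBalancer.ingress[*].ip'

In general the clusterIP is allocated synchronously when the Service object is created, while the load balancer ingress is filled in asynchronously by the controller (MetalLB here), so there is always a window in which only the clusterIP exists.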
Quick reply for now: the above is a problem. It means the service address info is not being saved to the Juju model. We’ll need to dig into the logs to identify a cause. If possible, a DB dump would be great as well.
I finally managed to get to the root cause of this issue. Juju creates a headless clusterIP service for each app, and the clusterIP address of those services is None. The query Juju uses against the k8s cluster to get service address information could mistakenly pick up that headless service, which is where the None in status comes from.
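You can see the two services side by side with something like the following (illustrative output based on the db service described earlier; the name and details of the headless service row are an assumption and may differ):

kubectl get svc -n testcharm
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)
db             LoadBalancer   10.43.249.152   192.168.19.12   5432:31828/TCP
db-endpoints   ClusterIP      None            <none>          <none>

The None in the CLUSTER-IP column of the headless service is the value that was ending up in juju status.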
We’ll do a fix for Juju 2.8.11 (and 2.9.x) which we hope to release soon.