How does juju collect ingress-address?

Hi, I’m having some issues with Juju 2.8.10: sometimes it doesn’t pick up the LB ingress-address. I want to understand the basics of this feature so I can troubleshoot and work out whether it’s a bug in my charms, a bug in Juju, or a bug in ops.
The only thing that may be unusual is that I define the service in the pod spec.
metadata.yaml for the service:

deployment:
  type: stateful
  service: omit

podspec:

ports = [{"name": self.framework.model.app.name, "containerPort": config["port"], "protocol": "TCP"}]
spec = {
    "version": 3,
    "containers": [
        {
            "name": self.framework.model.app.name,
            "image": config["image"],
            "ports": ports,
            "envConfig": {
                "POSTGRES_PASSWORD": config["password"],
            },
        },
    ],
    "kubernetesResources": {
        "services": [
            {
                "name": self.framework.model.app.name,
                "spec": {
                    "selector": {"app": self.framework.model.app.name},
                    "ports": [
                        {
                            "name": "db",
                            "protocol": "TCP",
                            "port": config["port"],
                            "targetPort": config["port"],
                        }
                    ],
                    "type": "LoadBalancer",
                },
            }
        ]
    },
}

The service is created in Kubernetes, but a lot of the time Juju doesn’t pick up its IP.

By the way, the reason for defining the service in the pod spec is that I have to set the port name to be able to make SRV DNS queries.
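For context, Kubernetes only publishes SRV records for service ports that have a name, which is why the port is named `db` in the spec above. A small sketch of the SRV record name Kubernetes publishes for a named port (the helper function is illustrative, not part of any library):

```python
def srv_record_name(port_name, protocol, service, namespace,
                    cluster_domain="cluster.local"):
    """Build the DNS SRV name Kubernetes publishes for a *named* service port.

    Kubernetes creates SRV records only for ports that have a name,
    so an unnamed port would not be queryable this way.
    """
    return f"_{port_name}._{protocol.lower()}.{service}.{namespace}.svc.{cluster_domain}"

# For the "db" TCP port of the "db" service in the "testcharm" namespace:
print(srv_record_name("db", "TCP", "db", "testcharm"))
# _db._tcp.db.testcharm.svc.cluster.local
```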

OK, I figured out how to do it with `service: loadbalancer` in the metadata, but I still don’t know why it was only working sometimes with the method mentioned above…

Hey Endika,

Thanks for the awesome question about Juju. I am going to dig into what happens a little bit more and get back to you with a more in depth answer. My initial thoughts are:

  • The service defined outside of Juju doesn’t contain the selectors needed to match against the workload pods, and it is also missing the selectors Juju requires to associate the service with the app deployment.

Our preferred way would be to expose the workload using Juju primitives as mentioned above.

I am a little surprised that the out-of-band service is providing information back to Juju at all, even if irregularly. If you have the time, would you mind sharing a screenshot (or similar) of how you are seeing this information, so I can make sure my thinking is on the same page?

Cheers
tlm

It seems that it’s not happening only when the service is set to omit; the following capture is from a charm with service: loadbalancer in the metadata.yaml.
In the following example one application has collected the address successfully and the other hasn’t.
juju status:

 Model      Controller  Cloud/Region  Version  SLA          Timestamp
 testcharm  k3s         k3s           2.8.10   unsupported  16:20:18Z
 App    Version  Status  Scale  Charm  Store  Channel  Rev  OS          Address        Message
 db     db       active      1  db     local            13  kubernetes  None
 scscf  scscf    active      1  scscf  local             3  kubernetes  192.168.19.20

 Unit           Workload  Agent  Address         Ports     Message
 db/7*          active    idle   10.42.0.121     5432/TCP  ready
 scscf/2*       active    idle   10.42.0.152     6060/UDP  ready

The description of the db service in Kubernetes:

Name:                     db
Namespace:                testcharm
Labels:                   juju-app=db
Annotations:              field.cattle.io/publicEndpoints:
                            [{"addresses":["192.168.19.12"],"port":5432,"protocol":"TCP","serviceName":"testcharm:db","allNodes":false}]
                          juju.io/controller: 6007d0d0-3b4c-45e7-8a77-50e4fb372d5c
                          juju.io/model: 02845b9c-f27e-42ae-8582-6639cc7f92b0
Selector:                 juju-app=db
Type:                     LoadBalancer
IP:                       10.43.249.152
LoadBalancer Ingress:     192.168.19.12
Port:                     db  5432/TCP
TargetPort:               5432/TCP
NodePort:                 db  31828/TCP
Endpoints:                10.42.0.121:5432
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason        Age                 From                Message
  ----    ------        ----                ----                -------
  Normal  IPAllocated   23m                 metallb-controller  Assigned IP "192.168.19.12"
  Normal  nodeAssigned  80s (x29 over 23m)  metallb-speaker     announcing from node "k3s"

The description of the scscf service in Kubernetes:

 Name:                     scscf
 Namespace:                testcharm
 Labels:                   juju-app=scscf
 Annotations:              field.cattle.io/publicEndpoints:
                             [{"addresses":["192.168.19.20"],"port":6060,"protocol":"UDP","serviceName":"testcharm:scscf","allNodes":false}]
                           juju.io/controller: 6007d0d0-3b4c-45e7-8a77-50e4fb372d5c
                           juju.io/model: 02845b9c-f27e-42ae-8582-6639cc7f92b0
 Selector:                 juju-app=scscf
 Type:                     LoadBalancer
 IP:                       10.43.136.65
 LoadBalancer Ingress:     192.168.19.20
 Port:                     sip  6060/UDP
 TargetPort:               6060/UDP
 NodePort:                 sip  30930/UDP
 Endpoints:                10.42.0.152:6060
 Session Affinity:         None
 External Traffic Policy:  Cluster
 Events:
   Type    Reason        Age                  From                Message
   ----    ------        ----                 ----                -------
   Normal  IPAllocated   22m                  metallb-controller  Assigned IP "192.168.19.20"
   Normal  nodeAssigned  100s (x37 over 21m)  metallb-speaker     announcing from node "k3s"

*There are more applications with None in the address, and only scscf reports the address; I just filtered the output to show it more clearly.

@tlm any thoughts on that?

This log line from the controller may be related:

2021-05-10 11:21:37 ERROR juju.apiserver.uniter networkinfo.go:288 resolving "": lookup : no such host

Hi, I created a bundle to reproduce this behaviour, with 15 applications and a load balancer per application, if someone wants to take a look:
https://gitlab.com/endikap100/charmbugloadbalancernone
In this example 4 applications report the LoadBalancer IP as None, 1 is reported as a blank space, and the other 10 report the correct IP.

juju status:

Model   Controller                 Cloud/Region               Version  SLA          Timestamp
testlb  microk8scluster-localhost  microk8scluster/localhost  2.9.0    unsupported  12:13:17Z

App      Version  Status  Scale  Charm  Store  Channel  Rev  OS          Address        Message
nginx1   nginx    active      1  nginx  local             3  kubernetes  None
nginx2   nginx    active      1  nginx  local             3  kubernetes  192.168.19.53
nginx3   nginx    active      1  nginx  local             3  kubernetes
nginx4   nginx    active      1  nginx  local             3  kubernetes  None
nginx5   nginx    active      1  nginx  local             3  kubernetes  192.168.19.58
nginx6   nginx    active      1  nginx  local             3  kubernetes  192.168.19.59
nginx7   nginx    active      1  nginx  local             3  kubernetes  192.168.19.60
nginx8   nginx    active      1  nginx  local             3  kubernetes  192.168.19.56
nginx9   nginx    active      1  nginx  local             3  kubernetes  192.168.19.57
nginx10  nginx    active      1  nginx  local             3  kubernetes  None
nginx11  nginx    active      1  nginx  local             3  kubernetes  192.168.19.64
nginx12  nginx    active      1  nginx  local             3  kubernetes  192.168.19.63
nginx13  nginx    active      1  nginx  local             3  kubernetes  None
nginx14  nginx    active      1  nginx  local             3  kubernetes  192.168.19.65
nginx15  nginx    active      1  nginx  local             3  kubernetes  192.168.19.66

Unit        Workload  Agent  Address       Ports   Message
nginx1/3*   active    idle   10.1.128.249  80/TCP  ready
nginx2/3*   active    idle   10.1.128.195  80/TCP  ready
nginx3/3*   active    idle   10.1.128.232  80/TCP  ready
nginx4/3*   active    idle   10.1.128.193  80/TCP  ready
nginx5/3*   active    idle   10.1.128.208  80/TCP  ready
nginx6/3*   active    idle   10.1.128.240  80/TCP  ready
nginx7/3*   active    idle   10.1.128.247  80/TCP  ready
nginx8/3*   active    idle   10.1.128.203  80/TCP  ready
nginx9/3*   active    idle   10.1.128.236  80/TCP  ready
nginx10/3*  active    idle   10.1.128.254  80/TCP  ready
nginx11/3*  active    idle   10.1.129.0    80/TCP  ready
nginx12/3*  active    idle   10.1.128.239  80/TCP  ready
nginx13/3*  active    idle   10.1.128.235  80/TCP  ready
nginx14/3*  active    idle   10.1.129.1    80/TCP  ready
nginx15/3*  active    idle   10.1.129.2    80/TCP  ready

kubernetes services:

nginx1   LoadBalancer  10.152.183.234  192.168.19.52  80:31751/TCP  74m
nginx12  LoadBalancer  10.152.183.142  192.168.19.63  80:30873/TCP  70m
nginx5   LoadBalancer  10.152.183.58   192.168.19.58  80:31654/TCP  72m
nginx6   LoadBalancer  10.152.183.151  192.168.19.59  80:32209/TCP  72m
nginx15  LoadBalancer  10.152.183.169  192.168.19.66  80:32304/TCP  69m
nginx8   LoadBalancer  10.152.183.96   192.168.19.56  80:30633/TCP  72m
nginx11  LoadBalancer  10.152.183.71   192.168.19.64  80:31543/TCP  70m
nginx4   LoadBalancer  10.152.183.70   192.168.19.54  80:30252/TCP  72m
nginx13  LoadBalancer  10.152.183.171  192.168.19.62  80:32412/TCP  70m
nginx2   LoadBalancer  10.152.183.248  192.168.19.53  80:32369/TCP  73m
nginx3   LoadBalancer  10.152.183.22   192.168.19.55  80:30775/TCP  72m
nginx7   LoadBalancer  10.152.183.107  192.168.19.60  80:30897/TCP  72m
nginx10  LoadBalancer  10.152.183.146  192.168.19.61  80:30774/TCP  71m
nginx9   LoadBalancer  10.152.183.61   192.168.19.57  80:31811/TCP  72m
nginx14  LoadBalancer  10.152.183.43   192.168.19.65  80:31579/TCP  69m

The way Juju collects the application address shown in juju status is by setting up a k8s watcher that triggers whenever any of the services associated with the deployed applications is updated. Whenever such an update occurs in the k8s cluster, Juju reads the IP address associated with the k8s service and stores it in the model, so that it is reported by juju status.

For load balancer services, Juju looks at either the service spec LoadBalancerIP value or the service status load balancer ingress value, whichever one is set.
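As a rough illustration of that selection order, here is a sketch over `kubectl get -o json`-shaped dicts; this is not Juju's actual code:

```python
def load_balancer_address(service):
    """Prefer the explicitly requested spec.loadBalancerIP; otherwise fall
    back to the first status.loadBalancer.ingress entry (IP or hostname).

    `service` is a plain dict shaped like `kubectl get service -o json`.
    """
    spec_ip = service.get("spec", {}).get("loadBalancerIP")
    if spec_ip:
        return spec_ip
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        addr = entry.get("ip") or entry.get("hostname")
        if addr:
            return addr
    return None  # load balancer not provisioned yet (e.g. MetalLB still pending)
```

With the scscf service shown earlier this returns "192.168.19.20"; for a service whose `status.loadBalancer` is still `{}` it returns None.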

Can you kubectl describe one of the services where Juju doesn’t report the address properly?
e.g.
kubectl get service/nginx3 -o json | jq ".status, .spec"

And see what the loadbalancer values are? Maybe there’s something juju is missing.

Also look in the Juju logs to see if there are any errors reporting and/or setting the service addresses in the model.

We have seen before that the watcher set up to monitor the state of the k8s cluster can get disconnected, but using the k8s shared informer implementation as we now do should fix that.

1 Like

So as an experiment I ran up the example bundle on k3s without configuring any additional ingress/loadbalancer infrastructure, and the service info as reported by kubectl is like this:

$ k3s kubectl -n test get service/nginx1 -o json | jq ".spec, .status"
{
  "clusterIP": "10.43.7.92",
  "clusterIPs": [
    "10.43.7.92"
  ],
  "externalTrafficPolicy": "Cluster",
  "ports": [
    {
      "name": "nginx1",
      "nodePort": 31986,
      "port": 80,
      "protocol": "TCP",
      "targetPort": 80
    }
  ],
  "selector": {
    "app.kubernetes.io/name": "nginx1"
  },
  "sessionAffinity": "None",
  "type": "LoadBalancer"
}
{
  "loadBalancer": {}
}

Because the service is type LoadBalancer, Juju only looks for load balancer addresses. It currently won’t notice the clusterIP ones.

Could you check whether the few services which do not have an IP address in Juju are the same as the above? If so, that explains why Juju isn’t seeing the addresses, but not why some k8s services are getting addresses and others aren’t.

1 Like

Thanks for the reply. The JSON of the services that work has the same info as the ones that don’t.
This time only one has reported None as the address.

 kubectl get service nginx1 -n nginx -o json | jq ".spec, .status"
 {
   "clusterIP": "10.43.178.32",
   "externalTrafficPolicy": "Cluster",
   "ports": [
     {
       "name": "nginx1",
       "nodePort": 30614,
       "port": 80,
       "protocol": "TCP",
       "targetPort": 80
     }
   ],
   "selector": {
     "juju-app": "nginx1"
   },
   "sessionAffinity": "None",
   "type": "LoadBalancer"
 }
 {
   "loadBalancer": {
     "ingress": [
       {
         "ip": "192.168.19.13"
       }
     ]
   }
 }

In the logs of the controller I have found this, although it is from before the charm’s spec was applied:

INFO juju.apiserver.connection request_notifier.go:125 agent disconnected: application-nginx1 for ade85e6b-e0b2-44e9-8022-cf6d4ce5d2fe

and this one:

INFO juju.worker.dependency engine.go:671 “caas-unit-provisioner” manifold worker returned unexpected error: cannot add cloud service “ea5221a4-982f-4e06-8ae2-193fc05ee238”: failed to save cloud service: state changing too quickly; try again soon

You can find the full log at https://gitlab.com/endikap100/charmbugloadbalancernone/-/blob/master/jujucontrolerlogs.txt

1 Like

Maybe Juju is trying to get the IP of the LoadBalancer when MetalLB hasn’t set one yet. I’m not sure whether the clusterIP and the LoadBalancer IP are set at the same time, or whether the clusterIP is set first.

Quick reply for now: the above is a problem. It means the service address info is not being saved to the Juju model. We’ll need to dig into the logs to identify a cause. If possible, a db dump would be great as well:

$ export JUJU_DEV_FEATURE_FLAGS=developer-mode
$ juju dump-db

That will show the content of the db and hopefully help show why the service info is not being saved.

1 Like

The one that failed has addresstype hostname instead of ipv4:
https://gitlab.com/endikap100/charmbugloadbalancernone/-/blob/master/juju_db.txt

I finally managed to get to the root cause of this issue. Juju creates a headless, clusterIP service for each app. The clusterIP address for those services is None. The query Juju uses to look at the k8s cluster to get service address information could mistakenly result in the headless service being used, which is where the None comes from in status.
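In dict form, the fix amounts to excluding the headless service when picking which service to read the address from. This is an illustration of the behaviour described above, not Juju's actual code, and the headless service name is made up for the example:

```python
def addressable_services(services):
    """Filter out headless services, whose spec.clusterIP is the literal
    string "None" - the value that was leaking into juju status."""
    return [s for s in services
            if s.get("spec", {}).get("clusterIP") != "None"]

svcs = [
    {"metadata": {"name": "nginx1"}, "spec": {"clusterIP": "10.152.183.234"}},
    # hypothetical headless companion service created alongside the app
    {"metadata": {"name": "nginx1-headless"}, "spec": {"clusterIP": "None"}},
]
# Only the first, addressable service survives the filter.
```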

We’ll do a fix for Juju 2.8.11 (and 2.9.x) which we hope to release soon.

2 Likes

Thanks @wallyworld for looking into this.

I just tested 2.9.1 and it works fine, thanks!!!