I started the exact same VM setup in parallel, and the fresh VM is much more responsive.
$ juju --debug status
12:19:50 INFO juju.cmd supercommand.go:56 running juju [3.6.2 87cae7505aee356eda90d98ae345e1c11eb26c72 gc go1.23.4]
12:19:50 DEBUG juju.cmd supercommand.go:57 args: []string{"/snap/juju/29493/bin/juju", "--debug", "status"}
12:19:50 INFO juju.juju api.go:86 connecting to API addresses: [10.152.183.92:17070]
12:19:50 INFO juju.kubernetes.klog klog.go:113 Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.%!(EXTRA []interface {}=[])
12:19:50 DEBUG juju.api apiclient.go:508 starting proxier for connection
12:19:50 DEBUG juju.api apiclient.go:512 tunnel proxy in use at localhost on port 44069
12:19:50 DEBUG juju.api apiclient.go:1035 successfully dialed "wss://localhost:44069/model/672b5425-1660-435f-8d33-646f863f7f69/api"
12:19:50 INFO juju.api apiclient.go:570 connection established to "wss://localhost:44069/model/672b5425-1660-435f-8d33-646f863f7f69/api"
Model           Controller           Cloud/Region        Version  SLA          Timestamp
microk8s-model  microk8s-controller  microk8s/localhost  3.6.2    unsupported  12:19:50+01:00

App           Version  Status  Scale  Charm             Channel        Rev  Address         Exposed  Message
alertmanager  0.27.0   active      1  alertmanager-k8s  latest/stable  144  10.152.183.148  no
catalogue              active      1  catalogue-k8s     latest/stable   79  10.152.183.124  no
grafana       9.5.3    active      1  grafana-k8s       latest/stable  128  10.152.183.230  no
loki          2.9.6    active      1  loki-k8s          latest/stable  184  10.152.183.218  no
prometheus    2.52.0   active      1  prometheus-k8s    latest/stable  226  10.152.183.233  no
traefik       2.11.0   active      1  traefik-k8s       latest/stable  226  10.152.183.211  no       Serving at 10.157.61.149

Unit             Workload  Agent      Address      Ports  Message
alertmanager/0*  active    idle       10.1.32.130
catalogue/0*     active    idle       10.1.32.181
grafana/0*       active    executing  10.1.32.132
loki/0*          active    executing  10.1.32.131
prometheus/0*    active    executing  10.1.32.134
traefik/0*       active    idle       10.1.32.129         Serving at 10.157.61.149
12:19:50 DEBUG juju.api monitor.go:35 RPC connection died
12:19:50 INFO cmd supercommand.go:556 command finished
It appears that my previous VM slowed down over time. Is there anything that accumulates, or that could slow down a Juju Kubernetes deployment over time?
Hey @gbeuzeboc, I’m not aware of anything in a deployment like this that would cause a slowdown such as the one you’re seeing. 8GB of RAM seems quite low to me for a deployment of this scale; do you see the same issues in a VM with 16GB? The same applies to the processor: it is also fairly low-powered for a deployment of this size.
When status is running, what is at the top of htop?
A fresh VM with the exact same deployment runs fine and Juju is responsive, which makes me suspect that something accumulates over time (I am repeatedly removing an application and redeploying it for tests).
When running juju status (or even when not running it), the main process is kubelite, with an average of 40% CPU and 1% MEM.
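For anyone debugging something similar: a quick, generic way to confirm which MicroK8s daemon is actually busy is to sort processes by CPU. This assumes a standard snap install, where kubelite normally runs as the snap.microk8s.daemon-kubelite service and the datastore as snap.microk8s.daemon-k8s-dqlite:
$ ps aux --sort=-%cpu | head -n 15
$ systemctl status snap.microk8s.daemon-kubelite --no-pager
$ systemctl status snap.microk8s.daemon-k8s-dqlite --no-pager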
I’ve had a similar issue to this in the past after suspending a VM. I don’t think I ever tracked down exactly what the cause was, but if I remember correctly a qemu process was very busy.
How are you removing the application? With --force? If that’s the case, it may be that things are not really being cleaned up in the background.
Perhaps it’s worth digging through lxc list, kubectl, and ps aux to look for zombie VMs, pods, and processes.
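For example (assuming a MicroK8s snap and, if applicable, an LXD-backed setup; adjust the commands to your environment), checks along these lines:
$ lxc list                                                               # any stray containers/VMs
$ microk8s.kubectl get pods -A --field-selector=status.phase!=Running    # pods stuck outside Running
$ ps aux | awk '$8 ~ /Z/'                                                # zombie (defunct) processes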
I am removing with juju remove-application app-name and not using --force.
I have been suspending my computer a lot with the VM running, and also suspending the VM itself independently. The VM itself remains responsive; it’s just the Juju-related commands that are all slowed down.
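Given the repeated remove/redeploy cycle, one generic sanity check is whether anything from the removed application is left behind in the model’s namespace (Juju names the Kubernetes namespace after the model, so microk8s-model here):
$ microk8s.kubectl get all,pvc -n microk8s-model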
It seems that the issue resides in MicroK8s itself. Any microk8s command times out most of the time:
$ time microk8s.kubectl get pods -A
E0214 17:20:13.798554 1425426 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://127.0.0.1:16443/api?timeout=32s\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
E0214 17:21:16.890549 1425426 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://127.0.0.1:16443/api?timeout=32s\": context deadline exceeded"
I0214 17:21:28.176278 1425426 request.go:700] Waited for 1.803346428s due to client-side throttling, not priority and fairness, request: GET:https://127.0.0.1:16443/api?timeout=32s
E0214 17:21:53.324949 1425426 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://127.0.0.1:16443/api?timeout=32s\": net/http: TLS handshake timeout"
E0214 17:22:04.425500 1425426 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://127.0.0.1:16443/api?timeout=32s\": net/http: TLS handshake timeout"
E0214 17:22:21.702491 1425426 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://127.0.0.1:16443/api?timeout=32s\": net/http: TLS handshake timeout"
Unable to connect to the server: net/http: TLS handshake timeout
real 9m36.305s
user 0m1.663s
sys 2m12.600s
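If microk8s.kubectl itself is timing out against the local apiserver like this, a few generic MicroK8s-side checks are probably the next step (assuming a standard snap install; the dqlite data directory below is the usual default location and may differ on your system):
$ microk8s status --wait-ready
$ microk8s inspect                                                                  # bundles logs and runs basic health checks
$ journalctl -u snap.microk8s.daemon-kubelite --since "10 minutes ago" --no-pager   # recent apiserver/kubelet errors
$ sudo du -sh /var/snap/microk8s/current/var/kubernetes/backend/                    # size of the dqlite datastore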
The juju models command lists far fewer models than what microk8s.kubectl get pods -A shows. I also have a lot of hostpath-provisioner-cos-* pods in kube-system that are marked Completed but still present. I am not sure what they are, or whether I should delete them.
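If those hostpath-provisioner-cos-* entries are ordinary pods in the Completed state (phase Succeeded), they are usually safe to delete. Something along these lines should first list and then remove them; this is a generic kubectl pattern, not specific to the COS charms, so do check the list before deleting:
$ microk8s.kubectl get pods -n kube-system --field-selector=status.phase==Succeeded
$ microk8s.kubectl delete pods -n kube-system --field-selector=status.phase==Succeeded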