Charmed Kubeflow Upgrade Error

Hi there,

I’ve just upgraded from Charmed Kubeflow 1.4 to 1.6 and there are no longer any ports exposed on istio-ingressgateway.

I haven’t yet upgraded the MicroK8s cluster; it’s still running 1.21.

Training-operator and tensorboard-controller are also stuck waiting for information:

Model     Controller          Cloud/Region        Version  SLA          Timestamp
kubeflow  microk8s-localhost  microk8s/localhost  2.9.33   unsupported  15:18:07Z

App                         Version                     Status   Scale  Charm                    Channel          Rev  Address  Exposed  Message
admission-webhook           res:oci-image@84a4d7d       active   1      admission-webhook        1.6/stable       50            no
argo-controller             res:oci-image@669ebd5       active   1      argo-controller          3.3/stable       99            no
argo-server                 res:oci-image@576d038       active   1      argo-server              3.3/stable       45            no
dex-auth                                                active   1      dex-auth                 2.31/stable      129           no
envoy                       res:oci-image@b4adee5       active   1      envoy                    1.12/stable      11            no
istio-ingressgateway                                    active   1      istio-gateway            1.11/stable      114           no
istio-pilot                                             active   1      istio-pilot              1.11/stable      131           no
jupyter-controller          res:oci-image@8f4ec33       active   1      jupyter-controller       1.6/stable       138           no
jupyter-ui                  res:oci-image@cde6632       active   1      jupyter-ui               1.6/stable       99            no
katib-controller            res:oci-image@03d47fb       active   1      katib-controller         0.14/stable      92            no
katib-db                    mariadb/server:10.3         active   1      charmed-osm-mariadb-k8s  stable           35            no       ready
katib-db-manager            res:oci-image@16b33a5       active   1      katib-db-manager         0.14/stable      66            no
katib-ui                    res:oci-image@c7dc04a       active   1      katib-ui                 0.14/stable      90            no
kfp-api                     res:oci-image@1b44753       active   1      kfp-api                  2.0/stable       81            no
kfp-db                      mariadb/server:10.3         active   1      charmed-osm-mariadb-k8s  stable           35            no       ready
kfp-persistence             res:oci-image@31f08ad       active   1      kfp-persistence          2.0/stable       76            no
kfp-profile-controller      res:oci-image@d86ecff       active   1      kfp-profile-controller   2.0/stable       61            no
kfp-schedwf                 res:oci-image@51ffc60       active   1      kfp-schedwf              2.0/stable       80            no
kfp-ui                      res:oci-image@55148fd       active   1      kfp-ui                   2.0/stable       80            no
kfp-viewer                  res:oci-image@7190aa3       active   1      kfp-viewer               2.0/stable       79            no
kfp-viz                     res:oci-image@67e8b09       active   1      kfp-viz                  2.0/stable       74            no
kubeflow-dashboard          res:oci-image@6fe6eec       active   1      kubeflow-dashboard       1.6/stable       154           no
kubeflow-profiles           res:profile-image@0a46ffc   active   1      kubeflow-profiles        1.6/stable       82            no
kubeflow-roles                                          active   1      kubeflow-roles           1.6/stable       31            no
kubeflow-volumes            res:oci-image@cc5177a       active   1      kubeflow-volumes         1.6/stable       64            no
metacontroller-operator                                 active   1      metacontroller-operator  2.0/stable       48            no
minio                       res:oci-image@1755999       active   1      minio                    stable           57            no
mlmd                        res:oci-image@e2cb9ce       active   1      mlmd                     1.0/stable       14            no
oidc-gatekeeper             res:oci-image@32de216       active   1      oidc-gatekeeper          ckf-1.6/stable   76            no
seldon-controller-manager   res:oci-image@eb811b6       active   1      seldon-core              1.14/stable      92            no
tensorboard-controller      res:oci-image@0f8c7de       waiting  1      tensorboard-controller   1.6/stable       56            no       Waiting for gateway info relation
tensorboards-web-app        res:oci-image@914a8ab       active   1      tensorboards-web-app     1.6/stable       57            no
training-operator                                       waiting  1      training-operator        1.5/stable       65            no       waiting for units settled down

Unit                       Workload  Agent  Address  Ports              Message
admission-webhook/2*       active    idle            4443/TCP
argo-controller/3*         active    idle
argo-server/2*             active    idle            2746/TCP
dex-auth/0*                active    idle
envoy/0*                   active    idle            9901/TCP,9090/TCP
istio-ingressgateway/0*    active    idle
istio-pilot/0*             active    idle
jupyter-controller/3*      active    idle
jupyter-ui/2*              active    idle            5000/TCP
katib-controller/3*        active    idle            443/TCP,8080/TCP
katib-db-manager/2*        active    idle            6789/TCP
katib-db/0*                active    idle            3306/TCP           ready
katib-ui/2*                active    idle            8080/TCP
kfp-api/3*                 active    idle            8888/TCP,8887/TCP
kfp-db/0*                  active    idle            3306/TCP           ready
kfp-persistence/3*         active    idle
kfp-profile-controller/3*  active    idle            80/TCP
kfp-schedwf/2*             active    idle
kfp-ui/3*                  active    idle            3000/TCP
kfp-viewer/2*              active    idle
kfp-viz/3*                 active    idle            8888/TCP
kubeflow-dashboard/3*      active    idle            8082/TCP

Please could someone advise me on how to get the ports exposed again? I’ve tried redeploying the components multiple times but with no luck.

Thanks, Ollie

Update: my bad, I hadn’t checked the load balancer IP. I’m able to access Dex now.

Glad you sorted it out! Thanks


Just to say I redeployed again and was unable to access the dashboard.

It turns out the issue is the gateway was not created.
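For anyone else debugging this, a quick way to confirm that diagnosis is to look for the Istio Gateway resource directly (a rough sketch; the resource name and namespace are assumptions based on a default Charmed Kubeflow deploy):

```shell
# Check whether istio-pilot actually created its Gateway resource.
# (In a default deploy the model/namespace is "kubeflow" and the Gateway is
# typically named "kubeflow-gateway" -- treat both names as assumptions.)
microk8s kubectl get gateways.networking.istio.io -n kubeflow

# If no Gateway comes back, the ingress gateway Service has nothing to wire
# ports through, which matches the symptom above.
microk8s kubectl get svc -n kubeflow | grep istio
```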

The blog post says this is rare, but it has happened every time I’ve deployed on fresh VMs with MicroK8s in cluster mode on Ubuntu 20.04.

Perhaps this might be helpful in identifying when the issue occurs @ca-scribner


Hi @ollienuk,

Sorry you’re stuck on that. We thought we’d fixed it here, but it sounds like we didn’t. That PR links to an issue with some more ideas. Just to confirm: does that look like the problem you’re hitting?

@ca-scribner No worries, I was able to fix it with the following workaround:

juju run --unit istio-pilot/0 -- "export JUJU_DISPATCH_PATH=hooks/config-changed; ./dispatch"
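That command just re-fires istio-pilot’s config-changed hook, which is what (re)creates the Gateway. Afterwards it’s worth confirming the Gateway now exists before retrying the dashboard (a sketch, assuming MicroK8s and the default kubeflow namespace):

```shell
# The Gateway resource should now be listed...
microk8s kubectl get gateways.networking.istio.io -n kubeflow

# ...and juju should show istio-ingressgateway healthy again.
juju status istio-ingressgateway
```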

The issue you’ve linked to is definitely the issue I’ve been having.