"charm missing from disk" error on istio-pilot pod with Kubeflow 1.6/beta

Hello, I’m trying to install Charmed Kubeflow 1.6/beta on OKD 4.9.

After I deployed the Kubeflow bundle with Juju, the istio-pilot application appears to have a problem.

The log of the “istio-pilot-0” pod shows:

2022-09-01 09:37:24 ERROR juju.worker.uniter.operation runhook.go:140 hook "install" (via unknown/invalid hook handler) failed: charm missing from disk
2022-09-01 09:37:24 INFO juju.worker.uniter resolver.go:145 awaiting error resolution for "install" hook

The contents of /var/lib/juju/agents/unit-istio-pilot-0 (there is no charm directory):

# ls -la /var/lib/juju/agents/unit-istio-pilot-0
total 4
drwxr-xr-x. 2 root root   42 Sep  5 02:09 .
drwxr-xr-x. 3 root root   32 Sep  5 02:09 ..
-rw-------. 1 root root 2042 Sep  5 02:09 agent.conf
srwx------. 1 root root    0 Sep  5 02:09 run.socket

I’m new to Juju and charms. If anyone could help, I would appreciate it.

Sorry for the incorrect “juju status” screenshot; I have deleted it.

I also tried to redeploy the istio-pilot application with:

juju remove-application istio-pilot --force
juju deploy istio-pilot --channel 1.11/beta

After the redeploy, /var/lib/juju/agents/unit-istio-pilot-0/charm appeared, but it disappeared again once I restarted OKD.
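To check whether the charm directory survives a cluster restart without opening a shell in the pod, one option is to list it from outside and compare the result before and after the restart. A minimal sketch, assuming the Kubernetes namespace matches the Juju model name `kubeflow` and that the charm runs in a sidecar container named `charm`; both are assumptions to adjust, and `check_charm_dir` is a hypothetical helper name. With `DRY_RUN=1` it only prints the command it would run.

```shell
#!/usr/bin/env bash
# Assumptions (not from the thread): namespace "kubeflow" and a container
# named "charm" in the unit pod. Adjust both for your deployment.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# check_charm_dir UNIT [NAMESPACE]
# UNIT is a Juju unit name such as "istio-pilot/0".
check_charm_dir() {
  local unit="$1"
  local ns="${2:-kubeflow}"
  local app="${unit%/*}"   # "istio-pilot/0" -> "istio-pilot"
  local num="${unit##*/}"  # "istio-pilot/0" -> "0"
  local pod="${app}-${num}"
  local dir="/var/lib/juju/agents/unit-${app}-${num}/charm"
  # ls -ld exits non-zero if the directory does not exist.
  run kubectl exec -n "$ns" "$pod" -c charm -- ls -ld "$dir"
}
```

For a quick yes/no, `check_charm_dir istio-pilot/0 && echo present` can be run once before stopping the cluster and once after starting it again. On OKD, `oc exec` accepts the same arguments as `kubectl exec`.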

Hello @hychiu, this is an error I’ve never seen before. Is this only happening with istio-pilot?

There might be some technical difference in OKD that is causing this; we’ve never tested on that platform. When you say you “restart OKD”, do you mean you stop the entire cluster and start it again?

Yes, it only happened with “istio-pilot”. And yes, I stop the entire cluster and start it again the next day.

Many thanks for your reply.

Hello @dominik.f, is there a plan for Charmed Kubeflow to support the OpenShift platform?

It is interesting that it only happens with this specific charm. Would you mind filing a bug in this repository?

While testing on and officially supporting OpenShift is not our highest priority, I don’t see many reasons why our product shouldn’t work on it. So if there is a straightforward solution to a problem like yours, we are more than happy to help.

No problem. By the way, after I granted enough permissions to system:serviceaccount:kubeflow:istio-pilot to get past the failed hooks, the charm directory persists after restarting OKD. (However, “istio-pilot” hangs waiting for the gateway address, both before and after the restart.)
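The post does not say which permissions were granted, so the following is only a sketch of the general procedure: bind a role to the application's service account, then ask Juju to retry the hook that is "awaiting error resolution". The role `cluster-admin` is an assumption chosen for illustration; it is very broad, so a narrower role is preferable once the one istio-pilot actually needs is known. The helper name `grant_and_retry` is hypothetical, and deriving the service account name from the application name (as in system:serviceaccount:kubeflow:istio-pilot) is also an assumption for other charms. With `DRY_RUN=1` it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: namespace "kubeflow", service account named after
# the application, and cluster-admin as a placeholder role (very broad).

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# grant_and_retry UNIT [NAMESPACE] [ROLE]
grant_and_retry() {
  local unit="$1"
  local ns="${2:-kubeflow}"
  local role="${3:-cluster-admin}"
  local app="${unit%/*}"  # "istio-pilot/0" -> "istio-pilot"
  # Bind the cluster role to the application's service account.
  run oc adm policy add-cluster-role-to-user "$role" "system:serviceaccount:${ns}:${app}"
  # Mark the unit's hook error as resolved; Juju then retries the failed hook.
  run juju resolved "$unit"
}
```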

I see. Thank you for the information!

That is good news! The issue is probably that the charm didn’t have these permissions when it tried to create the gateway. Removing the application and deploying it again with the additional permissions granted from the start may fix this.
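The suggested order of operations (remove, grant permissions first, then deploy) could look roughly like this. It reuses the removal and deploy commands from earlier in the thread; the role `cluster-admin` is again a placeholder assumption, the helper name `redeploy_istio_pilot` is hypothetical, and `--trust` is my addition rather than something used above (it is a real `juju deploy` flag that lets the charm use the model's credentials). With `DRY_RUN=1` it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: namespace "kubeflow", cluster-admin as a
# placeholder role, and --trust added to the deploy command.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# redeploy_istio_pilot [NAMESPACE] [CHANNEL] [ROLE]
redeploy_istio_pilot() {
  local ns="${1:-kubeflow}"
  local channel="${2:-1.11/beta}"
  local role="${3:-cluster-admin}"
  # 1. Remove the broken application (same command as earlier in the thread).
  run juju remove-application istio-pilot --force
  # In practice, wait until `juju status` no longer lists istio-pilot here.
  # 2. Grant permissions before the new unit runs any hooks; a role binding
  #    can reference a service account that does not exist yet.
  run oc adm policy add-cluster-role-to-user "$role" "system:serviceaccount:${ns}:istio-pilot"
  # 3. Deploy again.
  run juju deploy istio-pilot --channel "$channel" --trust
}
```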