How to upgrade Kubeflow from 1.7 to 1.7 Patch 1

Charmed Kubeflow (CKF) 1.7 Patch 1 is a patch release that includes bug fixes and feature enhancements, documented in the release notes. This guide shows how to upgrade from CKF 1.7 to 1.7 Patch 1.

Requirements

  • A Charmed Kubeflow 1.7 deployment in which every charm in the bundle is in the active/idle state.
  • Access to the dashboard of the existing Charmed Kubeflow 1.7 deployment.
  • Admin access to a Kubernetes cluster where the existing Charmed Kubeflow 1.7 is deployed.
  • Tools: kubectl, juju (version 2.x)

Do you need an upgrade?

Check the revisions of the charms deployed in your cluster to determine whether they are outdated and eligible for the upgrade. To list the current revisions, run:

juju status 

The Patch 1 release touched only a subset of the charms in the Charmed Kubeflow bundle. Compare the revisions of your deployed charms to the revisions listed below. If any deployed charm has a lower revision than the one listed, proceed with the upgrade.

charm                       revision
admission-webhook           205
istio-pilot                 551
istio-gateway               551
katib-controller            282
katib-db-manager            253
katib-ui                    267
kfp-api                     413
kfp-persistence             407
kfp-profile-controller      388
kfp-schedwf                 424
kfp-ui                      412
kfp-viewer                  424
kfp-viz                     394
knative-eventing            224
knative-operator            199
knative-serving             224
kserve-controller           267
seldon-controller-manager   354
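
The comparison against the table can also be scripted. The following is a minimal sketch, not part of the official procedure. The helper names `target_revision` and `outdated_charms` are hypothetical, and the input format (one `<charm> <revision>` pair per line) is an assumption that you need to produce from your own `juju status` output.

```shell
# Hypothetical helper: print the Patch 1 target revision for a charm,
# using the values from the table above. Returns non-zero for charms
# that Patch 1 did not touch.
target_revision() {
  case "$1" in
    admission-webhook) echo 205 ;;
    istio-pilot | istio-gateway) echo 551 ;;
    katib-controller) echo 282 ;;
    katib-db-manager) echo 253 ;;
    katib-ui) echo 267 ;;
    kfp-api) echo 413 ;;
    kfp-persistence) echo 407 ;;
    kfp-profile-controller) echo 388 ;;
    kfp-schedwf | kfp-viewer) echo 424 ;;
    kfp-ui) echo 412 ;;
    kfp-viz) echo 394 ;;
    knative-eventing | knative-serving) echo 224 ;;
    knative-operator) echo 199 ;;
    kserve-controller) echo 267 ;;
    seldon-controller-manager) echo 354 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: read "<charm> <revision>" lines on stdin and print
# the charms whose deployed revision is lower than the Patch 1 revision.
outdated_charms() {
  while read -r charm rev; do
    # Skip charms that are not part of the Patch 1 table.
    target=$(target_revision "$charm") || continue
    if [ "$rev" -lt "$target" ]; then
      echo "$charm"
    fi
  done
}
```

For example, `printf 'kfp-api 400\nkatib-ui 267\n' | outdated_charms` lists only kfp-api. One possible way to generate the input, assuming jq is installed and that the JSON status output of your Juju version exposes `charm-name` and `charm-rev` fields (verify this before relying on it), is `juju status --format=json | jq -r '.applications[] | "\(.["charm-name"]) \(.["charm-rev"])"'`.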

Before Charmed Kubeflow upgrade

Before upgrading Charmed Kubeflow, it is recommended to do the following:

  • Stop all Notebooks.
  • Review any important data that needs to be backed up and perform backup procedures according to the policies of your organisation.
  • Record all charm versions in the existing Charmed Kubeflow deployment.
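
One simple way to record the charm versions is to save the model status to a file before making any changes. This is only a sketch: the helper name `record_charm_versions` and the default file name are hypothetical.

```shell
# Hypothetical helper: save the full model status, which includes charm
# revisions and channels, to a file for later reference.
record_charm_versions() {
  out_file="${1:-ckf-1.7-pre-upgrade-status.yaml}"
  juju status --format=yaml > "$out_file" || return 1
  echo "saved status to $out_file"
}
```

Calling `record_charm_versions` with no argument writes to the default file name in the current directory. Pass a path as the first argument to store the file elsewhere.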

All upgrade steps should be performed in the kubeflow model. If you haven’t already, switch to the kubeflow model:

# switch to kubeflow model
juju switch kubeflow

Upgrade Istio

This section assumes that the istio-pilot and istio-ingressgateway charms deployed with Charmed Kubeflow 1.7 are at version 1.16.

  1. Remove the istio-ingressgateway application and the corresponding relation with istio-pilot:
# remove relation and istio-ingressgateway application
juju remove-relation istio-pilot istio-ingressgateway
juju remove-application istio-ingressgateway
  2. Run the following commands to confirm that the istio-ingressgateway application and all related resources have been removed. If removal was successful, each command prints 0:
juju show-application istio-ingressgateway 2> >(grep -q "not found" && echo $?)
kubectl -n kubeflow get deploy istio-ingressgateway-workload 2> >(grep -q "NotFound" && echo $?)
Troubleshooting of removal of `istio-ingressgateway` application

WARNING: Removing an application with the --force option should be a last resort. There could be stability issues if the application is not shut down cleanly.

If required, remove the istio-ingressgateway application with the --force option and remove istio-ingressgateway-workload manually:

    juju remove-application --force istio-ingressgateway
    kubectl -n kubeflow delete deploy istio-ingressgateway-workload
  3. Upgrade the istio-pilot charm:
juju refresh istio-pilot --channel 1.16/stable
Troubleshooting of Istio upgrade

Refer to this document for troubleshooting tips.

  4. Deploy istio-ingressgateway and add a relation between istio-pilot and istio-ingressgateway:
# deploy istio-ingressgateway
juju deploy istio-gateway --channel 1.16/stable --trust --config kind=ingress istio-ingressgateway
juju relate istio-pilot istio-ingressgateway
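
Application removal in Juju is asynchronous, so the removal check in the steps above may need to be repeated before it succeeds. The following sketch polls for the same "not found" message that the check looks for. The helper name `wait_for_app_removal` and its retry defaults are hypothetical.

```shell
# Hypothetical helper: poll until `juju show-application` reports that the
# application is not found, or give up after a number of attempts.
# Usage: wait_for_app_removal <application> [attempts] [delay-seconds]
wait_for_app_removal() {
  app="$1"
  tries="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # The error message is written to stderr, so merge it into the pipe.
    if juju show-application "$app" 2>&1 | grep -q "not found"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $app to be removed" >&2
  return 1
}
```

For example, `wait_for_app_removal istio-ingressgateway && juju refresh istio-pilot --channel 1.16/stable` upgrades istio-pilot only once the old gateway application is gone. The sketch does not check the istio-ingressgateway-workload deployment, so the kubectl check is still needed.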

Before charms upgrade

Before the charms can be upgraded, you must enable trust on some of the deployed charms. This step is required.

Enable trust on deployed charms

Because of changes in the charm code, some charms in Charmed Kubeflow 1.7 have to be trusted by juju before the upgrade.

WARNING: If you do not run juju trust for these charms, you may encounter authorization errors. If that happens, refer to the Troubleshooting guide.

# enable trust on charms
juju trust katib-db-manager --scope=cluster
juju trust kfp-api --scope=cluster

Upgrade charms

To upgrade Charmed Kubeflow each charm needs to be refreshed. It is recommended to wait for each charm to finish its upgrade before proceeding with the next.

Depending on how Charmed Kubeflow 1.7 was originally deployed, the refresh command may report that a charm is already up-to-date, which means that particular charm does not need to be upgraded.

Before running the refresh commands, scale down the following applications:

juju scale-application admission-webhook 0
juju scale-application kfp-profile-controller 0
juju scale-application kfp-ui 0
juju scale-application kfp-viz 0

Let’s proceed with the refresh commands. During the upgrade, some charms may temporarily go into an error or blocked state, but they should return to active after a while.

# upgrade charms
juju refresh admission-webhook --channel 1.7/stable
juju refresh katib-controller --channel 0.15/stable
juju refresh katib-db-manager --channel 0.15/stable
juju refresh katib-ui --channel 0.15/stable
juju refresh kfp-api --channel 2.0-alpha.7/stable
juju refresh kfp-persistence --channel 2.0-alpha.7/stable
juju refresh kfp-profile-controller --channel 2.0-alpha.7/stable
juju refresh kfp-schedwf --channel 2.0-alpha.7/stable
juju refresh kfp-ui --channel 2.0-alpha.7/stable
juju refresh kfp-viewer --channel 2.0-alpha.7/stable
juju refresh kfp-viz --channel 2.0-alpha.7/stable
juju refresh knative-eventing --channel 1.8/stable
juju refresh knative-operator --channel 1.8/stable
juju refresh knative-serving --channel 1.8/stable
juju refresh kserve-controller --channel 0.10/stable
juju refresh seldon-controller-manager --channel 1.15/stable
juju relate istio-pilot:gateway-info kserve-controller:ingress-gateway
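
Since it is recommended to wait for each charm to finish before refreshing the next, the refresh and the wait can be combined in a small helper. This is a sketch under assumptions, not the official procedure: the function names are hypothetical, `app_status` assumes that jq is installed and that the JSON status output exposes `application-status.current`, and the exit code of `juju refresh` is deliberately not inspected because an up-to-date charm may be reported in different ways.

```shell
# Hypothetical helper: print the current status of an application,
# for example "active". Assumes jq and the JSON layout noted above.
app_status() {
  juju status "$1" --format=json |
    jq -r --arg app "$1" '.applications[$app]["application-status"].current'
}

# Hypothetical helper: poll until the application reports "active".
# Usage: wait_until_active <application> [attempts] [delay-seconds]
wait_until_active() {
  app="$1"
  tries="${2:-60}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$(app_status "$app")" = "active" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $app to become active" >&2
  return 1
}

# Hypothetical helper: refresh one charm, then wait for it to settle.
# Usage: refresh_and_wait <application> <channel> [attempts] [delay-seconds]
refresh_and_wait() {
  juju refresh "$1" --channel "$2"
  wait_until_active "$1" "${3:-60}" "${4:-10}"
}
```

For example, `refresh_and_wait kfp-api 2.0-alpha.7/stable` refreshes kfp-api and blocks until it is active again. Applications that were scaled down to 0 units may not report active until they are scaled back up, so this helper fits best for the charms that keep running during the upgrade.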
Troubleshooting charm upgrade

If a charm fails to upgrade or is stuck in a maintenance state for a long time, it is possible to recover by running the refresh command with the revision that was deployed prior to the upgrade, i.e. by downgrading the charm. After that, repeat the upgrade.

Once the refresh commands have finished, don’t forget to scale the applications back up:

juju scale-application admission-webhook 1
juju scale-application kfp-profile-controller 1
juju scale-application kfp-ui 1
juju scale-application kfp-viz 1

Verify upgrade

You can monitor the progress of the upgrade by running:

watch -c juju status --color

When all services are in the active/idle state, the upgrade should be finished.
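
The active/idle check can also be expressed as a filter that prints only the units that are not ready yet. This is a minimal sketch: the helper name `not_ready_units` is hypothetical, and the input format (one `<unit> <workload-status> <agent-status>` triple per line) is an assumption.

```shell
# Hypothetical helper: read "<unit> <workload-status> <agent-status>" lines
# on stdin and print every unit that is not active/idle.
# Exits 0 only when every unit is active/idle.
not_ready_units() {
  awk 'NF && ($2 != "active" || $3 != "idle") { print; bad = 1 }
       END { exit bad + 0 }'
}
```

One possible way to produce the input, assuming jq is installed and that the JSON status output of your Juju version exposes `workload-status` and `juju-status` for each unit, is `juju status --format=json | jq -r '.applications[].units // {} | to_entries[] | "\(.key) \(.value["workload-status"].current) \(.value["juju-status"].current)"' | not_ready_units`. An empty result with exit code 0 then corresponds to all units being active/idle.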

Are there any tutorials about KServe deployment modes?

For example, changing the default RawDeployment to Serverless.

Thanks for your help.

Hi PoKaiChang, have you tried this?

juju config kserve-controller deployment-mode=Serverless

It works. Thanks for your help. BTW, when will KServe support ModelMesh?


Update: two more commands are needed:

juju relate istio-pilot:gateway-info kserve-controller:ingress-gateway

juju relate kserve-controller:local-gateway knative-serving:local-gateway

Hey, good question. I’ve asked about this on our Mattermost channel. Come join the chat HERE.

I’m facing an issue: I can’t use the ISVC URL to predict (it works with the cluster IP).

Are there any configs that I need to set to make it work correctly?

Thanks for your help.

Hi,

What have you tried so far for debugging / checking logs / statuses? Any specific error messages you’ve come across?

You could use juju status to check the relations between istio-pilot, kserve-controller, and knative-serving. Or check the InferenceService’s status using kubectl, for example.
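
For instance, a minimal sketch of the kubectl check could look like the following. The helper name `isvc_summary` is hypothetical, and the InferenceService name and namespace are placeholders that you supply.

```shell
# Hypothetical helper: print the Ready condition and the URL reported in
# the status of an InferenceService.
# Usage: isvc_summary <inferenceservice-name> <namespace>
isvc_summary() {
  name="$1"
  namespace="$2"
  ready=$(kubectl get inferenceservice "$name" -n "$namespace" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
  url=$(kubectl get inferenceservice "$name" -n "$namespace" \
    -o jsonpath='{.status.url}')
  echo "ready=$ready url=$url"
}
```

If the Ready value is not True, or the URL does not use the domain you expect, `kubectl describe inferenceservice` on the same resource usually shows which condition is failing.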

Finally I figured it out.

Please help to check the steps.

  1. juju config knative-serving domain.name=your.domain.name

  2. juju config kserve-controller domain-name=your.domain.name

  3. kubectl delete cm -n kubeflow inferenceservice-config

  4. Wait for Juju to recreate inferenceservice-config

  5. Edit localGateway in inferenceservice-config: kubeflow/knative-local-gateway -> knative-serving/knative-local-gateway

  6. Finally, kubectl edit cm -n knative-serving config-domain (set your.domain.name in it)

Hey. Glad to hear you got it working. Are you still facing an issue? Do you think this is something worth reporting as a bug so that others can benefit?

For some reason I didn’t get a notification here - so in case I don’t see your reply I’d recommend continuing the chat on Mattermost.