Upgrading Charmed Kubeflow from 1.6 to 1.7 requires upgrading each charm individually. New features must be deployed separately. Most charms can be upgraded simply with juju refresh
; however certain components require additional steps to upgrade.
Requirements
- An active and idle Charmed Kubeflow 1.6 deployment. This requires all charms in the bundle to be in that state. Access to dashboard of exising Charmed Kubeflow 1.6 deployment.
- Admin access to Kubernetes cluster where existing Charmed Kubeflow 1.6 is deployed.
- Tools:
kubectl
,juju
(version 2.x)
Contents:
- Before upgrade
- Upgrade Istio
- Before charms upgrade
- Upgrade charms
- Deploy KNative and KServe charms
- Verify upgrade
Before Charmed Kubeflow upgrade
Before upgrading Charmed Kubeflow it is recommended to do the following:
- Stop all Notebooks.
- Review any important data that needs to be backed up and preform backup procedures according to the policies of your organisation.
- Record all charm versions in existing Charmed Kubeflow deployment.
All upgrade steps should be done in kubeflow
model. If you haven’t already, switch to kubeflow
model:
# switch to kubeflow model
juju switch kubeflow
Upgrade Istio
Upgrade of istio components is performed according to Istio’s best practices, which requires upgrading Istio by one minor version at a time and in sequence. For more details on upgrading and troubleshooting istio-pilot
and istio-ingressgateway
charms, please refer to this document. It is assumed that the deployed istio-pilot
and istio-ingressgateway
version alongside Charmed Kubeflow 1.6 is 1.11.
- Remove the
istio-ingressgateway
application and corresponding relation withistio-pilot
:
# remove relation and istio-ingressgateway application
juju remove-relation istio-pilot istio-ingressgateway
juju remove-application istio-ingressgateway
- Ensure that
istio-ingressgateway
application and all related resources are properly removed. The following commands should succeed (return0
):
juju show-application istio-ingressgateway 2> >(grep -q "not found" && echo $?)
kubectl -n kubeflow get deploy istio-ingressgateway-workload 2> >(grep -q "NotFound" && echo $?)
Troubleshooting of removal of `istio-ingressgateway` application
WARNING: Removing application using --force
option should be the last resort. There could be potential stability issues if application is not shutdown cleanly.
If required, remove istio-ingressgateway
application with --force
option and remove istio-ingressgateway-workload
manually:
juju remove-application --force istio-ingressgateway
kubectl -n kubeflow delete deploy istio-ingressgateway-workload
- Upgrade
istio-pilot
charm in sequence. For intermediate versions, Wait for eachrefresh
command to finish and upgrade is complete, i.e.istio-pilot
is inwaiting
status with the message"Missing istio-ingressgateway-workload service, deferring this event"
.
# upgrade istio-pilot from 1.11 to 1.12
juju refresh istio-pilot --channel 1.12/stable
Initial upgrade from 1.11 to 1.12 might take some time. Ensure that istio-pilot
charm has completed its upgrade.
# upgrade istio-pilot from 1.12 to 1.13
juju refresh istio-pilot --channel 1.13/stable
# upgrade istio-pilot from 1.13 to 1.14
juju refresh istio-pilot --channel 1.14/stable
# upgrade istio-pilot from 1.14 to 1.15
juju refresh istio-pilot --channel 1.15/stable
# upgrade istio-pilot from 1.15 to 1.16
juju refresh istio-pilot --channel 1.16/stable
After refreshing to 1.16
, istio-pilot
should reach active
status within a few minutes. Otherwise, check out the troubleshooting tips below.
Troubleshooting of Istio upgrade
Refer to this document for troubleshooting tips.
- Deploy
istio-ingressgateway
add relation betweenistio-pilot
andistio-gateway
:
# deploy istio-ingressgateway
juju deploy istio-gateway --channel 1.16/stable --trust --config kind=ingress istio-ingressgateway
juju relate istio-pilot istio-ingressgateway
Before charms upgrade
Before charms can be upgraded the following actions need to be taken:
- Eanble trust on deployed charms (required).
- Updated default
admin
profile to prevent its deletion (optional)
Enable trust on deployed charms
Because of changes in the charm code, some charms in Charmed Kubeflow 1.6 have to be trusted by juju before the upgrade.
WARNING: Please note that if you do not execute juju trust
for these charms, you may encounter authorization errors. If that is the case, please refer to the Troubleshooting guide.
# enable trust on charms
juju trust jupyter-ui --scope=cluster
juju trust katib-db-manager --scope=cluster
juju trust katib-ui --scope=cluster
juju trust kfp-api --scope=cluster
juju trust kubeflow-dashboard --scope=cluster
juju trust kubeflow-profiles --scope=cluster
juju trust seldon-controller-manager --scope=cluster
Update default admin
profile to prevent its deletion
In Charmed Kubeflow 1.6 a user profile named admin
is created by default at deployment time. This profile has no additional priviledges - it is just a default profile that was created for convenience and has been removed as of Charmed Kubeflow 1.7. When upgrading to 1.7 this default profile will be deleted. If you depend on this profile, you can do the following to prevent its deletion:
# update admin profile
kubectl annotate profile admin controller.juju.is/id-
kubectl annotate profile admin model.juju.is/id-
kubectl label profile admin app.juju.is/created-by-
kubectl label profile admin app.kubernetes.io/managed-by-
kubectl label profile admin app.kubernetes.io/name-
kubectl label profile admin model.juju.is/name-
Re-deploy kubeflow-roles
charm
There is a difference how charms are handling Roles and ClusterRoles in 1.7 release. As a result, kubeflow-roles
charm needs to be re-deployed rather than refreshed:
# redeploy kubeflow-roles
juju remove-application kubeflow-roles
juju deploy kubeflow-roles --channel 1.7/stable --trust
Upgrade charms
To upgrade Charmed Kubeflow each charm needs to be refreshed. It is recommended to wait for each charm to finish its upgrade before proceeding with the next.
Depending on original deployment of Charmed Kuberflow version 1.6, refresh command will report that charm is up-to-date which indicates that there is not need to upgrade that particular charm.
During the upgrade some charms can temporarily go into error
or blocked
state, but they should go active
after a while.
# upgrade charms
juju refresh admission-webhook --channel 1.7/stable
juju refresh argo-controller --channel 3.3/stable
juju refresh argo-server --channel 3.3/stable
juju refresh dex-auth --channel 2.31/stable
juju refresh jupyter-controller --channel 1.7/stable
juju refresh jupyter-ui --channel 1.7/stable
juju refresh katib-controller --channel 0.15/stable
juju refresh katib-db --channel latest/stable
juju refresh katib-db-manager --channel 0.15/stable
juju refresh katib-ui --channel 0.15/stable
juju refresh kfp-api --channel 2.0/stable
juju refresh kfp-db --channel latest/stable
juju refresh kfp-persistence --channel 2.0/stable
juju refresh kfp-profile-controller --channel 2.0/stable
juju refresh kfp-schedwf --channel 2.0/stable
juju refresh kfp-ui --channel 2.0/stable
juju refresh kfp-viewer --channel 2.0/stable
juju refresh kfp-viz --channel 2.0/stable
juju refresh kubeflow-dashboard --channel 1.7/stable
juju refresh kubeflow-profiles --channel 1.7/stable
juju refresh kubeflow-volumes --channel 1.7/stable
juju refresh metacontroller-operator --channel 2.0/stable
juju refresh minio --channel ckf-1.7/stable
juju refresh oidc-gatekeeper --channel ckf-1.7/stable
juju refresh seldon-controller-manager --channel 1.15/stable
juju refresh tensorboard-controller --channel 1.7/stable
juju refresh tensorboards-web-app --channel 1.7/stable
juju refresh training-operator --channel 1.6/stable
Troubleshooting charm upgrade
If charm fails upgrade or is stuck in maintenance
state for long time it is possible to recover by running refresh command with version that was there prior to deployment, i.e. downgrade the charm. After that repeat the upgrade.
Deploy KNative and KServe charms
KNative and KServe are new additions to Charmed Kubeflow 1.7 and need to be deployed separately as part of the upgrade:
# install knative and kserve
juju deploy knative-operator --channel 1.8/stable --trust
juju deploy knative-serving --config namespace="knative-serving" --config istio.gateway.namespace=kubeflow --config istio.gateway.name=kubeflow-gateway --channel 1.8/stable --trust
juju deploy knative-eventing --config namespace="knative-eventing" --channel 1.8/stable --trust
juju deploy kserve-controller --channel 0.10/stable --trust
juju relate istio-pilot:gateway-info kserve-controller:ingress-gateway
Verify upgrade
You can verify the progress of the upgrade by running:
watch -c juju status --color
When all services are in active/idle state then the upgrade should be finished.