Upgrading Charmed Kubeflow (CKF) from 1.8 to 1.9 requires upgrading each charm individually. New relations must be added separately. Most charms can be upgraded with a plain juju refresh, but certain components require additional steps.
CKF 1.9 is incompatible with Charmed MLflow 2.1. If you have Charmed MLflow deployed, you should avoid upgrading to 1.9 until a newer version of Charmed MLflow is released.
Before the upgrade
Before upgrading CKF, you should do the following:
- Make sure that:
  - All pipeline runs are completed and no recurring runs are enabled.
  - No Katib experiments, training jobs, or notebooks are in progress or pending.
- Back up any important data according to your organisation’s policies. For databases, MinIO bucket pipelines and ML metadata, refer to the backup guide for further details. For restoring that data, refer to the restore guide.
The backup guide above does not guarantee the backup of all Kubeflow resources, such as notebooks and profiles. Make sure to take the appropriate actions to avoid accidental data loss.
- Record all charm versions, including revisions, in your existing CKF deployment. This can be done by running juju export-bundle.
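As a minimal sketch of the last step, the helper below writes the bundle to a file so that it can be stored with your other backups. The function name record_bundle and the default file name are assumptions made for illustration; it relies on the --filename option of juju export-bundle.

```shell
# record_bundle [FILE]: save the current model's bundle to FILE and print its path.
# The function name and the default file name are illustrative assumptions.
record_bundle() {
  out="${1:-ckf-1.8-bundle-$(date +%Y%m%d-%H%M%S).yaml}"
  juju export-bundle --filename "$out" && echo "$out"
}
```

Calling record_bundle without arguments writes a timestamped file in the current directory.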
Upgrade environment
Juju
As with the latest 1.8 update, Charmed Kubeflow 1.9 is supported on Juju 3.4 (>= 3.4.3). Make sure to use a compatible version. If needed, follow the instructions to upgrade the deployment.
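The version requirement can be checked with a short script. The sketch below compares the local Juju client version against the 3.4.3 minimum. It assumes that juju version prints a string such as 3.4.3-genericlinux-amd64 and that sort -V (GNU coreutils) is available. The function names are hypothetical. It checks only the lower bound and only the client, so the controller and model versions still need to be checked separately.

```shell
# version_ge A B: succeeds when dotted version A is >= B.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_juju_client: compare the local client version with the 3.4.3 minimum.
# Assumes `juju version` prints something like 3.4.3-genericlinux-amd64.
check_juju_client() {
  client="$(juju version | cut -d- -f1)"
  if version_ge "$client" 3.4.3; then
    echo "juju client $client: ok"
  else
    echo "juju client $client: older than 3.4.3" >&2
    return 1
  fi
}
```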
Kubernetes
Due to Istio, CKF requires a Kubernetes cluster >=1.27 (see Supported versions). Before upgrading to CKF 1.9, make sure this requirement is met.
Upgrade charms
To upgrade charms, you should follow the steps below in the proposed order.
Some charms may go to Blocked state during some steps of the upgrade process. Once the upgrade is completed, all charms should be green and in Active state.
Istio
Istio needs to be upgraded to version 1.22, assuming the deployed istio-pilot and istio-ingressgateway versions are 1.17.
- Scale down the istio-ingressgateway application to 0:
juju scale-application istio-ingressgateway 0
- To make sure the istio-ingressgateway deployment is removed, run the following command. It should succeed by returning 0:
kubectl -n kubeflow get deploy istio-ingressgateway-workload 2> >(grep -q "NotFound" && echo $?)
- Upgrade the istio-pilot charm through all intermediate versions. Run each of the following commands separately and wait until the charm goes to Active state before running the next one:
juju refresh istio-pilot --channel 1.18/stable
juju refresh istio-pilot --channel 1.19/stable
juju refresh istio-pilot --channel 1.20/stable
juju refresh istio-pilot --channel 1.21/stable
juju refresh istio-pilot --channel 1.22/stable
- Upgrade and scale up the istio-ingressgateway charm:
juju refresh istio-ingressgateway --channel 1.22/stable
juju scale-application istio-ingressgateway 1
If you encounter any issues during the upgrade, refer to Istio upgrade troubleshooting for more details.
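The stepwise istio-pilot upgrade described above can be sketched as a loop that refreshes to each intermediate channel and waits for Active state before continuing. This is an illustration, not a replacement for watching juju status yourself. The function name, the 15-minute timeout, and the SETTLE_SECONDS pause are assumptions. The pause is there because the application may still report active for a moment right after the refresh is issued.

```shell
# upgrade_istio_pilot: refresh istio-pilot through each intermediate channel,
# waiting for Active state in between. SETTLE_SECONDS is a hypothetical knob.
upgrade_istio_pilot() {
  for version in 1.18 1.19 1.20 1.21 1.22; do
    juju refresh istio-pilot --channel "${version}/stable" || return 1
    # Give the refresh time to start before polling the status.
    sleep "${SETTLE_SECONDS:-30}"
    juju wait-for application istio-pilot \
      --query='status=="active"' --timeout 15m || return 1
  done
}
```

The function stops at the first command that fails, so a stuck intermediate version does not lead to a further refresh.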
PodSpec to Sidecar charms
Some charms were rewritten from PodSpec to Sidecar between CKF 1.8 and 1.9.
Mlmd
- Back up ML metadata following the instructions from this guide for MLMD <= 1.14 and CKF 1.8.
- Remove the relation with requirer charms (envoy and kfp-metadata-writer):
juju remove-relation envoy mlmd
juju remove-relation kfp-metadata-writer mlmd
Note that grpc relations are restored once the requirer charms are upgraded. You’ll do that in the “Add grpc relations” step of the Charms with refresh section.
- Remove the mlmd application:
This wipes out the storage attached to the mlmd charm, that is, the database handled by this charm. Make sure you have performed the backup from step 1.
You must wait for the application to disappear (takes less than a minute).
juju remove-application mlmd --destroy-storage
- Deploy mlmd from the channel corresponding to 1.9:
juju deploy mlmd --channel ckf-1.9/stable --trust
- Restore ML metadata following instructions for MLMD > 1.14 and CKF 1.9.
Rest of PodSpec charms
With Juju 3.4, each of these applications must be scaled down, refreshed, and then scaled back up.
If CKF is deployed on AKS, skip this section and follow the Rest of PodSpec charms on AKS section instead.
- Scale down applications:
You must wait for the units to disappear (takes less than a minute).
juju scale-application katib-controller 0
juju scale-application kubeflow-volumes 0
juju scale-application envoy 0
- Refresh to the new charms:
juju refresh katib-controller --channel 0.17/stable --trust
juju refresh kubeflow-volumes --channel 1.9/stable --trust
juju refresh envoy --channel 2.2/stable --trust
- Scale up the applications:
juju scale-application katib-controller 1
juju scale-application kubeflow-volumes 1
juju scale-application envoy 1
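The three phases above can also be written as one function that keeps the same order: scale everything down, refresh, then scale back up. This is a sketch under assumptions. The function name and the SCALE_DOWN_PAUSE variable are hypothetical, and the fixed pause only stands in for confirming with juju status that the units have disappeared.

```shell
# "application:channel" pairs taken from the steps above.
PODSPEC_CHARMS="katib-controller:0.17/stable kubeflow-volumes:1.9/stable envoy:2.2/stable"

# upgrade_podspec_charms: scale down, refresh, then scale up, in that order.
# SCALE_DOWN_PAUSE (seconds) is a hypothetical stand-in for waiting until
# the units have disappeared; confirm with `juju status` before relying on it.
upgrade_podspec_charms() {
  for entry in $PODSPEC_CHARMS; do
    juju scale-application "${entry%%:*}" 0 || return 1
  done
  sleep "${SCALE_DOWN_PAUSE:-60}"
  for entry in $PODSPEC_CHARMS; do
    juju refresh "${entry%%:*}" --channel "${entry#*:}" --trust || return 1
  done
  for entry in $PODSPEC_CHARMS; do
    juju scale-application "${entry%%:*}" 1 || return 1
  done
}
```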
Rest of PodSpec charms on AKS
Due to this bug, the standard PodSpec charms upgrade path with juju refresh on AKS ends up with them being stuck in Unknown status, unable to spin up a new refreshed unit. Instead, you can apply the following workaround:
- Run the commands below, which remove the Juju annotations and labels from the Katib CRDs, to prevent the loss of your workloads created by Katib:
kubectl annotate crd experiments.kubeflow.org controller.juju.is/id-
kubectl annotate crd experiments.kubeflow.org model.juju.is/id-
kubectl label crd experiments.kubeflow.org app.juju.is/created-by-
kubectl label crd experiments.kubeflow.org app.kubernetes.io/managed-by-
kubectl label crd experiments.kubeflow.org app.kubernetes.io/name-
kubectl label crd experiments.kubeflow.org model.juju.is/name-
kubectl annotate crd trials.kubeflow.org controller.juju.is/id-
kubectl annotate crd trials.kubeflow.org model.juju.is/id-
kubectl label crd trials.kubeflow.org app.juju.is/created-by-
kubectl label crd trials.kubeflow.org app.kubernetes.io/managed-by-
kubectl label crd trials.kubeflow.org app.kubernetes.io/name-
kubectl label crd trials.kubeflow.org model.juju.is/name-
kubectl annotate crd suggestions.kubeflow.org controller.juju.is/id-
kubectl annotate crd suggestions.kubeflow.org model.juju.is/id-
kubectl label crd suggestions.kubeflow.org app.juju.is/created-by-
kubectl label crd suggestions.kubeflow.org app.kubernetes.io/managed-by-
kubectl label crd suggestions.kubeflow.org app.kubernetes.io/name-
kubectl label crd suggestions.kubeflow.org model.juju.is/name-
- Remove PodSpec charms:
juju remove-application katib-controller
juju remove-application kubeflow-volumes
juju remove-application envoy
- Wait until the applications have been removed. To make sure all related resources are removed, run the following commands. They should succeed by returning 0:
juju show-application katib-controller 2> >(grep -q "not found" && echo $?)
kubectl -n kubeflow get deploy katib-controller 2> >(grep -q "NotFound" && echo $?)
juju show-application kubeflow-volumes 2> >(grep -q "not found" && echo $?)
kubectl -n kubeflow get deploy kubeflow-volumes 2> >(grep -q "NotFound" && echo $?)
juju show-application envoy 2> >(grep -q "not found" && echo $?)
kubectl -n kubeflow get deploy envoy 2> >(grep -q "NotFound" && echo $?)
- Deploy the new charms and add the relations:
juju deploy envoy --channel 2.2/stable --trust
juju deploy kubeflow-volumes --channel 1.9/stable --trust
juju deploy katib-controller --channel 0.17/stable --trust
juju integrate kubeflow-dashboard:links kubeflow-volumes:dashboard-links
juju integrate istio-pilot:ingress kubeflow-volumes:ingress
juju integrate istio-pilot:ingress envoy:ingress
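The annotation and label removal in the first step of this workaround follows a regular pattern: the same two annotations and four labels are removed from each of the three Katib CRDs. As a sketch, it can be expressed as a nested loop that issues the same 18 kubectl commands. The function name strip_juju_metadata is an assumption made for illustration.

```shell
# strip_juju_metadata: remove Juju annotations and labels from the Katib CRDs.
# A trailing "-" on a key tells kubectl to remove that annotation or label.
strip_juju_metadata() {
  for crd in experiments trials suggestions; do
    for key in controller.juju.is/id model.juju.is/id; do
      kubectl annotate crd "${crd}.kubeflow.org" "${key}-" || return 1
    done
    for key in app.juju.is/created-by app.kubernetes.io/managed-by \
               app.kubernetes.io/name model.juju.is/name; do
      kubectl label crd "${crd}.kubeflow.org" "${key}-" || return 1
    done
  done
}
```

The function stops at the first kubectl command that fails, so check its exit status before removing the PodSpec charms.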
Charms with refresh
- Upgrade the rest of the charms with juju refresh:
juju refresh admission-webhook --channel 1.9/stable
juju refresh argo-controller --channel 3.4/stable
juju refresh dex-auth --channel 2.39/stable
juju refresh jupyter-controller --channel 1.9/stable
juju refresh jupyter-ui --channel 1.9/stable
juju refresh katib-db-manager --channel 0.17/stable
juju refresh katib-ui --channel 0.17/stable
juju refresh kfp-api --channel 2.2/stable
juju refresh kfp-metadata-writer --channel 2.2/stable
juju refresh kfp-persistence --channel 2.2/stable
juju refresh kfp-profile-controller --channel 2.2/stable
juju refresh kfp-schedwf --channel 2.2/stable
juju refresh kfp-ui --channel 2.2/stable
juju refresh kfp-viewer --channel 2.2/stable
juju refresh kfp-viz --channel 2.2/stable
juju refresh knative-eventing --channel 1.12/stable
juju refresh knative-operator --channel 1.12/stable
juju refresh knative-serving --channel 1.12/stable
juju refresh kserve-controller --channel 0.13/stable
juju refresh kubeflow-dashboard --channel 1.9/stable
juju refresh kubeflow-profiles --channel 1.9/stable
juju refresh kubeflow-roles --channel 1.9/stable
juju refresh metacontroller-operator --channel 3.0/stable
juju refresh minio --channel ckf-1.9/stable
juju refresh oidc-gatekeeper --channel ckf-1.9/stable
juju refresh pvcviewer-operator --channel 1.9/stable
juju refresh tensorboard-controller --channel 1.9/stable
juju refresh tensorboards-web-app --channel 1.9/stable
juju refresh training-operator --channel 1.8/stable
- Add grpc relations to mlmd:
juju integrate envoy:grpc mlmd:grpc
juju integrate kfp-metadata-writer:grpc mlmd:grpc
- Add new relations:
juju integrate katib-db-manager:k8s-service-info katib-controller:k8s-service-info
juju integrate kubeflow-dashboard:links training-operator:dashboard-links
juju integrate oidc-gatekeeper:dex-oidc-config dex-auth:dex-oidc-config