How to upgrade Kubeflow from 1.7 to 1.8

Upgrading Charmed Kubeflow from 1.7 to 1.8 requires upgrading each charm individually. New charms and relations must be deployed separately. Most charms can be upgraded with juju refresh; however, certain components require additional steps.

Requirements

  • An active and idle Charmed Kubeflow 1.7 deployment. This requires all charms in the bundle to be in that state.
  • Access to the dashboard of the existing Charmed Kubeflow 1.7 deployment.
  • Admin access to the Kubernetes cluster where the existing Charmed Kubeflow 1.7 is deployed.
  • Tools: kubectl, juju (version 3.x)
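
The tool requirements can be checked before starting. The following is a minimal bash sketch, not part of the official procedure: the function names juju_is_3x and preflight are hypothetical helpers, and it assumes that juju version prints a string beginning with the client version number (for example 3.4.0-genericlinux-amd64).

```shell
#!/usr/bin/env bash
# Preflight sketch: confirm the required tools exist and the juju client is 3.x.
# juju_is_3x and preflight are hypothetical helper names.

# Return 0 if the given version string (e.g. "3.4.0-genericlinux-amd64")
# has major version 3.
juju_is_3x() {
  local version="$1"
  [[ "${version%%.*}" == "3" ]]
}

# Check that kubectl and juju are on PATH and that the juju client is 3.x.
preflight() {
  local tool
  for tool in kubectl juju; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      return 1
    fi
  done
  if ! juju_is_3x "$(juju version)"; then
    echo "juju client is not 3.x: $(juju version)" >&2
    return 1
  fi
  echo "preflight ok"
}

# Usage: preflight
```

This only checks the client version; the controller version is a separate matter covered by the Juju migration step.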

Before Charmed Kubeflow upgrade

Before upgrading Charmed Kubeflow it is recommended to do the following:

  • Stop all Notebooks.
  • Make sure all Workflows are completed and disable Recurring Runs.
  • Review any important data that needs to be backed up and perform backup procedures according to the policies of your organisation.
  • Record all charm versions in the existing Charmed Kubeflow deployment.
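
One way to record the charm versions is to save the model status to a file. This is a sketch under assumptions: the helper name record_charm_versions and the output file name are hypothetical, and it relies on the YAML output of juju status listing each application together with its charm details.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save the status of the kubeflow model, which lists
# every application with its charm details, to a timestamped file.
record_charm_versions() {
  local outdir="${1:-.}"
  local outfile="${outdir}/ckf-1.7-status-$(date +%Y%m%d-%H%M%S).yaml"
  juju status --model kubeflow --format=yaml > "$outfile" || return 1
  # Print the path so the caller knows where the record was written.
  echo "$outfile"
}

# Usage: record_charm_versions ~/ckf-backup
```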

All upgrade steps should be done in the kubeflow model. If you haven’t already, switch to the kubeflow model:

# switch to kubeflow model
juju switch kubeflow

Migrate DBs to MySQL

As of Charmed Kubeflow 1.8, MySQL is replacing MariaDB as the database for Katib and Pipelines. Charmed Kubeflow 1.8 does NOT support MariaDB, so you need to migrate to MySQL. For migration instructions, please follow the migration guide.

Migrate Juju 2.9 to 3.4

Charmed Kubeflow 1.8 is supported on Juju 3.4. For deployments that are on Juju 2.9, you should migrate them to Juju 3.4. Follow the guide in the Juju docs:

Juju | Upgrade your Juju deployment from 2.9 to 3.x

Before charms upgrade

Modify CRD labels to keep user workloads

Because many charms in Charmed Kubeflow 1.8 moved from the PodSpec to the Sidecar pattern, some charms cannot be upgraded with juju refresh. Instead, you need to remove and re-deploy them. The commands listed below prevent the loss of your workloads created by Notebooks, Argo Workflows, and Scheduled Workflows when removing these charms. Run the following:

# prevent loss of existing notebooks
kubectl annotate crd notebooks.kubeflow.org controller.juju.is/id-
kubectl annotate crd notebooks.kubeflow.org model.juju.is/id-
kubectl label crd notebooks.kubeflow.org app.juju.is/created-by-
kubectl label crd notebooks.kubeflow.org app.kubernetes.io/managed-by-
kubectl label crd notebooks.kubeflow.org app.kubernetes.io/name-
kubectl label crd notebooks.kubeflow.org model.juju.is/name-

# prevent loss of defined workflows
kubectl annotate crd workflows.argoproj.io controller.juju.is/id-
kubectl annotate crd workflows.argoproj.io model.juju.is/id-
kubectl label crd workflows.argoproj.io app.juju.is/created-by-
kubectl label crd workflows.argoproj.io app.kubernetes.io/managed-by-
kubectl label crd workflows.argoproj.io app.kubernetes.io/name-
kubectl label crd workflows.argoproj.io model.juju.is/name-

# prevent loss of defined scheduled workflows
kubectl annotate crd scheduledworkflows.kubeflow.org controller.juju.is/id-
kubectl annotate crd scheduledworkflows.kubeflow.org model.juju.is/id-
kubectl label crd scheduledworkflows.kubeflow.org app.juju.is/created-by-
kubectl label crd scheduledworkflows.kubeflow.org app.kubernetes.io/managed-by-
kubectl label crd scheduledworkflows.kubeflow.org app.kubernetes.io/name-
kubectl label crd scheduledworkflows.kubeflow.org model.juju.is/name-
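
The same commands can be expressed as a loop over the three CRDs. This is an equivalent sketch rather than a replacement for the commands above; the function names detach_crd_from_juju and detach_user_workload_crds are hypothetical. It relies on kubectl accepting several keys in one annotate or label call, where a trailing "-" removes the key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the Juju ownership annotations and labels from
# one CRD. A trailing "-" on a key tells kubectl to remove that key.
detach_crd_from_juju() {
  local crd="$1"
  kubectl annotate crd "$crd" \
    controller.juju.is/id- \
    model.juju.is/id-
  kubectl label crd "$crd" \
    app.juju.is/created-by- \
    app.kubernetes.io/managed-by- \
    app.kubernetes.io/name- \
    model.juju.is/name-
}

# Apply the helper to the CRDs for notebooks, workflows and scheduled workflows.
detach_user_workload_crds() {
  local crd
  for crd in notebooks.kubeflow.org workflows.argoproj.io scheduledworkflows.kubeflow.org; do
    detach_crd_from_juju "$crd"
  done
}

# Usage: detach_user_workload_crds
```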

Remove argo-server charm

The argo-server charm was deprecated in Charmed Kubeflow 1.8. This charm was not being used in the bundle, so removing it will not affect your deployment, and doing so saves resources. Remove it by running:

juju remove-application argo-server

Upgrade charms

Upgrade Istio

It is assumed that the deployed istio-pilot and istio-ingressgateway versions alongside Charmed Kubeflow 1.7 are 1.16.

  1. Scale down the istio-ingressgateway application to 0
juju scale-application istio-ingressgateway 0
  2. Verify that the istio-ingressgateway deployment has been removed. If removal was successful, the following command prints 0:
kubectl -n kubeflow get deploy istio-ingressgateway-workload 2> >(grep -q "NotFound" && echo $?)
  3. Upgrade the istio-pilot charm.

If the ssl-key and ssl-crt configuration was in place, make sure you read the Migration from configuration to action guide for important considerations.

juju refresh istio-pilot --channel 1.17/stable
  4. Upgrade and scale up the istio-ingressgateway charm
juju refresh istio-ingressgateway --channel 1.17/stable
juju scale-application istio-ingressgateway 1

Troubleshooting of Istio upgrade

Refer to this document for troubleshooting tips.

Re-deploy kubeflow-roles charm

Charms handle Roles and ClusterRoles differently in the 1.8 release. As a result, the kubeflow-roles charm needs to be re-deployed rather than refreshed:

# redeploy kubeflow-roles
juju remove-application kubeflow-roles
juju deploy kubeflow-roles --channel 1.8/stable --trust

Upgrade Podspec to Sidecar charms

Some charms were rewritten from PodSpec to Sidecar between Charmed Kubeflow 1.7 and 1.8. For this kind of upgrade, Juju 3.4 requires that you scale down the application, refresh it, and then scale it back up.

  1. Scale down the applications
juju scale-application admission-webhook 0
juju scale-application kfp-profile-controller 0
juju scale-application kfp-ui 0
juju scale-application kfp-viz 0
juju scale-application oidc-gatekeeper 0
juju scale-application tensorboard-controller 0
juju scale-application tensorboards-web-app 0
  2. Refresh to the new charms
juju refresh admission-webhook --channel 1.8/stable --trust
juju refresh kfp-profile-controller --channel 2.0/stable --trust
juju refresh kfp-ui --channel 2.0/stable --trust
juju refresh kfp-viz --channel 2.0/stable --trust
juju refresh oidc-gatekeeper --channel ckf-1.8/stable --trust
juju refresh tensorboard-controller --channel 1.8/stable --trust
juju refresh tensorboards-web-app --channel 1.8/stable --trust
  3. Scale up the applications
juju scale-application admission-webhook 1
juju scale-application kfp-profile-controller 1
juju scale-application kfp-ui 1
juju scale-application kfp-viz 1
juju scale-application oidc-gatekeeper 1
juju scale-application tensorboard-controller 1
juju scale-application tensorboards-web-app 1
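
The three phases above can be sketched as a single function driven by a list of application and channel pairs. The array name PODSPEC_APPS and the function name upgrade_podspec_apps are hypothetical; the applications and channels are copied from the steps above. Like the manual steps, the sketch does not wait between phases.

```shell
#!/usr/bin/env bash
# Sketch of the scale-down / refresh / scale-up sequence.
# PODSPEC_APPS and upgrade_podspec_apps are hypothetical names.

# "application=channel" pairs, copied from the steps above.
PODSPEC_APPS=(
  "admission-webhook=1.8/stable"
  "kfp-profile-controller=2.0/stable"
  "kfp-ui=2.0/stable"
  "kfp-viz=2.0/stable"
  "oidc-gatekeeper=ckf-1.8/stable"
  "tensorboard-controller=1.8/stable"
  "tensorboards-web-app=1.8/stable"
)

upgrade_podspec_apps() {
  local entry
  # Phase 1: scale every application down to 0 units.
  for entry in "${PODSPEC_APPS[@]}"; do
    juju scale-application "${entry%%=*}" 0 || return 1
  done
  # Phase 2: refresh each application to its target channel.
  for entry in "${PODSPEC_APPS[@]}"; do
    juju refresh "${entry%%=*}" --channel "${entry#*=}" --trust || return 1
  done
  # Phase 3: scale every application back up to 1 unit.
  for entry in "${PODSPEC_APPS[@]}"; do
    juju scale-application "${entry%%=*}" 1 || return 1
  done
}

# Usage: upgrade_podspec_apps
```

The function stops at the first failing juju command, so a partially completed phase can be inspected with juju status before continuing by hand.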

Other charms that moved to the Sidecar pattern are a special case: they need to be removed and re-deployed. For more information, see GH 732. Make sure to follow the pre-upgrade steps before doing this to prevent any loss of your user-created workloads.

  1. Remove the charms from 1.7
juju remove-application jupyter-controller
juju remove-application argo-controller
juju remove-application kfp-persistence
juju remove-application kfp-schedwf
juju remove-application kfp-viewer
  2. Wait for the charms to be removed and make sure all related resources are gone. Each of the following commands should print 0:
juju show-application jupyter-controller 2> >(grep -q "not found" && echo $?)
kubectl -n kubeflow get deploy jupyter-controller 2> >(grep -q "NotFound" && echo $?)

juju show-application argo-controller 2> >(grep -q "not found" && echo $?)
kubectl -n kubeflow get deploy argo-controller 2> >(grep -q "NotFound" && echo $?)

juju show-application kfp-persistence 2> >(grep -q "not found" && echo $?)
kubectl -n kubeflow get deploy kfp-persistence 2> >(grep -q "NotFound" && echo $?)

juju show-application kfp-schedwf 2> >(grep -q "not found" && echo $?)
kubectl -n kubeflow get deploy kfp-schedwf 2> >(grep -q "NotFound" && echo $?)

juju show-application kfp-viewer 2> >(grep -q "not found" && echo $?)
kubectl -n kubeflow get deploy kfp-viewer 2> >(grep -q "NotFound" && echo $?)
  3. Deploy the new charms and add the relations
juju deploy jupyter-controller --trust --channel=1.8/stable
juju deploy argo-controller --trust --channel=3.3/stable
juju deploy kfp-persistence --trust --channel=2.0/stable
juju deploy kfp-schedwf --trust --channel=2.0/stable
juju deploy kfp-viewer --trust --channel=2.0/stable

juju relate argo-controller minio
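
The removal checks in step 2 can also be polled in a loop instead of being re-run by hand. This is a sketch with a hypothetical helper name, wait_until_removed. It assumes that juju show-application and kubectl get deploy exit non-zero when the object no longer exists; note that any other failure (for example an unreachable controller) would also be read as "removed", so treat the result as a hint and confirm with juju status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until neither the Juju application nor its
# Kubernetes deployment can be found.
# Arguments: application name, max attempts (default 60), delay in seconds (default 5).
wait_until_removed() {
  local app="$1" attempts="${2:-60}" delay="${3:-5}" i
  for ((i = 0; i < attempts; i++)); do
    if ! juju show-application "$app" >/dev/null 2>&1 &&
       ! kubectl -n kubeflow get deploy "$app" >/dev/null 2>&1; then
      echo "$app removed"
      return 0
    fi
    sleep "$delay"
  done
  echo "timed out waiting for $app" >&2
  return 1
}

# Usage (run before deploying the new charms):
# for app in jupyter-controller argo-controller kfp-persistence kfp-schedwf kfp-viewer; do
#   wait_until_removed "$app" || break
# done
```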

Upgrade charms with refresh

Now, you can upgrade the rest of the CKF charms with juju refresh:

juju refresh dex-auth --channel 2.36/stable
juju refresh jupyter-ui --channel 1.8/stable
juju refresh katib-controller --channel 0.16/stable
juju refresh katib-db-manager --channel 0.16/stable --trust
juju refresh katib-ui --channel 0.16/stable
juju refresh kfp-api --channel 2.0/stable --trust
juju refresh knative-eventing --channel 1.10/stable
juju refresh knative-operator --channel 1.10/stable
juju refresh knative-serving --channel 1.10/stable
juju refresh kserve-controller --channel 0.11/stable
juju refresh kubeflow-dashboard --channel 1.8/stable
juju refresh kubeflow-profiles --channel 1.8/stable
juju refresh kubeflow-volumes --channel 1.8/stable
juju refresh metacontroller-operator --channel 3.0/stable
juju refresh minio --channel ckf-1.8/stable
juju refresh seldon-controller-manager --channel 1.17/stable
juju refresh training-operator --channel 1.7/stable

juju relate kfp-api:kfp-api kfp-persistence:kfp-api

Add new relations

Add Dashboard relations

Charmed Kubeflow 1.8 introduces dynamic sidebar configuration for the dashboard. Add these relations to the Kubeflow components to be able to use the new dashboard:

juju relate kubeflow-dashboard:links jupyter-ui:dashboard-links
juju relate kubeflow-dashboard:links katib-ui:dashboard-links
juju relate kubeflow-dashboard:links kfp-ui:dashboard-links
juju relate kubeflow-dashboard:links kubeflow-volumes:dashboard-links
juju relate kubeflow-dashboard:links tensorboards-web-app:dashboard-links

Add KServe-Knative relation

Charmed Kubeflow 1.8 changes the default KServe deployment mode from RawDeployment to Serverless. For Serverless deployments to work correctly, you need to add the following relation:

juju relate knative-serving:local-gateway kserve-controller

Deploy KFP 2.0 dependencies

Deploy the charms needed for KFP 2.0. These are required in KFP 2.0 for MLMD functionality.

juju deploy envoy --channel=2.0/stable --trust
juju deploy kfp-metadata-writer --channel=2.0/stable --trust
juju deploy mlmd --channel=1.14/stable

juju relate istio-pilot:ingress envoy:ingress
juju relate mlmd:grpc envoy:grpc
juju relate mlmd:grpc kfp-metadata-writer:grpc

Deploy PVCViewer charm

Kubeflow 1.8 introduced the PVCViewer feature in Kubeflow Volumes. It is enabled in Charmed Kubeflow by deploying the pvcviewer-operator charm. Deploy it with:

juju deploy pvcviewer-operator --channel=1.8/stable --trust

Update Kubernetes

Kubeflow 1.8 is supported on Kubernetes versions 1.24, 1.25 and 1.26.

Users should update their Kubernetes cluster to one of these versions.
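
To check a cluster against this list, the version string can be compared with the supported minors. This is a minimal sketch: the function name k8s_version_supported is a hypothetical helper, and the usage line assumes jq is installed to extract the server version from the JSON output of kubectl version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if a Kubernetes version string such as
# "v1.25.4" or "1.26" is one of the versions listed above (1.24, 1.25, 1.26).
k8s_version_supported() {
  local version="${1#v}"           # drop a leading "v" if present
  local major="${version%%.*}"     # text before the first dot
  local rest="${version#*.}"       # text after the first dot
  local minor="${rest%%[!0-9]*}"   # leading digits of the remainder
  case "${major}.${minor}" in
    1.24|1.25|1.26) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (assumes jq is available):
# k8s_version_supported "$(kubectl version -o json | jq -r '.serverVersion.gitVersion')" \
#   && echo "supported" || echo "not supported"
```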

Migrate Pipelines to v2

If you have Pipelines created with SDK v1, you need to migrate them to use SDK v2. This is because KFP SDK v2 is not backwards compatible with SDK v1. Follow the instructions provided by Kubeflow Pipelines documentation:

Kubeflow Pipelines | v2 | Migrate from KFP SDK v1

Hi,

As a part of the upgrade, we need to migrate Juju from 2.9 to 3.1. But it is very likely that some units in the model are not in the “active” state, which prevents a successful migration. Unfortunately, there is also no way to remove units stuck in other states (such as unknown or terminated). Also, the document referred to in this procedure does not describe how to migrate the model if the controllers are on Kubernetes.

In the Discourse post of this document here, it is also advised not to use Kubernetes controllers in production. But in the Get Started document here, the installation is done by deploying a Juju controller to Kubernetes.

So, if it is not a must, I think we may use machine controllers. Otherwise, we need backup, restore, and migration documentation for Kubernetes controllers.

BR, Ebrar