Deploy Charmed Kubeflow to GKE

This guide describes how to deploy Charmed Kubeflow (CKF) to Google Kubernetes Engine (GKE).

You can do so by spinning up a GKE cluster on Google Cloud, and then deploying Kubeflow using the Kubernetes command line tool, kubectl, and Juju.

Requirements

  • Ubuntu 22.04 LTS or later.
  • Google Cloud account.

Prerequisites

  1. Install Google Cloud Command Line Interface (CLI) on your local machine:
curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-462.0.1-linux-x86_64.tar.gz
tar -xf google-cloud-cli-462.0.1-linux-x86_64.tar.gz
./google-cloud-sdk/install.sh

Refer to the Google Cloud documentation to check for the latest version.
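If you want to verify the installation, you can print the installed CLI version:
gcloud version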

  2. Install the Google Cloud authorisation plugin:
gcloud components install gke-gcloud-auth-plugin
  3. Install kubectl and Juju:
sudo snap install kubectl --classic
sudo snap install juju

Juju 3.4.3 is used for this guide. Take a look at the Supported Kubeflow versions page for more information.
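If you prefer to pin the Juju client to the same track used here, you can install the snap with an explicit channel instead (assuming the 3.4/stable channel) and confirm the client version afterwards:
sudo snap install juju --channel=3.4/stable
juju version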

  4. Log in and enable services on your Google Cloud project:
gcloud auth login
export PROJECT_ID=test-project
gcloud config set project test-project
gcloud --project=${PROJECT_ID} services enable \
    container.googleapis.com \
    containerregistry.googleapis.com \
    binaryauthorization.googleapis.com
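As an optional check, you can confirm the services are now enabled for your project:
gcloud services list --enabled --project=${PROJECT_ID}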

Deploy GKE cluster

  1. You can create a GKE cluster by specifying the machine type and disk size as follows:

CKF recommends at least 4 cores, 32GB of RAM and 50GB of disk for the cluster machines. The n1-standard-16 machine type meets these requirements; n1-standard-8 might be enough for testing purposes.

gcloud container clusters create test-cluster \
    --binauthz-evaluation-mode=PROJECT_SINGLETON_POLICY_ENFORCE \
    --zone us-central1-a \
    --machine-type n1-standard-16 \
    --disk-size=100G
  2. After your cluster is created, save the credentials to be used with kubectl:
gcloud container clusters get-credentials --zone us-central1-a test-cluster
kubectl config rename-context gke_test-project_us-central1-a_test-cluster gke-cluster
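As a sanity check, you can confirm kubectl now points at the renamed context and can reach the cluster:
kubectl config current-context
kubectl get nodes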
  3. Bootstrap a Juju controller to the GKE cluster:
/snap/juju/current/bin/juju bootstrap gke-cluster

The command /snap/juju/current/bin/juju is currently used as a workaround for a bug.
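To confirm the controller was bootstrapped successfully, you can list the controllers known to your Juju client:
juju controllers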

Deploy Charmed Kubeflow

  1. You can deploy the CKF bundle with the following commands:
juju add-model kubeflow
juju deploy kubeflow --trust --channel=1.8/stable

CKF 1.8 is used for this guide.

  2. Now wait until the units of all applications reach active status. You can check the current state of the units with juju as follows:
juju status --watch 5s --relations

Refer to Known issues if you face any problems. Note that oidc-gatekeeper stays in a blocked state until the next steps are carried out.
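For an alternative view of the same progress, you can also watch the Kubeflow pods directly with kubectl; the namespace matches the Juju model created above:
kubectl get pods -n kubeflow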

  3. Make the CKF dashboard accessible by configuring its public URL to match the LoadBalancer’s external IP address:
PUBLIC_URL="http://$(kubectl -n kubeflow get svc istio-ingressgateway-workload -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
echo PUBLIC_URL: $PUBLIC_URL
juju config dex-auth public-url=$PUBLIC_URL 
juju config oidc-gatekeeper public-url=$PUBLIC_URL
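If PUBLIC_URL comes back empty, the LoadBalancer may not have received an external IP yet. You can watch the service until an address appears and then re-run the commands above:
kubectl -n kubeflow get svc istio-ingressgateway-workload -w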
  4. Configure Dex-auth credentials:
juju config dex-auth static-username=user@example.com 
juju config dex-auth static-password=user

You can choose a different (more secure!) password.

  5. Access the CKF dashboard by navigating to the PUBLIC_URL returned in step 3. You should first see the Dex login screen. Once logged in with the credentials set above, you should see the CKF welcome page.

Known issues

Oidc-gatekeeper “Waiting for pod startup to complete”

If the Juju status shows the oidc-gatekeeper/0 unit as follows:

oidc-gatekeeper/0*         waiting      idle   10.1.121.241                 Waiting for pod startup to complete.

You need to reconfigure the public-url configuration option for the charm:

PUBLIC_URL="http://$(kubectl -n kubeflow get svc istio-ingressgateway-workload -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')"
juju config oidc-gatekeeper public-url=""
juju config oidc-gatekeeper public-url=$PUBLIC_URL

Clean up resources

You can clean up allocated resources in Juju and Google Cloud as follows:

juju destroy-model kubeflow --destroy-storage
gcloud container clusters delete test-cluster --zone us-central1-a
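If you bootstrapped a Juju controller solely for this guide, you can also remove its reference from your local Juju client once the cluster is deleted; replace <controller-name> with the name shown by juju controllers:
juju unregister <controller-name>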