This guide describes how to deploy Charmed Kubeflow (CKF) to Google Kubernetes Engine (GKE).
You can do so by spinning up a GKE cluster on Google Cloud, and then deploying Kubeflow using the Kubernetes command line tool, kubectl, and Juju.
Requirements
- Ubuntu 22.04 LTS or later.
- Google Cloud account.
Prerequisites
- Install Google Cloud Command Line Interface (CLI) on your local machine:
curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-462.0.1-linux-x86_64.tar.gz
tar -xf google-cloud-cli-462.0.1-linux-x86_64.tar.gz
./google-cloud-sdk/install.sh
Refer to the Google Cloud documentation to check for the latest version.
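To confirm the CLI installed correctly, you can print its version; the exact output depends on the release you downloaded:
gcloud version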
- Install the Google Cloud authorisation plugin:
gcloud components install gke-gcloud-auth-plugin
- Install kubectl and Juju:
sudo snap install kubectl --classic
sudo snap install juju
Juju 3.4.3 is used for this guide. Take a look at the Supported Kubeflow versions page for more information.
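As a quick sanity check, you can verify that both tools are on your PATH and report their versions:
kubectl version --client
juju version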
- Login and enable services on your Google Cloud project:
gcloud auth login
export PROJECT_ID=test-project
gcloud config set project ${PROJECT_ID}
gcloud --project=${PROJECT_ID} services enable \
container.googleapis.com \
containerregistry.googleapis.com \
binaryauthorization.googleapis.com
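If you want to confirm the APIs are active, you can list the enabled services for the project and look for the three entries above:
gcloud services list --enabled --project=${PROJECT_ID}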
Deploy GKE cluster
- You can create a GKE cluster by specifying the machine type and disk size as follows:
CKF suggests at least 4 cores, 32 GB of RAM, and 50 GB of disk for the cluster machines. Therefore, you can use the n1-standard-16 machine type for your cluster. The n1-standard-8 type might be enough for testing purposes.
gcloud container clusters create \
  --binauthz-evaluation-mode=PROJECT_SINGLETON_POLICY_ENFORCE \
  --zone us-central1-a \
  --machine-type n1-standard-16 \
  --disk-size=100G \
  test-cluster
- After your cluster is created, save the credentials to be used with kubectl:
gcloud container clusters get-credentials --zone us-central1-a test-cluster
kubectl config rename-context gke_test-project_us-central1-a_test-cluster gke-cluster
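To verify that kubectl now points at the new cluster, you can check the active context and list the nodes; the node names and count depend on your cluster:
kubectl config current-context
kubectl get nodes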
- Bootstrap Juju controller to GKE cluster:
/snap/juju/current/bin/juju bootstrap gke-cluster
The full path /snap/juju/current/bin/juju is currently used as a workaround for a known bug.
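Once the bootstrap completes, you can confirm the controller is registered and reachable; if you hit the same bug with the plain juju command, use the full snap path here as well:
juju controllers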
Deploy Charmed Kubeflow
- You can deploy the CKF bundle with the following command:
juju add-model kubeflow
juju deploy kubeflow --trust --channel=1.8/stable
CKF 1.8 is used for this guide.
- Now wait until the units of all applications reach active status. You can check the current status of the units with juju as follows:
juju status --watch 5s --relations
Refer to Known issues in case you face any problems. Note that oidc-gatekeeper stays in a blocked state until the next steps are carried out.
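If you prefer to track only that application while you work through the remaining steps, juju status also accepts application names as filters:
juju status oidc-gatekeeper --relations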
- Make the CKF dashboard accessible by configuring its public URL to match the LoadBalancer’s external IP address:
PUBLIC_URL="http://$(kubectl -n kubeflow get svc istio-ingressgateway-workload -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
echo PUBLIC_URL: $PUBLIC_URL
juju config dex-auth public-url=$PUBLIC_URL
juju config oidc-gatekeeper public-url=$PUBLIC_URL
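You can read the options back to confirm they were applied, since juju config prints the current value when no new value is given:
juju config dex-auth public-url
juju config oidc-gatekeeper public-url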
- Configure Dex-auth credentials:
juju config dex-auth static-username=user@example.com
juju config dex-auth static-password=user
You can choose a different (more secure!) password.
- Access the CKF dashboard by navigating to the PUBLIC_URL returned as the output in step 3. You should first see the Dex login screen. Once logged in with the credentials set above, you should see the CKF welcome page.
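If the browser cannot reach the dashboard, a quick command-line check can confirm the ingress is responding; a redirect towards the Dex login page is the expected kind of response, though the exact status code may vary with your setup:
curl -sI "$PUBLIC_URL" | head -n 1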
Known issues
Oidc-gatekeeper “Waiting for pod startup to complete”
If the Juju status shows the oidc-gatekeeper/0 unit as follows:
oidc-gatekeeper/0* waiting idle 10.1.121.241 Waiting for pod startup to complete.
You have to reconfigure the public-url option for the charm:
PUBLIC_URL="http://$(kubectl -n kubeflow get svc istio-ingressgateway-workload -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
juju config oidc-gatekeeper public-url=""
juju config oidc-gatekeeper public-url=$PUBLIC_URL
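After resetting the option, the unit should leave the waiting state within a few minutes. You can watch just this application while it settles:
juju status oidc-gatekeeper --watch 5s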
Clean up resources
You can clean up allocated resources in Juju and Google Cloud as follows:
juju destroy-model kubeflow --destroy-storage
gcloud container clusters delete test-cluster --zone us-central1-a
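To double-check that nothing is left behind, you can list the remaining Juju models and GKE clusters; neither should show the resources created in this guide:
juju models
gcloud container clusters list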