Deploy Charmed Kubeflow to EKS

This how-to guide walks you through deploying Charmed Kubeflow to an AWS Elastic Kubernetes Service (EKS) cluster. You will first create an EKS cluster on AWS using eksctl from your local machine. Then, still from your local machine, you will use kubectl and juju to interact with the cluster and deploy Kubeflow to it.

Requirements:

Deploy EKS cluster

See here for a complete guide on how to do exactly that.

Set up Juju

Set up juju on your local machine to access the remote Kubernetes cloud.

  1. Install juju.
sudo snap install juju --channel=3.4/stable
  1. Add your EKS cluster to Juju as a cloud (the cloud name, kubeflow here, is up to you).
/snap/juju/current/bin/juju add-k8s kubeflow --client

:warning: The full path /snap/juju/current/bin/juju is used here as a workaround for a Juju bug: because of the snap's strict confinement, Juju 3.4 installed from snap cannot add public clouds.

  1. Bootstrap a Juju controller (the controller name, kubeflow-controller here, is up to you).
juju bootstrap kubeflow kubeflow-controller
  1. Add a Juju model (the model name must be kubeflow).
juju add-model kubeflow
  1. Verify that the kubeflow namespace exists.
kubectl get ns
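
The verification in the last step can also be scripted instead of reading the namespace list by eye. The following is a minimal sketch, assuming kubectl already points at the EKS cluster; the function name namespace_exists is illustrative and not part of any tool.

```shell
# Return success if the given Kubernetes namespace exists.
# namespace_exists is a hypothetical helper name.
namespace_exists() {
    kubectl get namespace "$1" >/dev/null 2>&1
}
```

For example, `namespace_exists kubeflow && echo "kubeflow namespace is present"` prints the message only once the model's namespace has been created.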

Deploy Kubeflow bundle

  1. Deploy Charmed Kubeflow bundle with the following command.
juju deploy kubeflow --channel=1.8/stable --trust
  1. Wait until all charms reach the active (green) state. You can check their state with the following command. If you run into problems, refer to the Known issues section below. Keep in mind that oidc-gatekeeper stays in Blocked status until it is configured in the next steps.
juju status --watch 5s --relations
  1. Make the Kubeflow dashboard accessible by setting its public URL to the LoadBalancer’s DNS name.
PUBLIC_URL="http://$(kubectl -n kubeflow get svc istio-ingressgateway-workload -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')"
echo "PUBLIC_URL: $PUBLIC_URL"

juju config dex-auth public-url="$PUBLIC_URL"
juju config oidc-gatekeeper public-url="$PUBLIC_URL"
  1. Configure Dex-auth credentials. Feel free to use a different (more secure!) password if you wish.
juju config dex-auth static-username=user@example.com
juju config dex-auth static-password=user
  1. Navigate to the PUBLIC_URL printed above to access the Kubeflow dashboard. You should first see the Dex login screen. After logging in with the credentials set above, you should see the Kubeflow “Welcome” page.
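
The waiting step in the list above can be approximated non-interactively. The sketch below is only a filter: it reads "application status" pairs on standard input and prints the applications that are not yet active. The helper name non_active_apps is hypothetical, and how the pairs are produced is left to the caller.

```shell
# Read "application status" pairs on stdin (one pair per line) and print
# every application whose status is not "active".
# An empty output means all listed applications are active.
# non_active_apps is a hypothetical helper name.
non_active_apps() {
    while read -r app status; do
        [ -n "$app" ] || continue
        [ "$status" = "active" ] || echo "$app"
    done
}
```

One way to feed it, assuming jq is installed and that the JSON status output exposes each application's state under application-status.current, is `juju status --format=json | jq -r '.applications | to_entries[] | "\(.key) \(.value["application-status"].current)"' | non_active_apps`. Before the public URL is configured, oidc-gatekeeper is expected to appear in the output.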

Note that the dashboard is reachable this way because, when a Kubernetes service of type LoadBalancer is created, AWS provisions a Classic Load Balancer (CLB) that load balances application traffic.
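
Because the load balancer is provisioned asynchronously, the hostname field of the service may still be empty shortly after deployment, which would leave PUBLIC_URL as a bare "http://". The following is a minimal sketch of a polling helper built around the same kubectl query used above; the function name wait_for_public_url, the retry count, and the delay are illustrative assumptions.

```shell
# Poll the ingress gateway service until a load balancer hostname is
# assigned, then print the resulting public URL.
# Arguments (both optional, both assumptions): attempts, delay in seconds.
wait_for_public_url() {
    attempts="${1:-30}"
    delay="${2:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Treat a failing kubectl call the same as an empty hostname.
        host="$(kubectl -n kubeflow get svc istio-ingressgateway-workload \
            -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null)" || host=""
        if [ -n "$host" ]; then
            echo "http://$host"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "no load balancer hostname after $attempts attempts" >&2
    return 1
}
```

It could be used as `PUBLIC_URL="$(wait_for_public_url)" && juju config dex-auth public-url="$PUBLIC_URL"`, so that the charm is only configured once a hostname actually exists.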

Known issues


Oidc-gatekeeper “Waiting for pod startup to complete”

If the juju status output shows the oidc-gatekeeper/0 unit in waiting state with the following message:

oidc-gatekeeper/0*         waiting      idle   10.1.121.241                 Waiting for pod startup to complete.

you can reconfigure the charm's public-url with the following commands:

PUBLIC_URL="http://$(kubectl -n kubeflow get svc istio-ingressgateway-workload -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')"
juju config oidc-gatekeeper public-url=""
juju config oidc-gatekeeper public-url="$PUBLIC_URL"
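
The check and the fix above can be combined so that the reset only runs when the unit is actually stuck. This is a minimal sketch, assuming the status message matches the one shown above; gatekeeper_stuck and reset_gatekeeper_url are hypothetical helper names.

```shell
# Succeed if oidc-gatekeeper reports the "Waiting for pod startup" message.
gatekeeper_stuck() {
    status="$(juju status oidc-gatekeeper 2>/dev/null)" || return 1
    case "$status" in
        *"Waiting for pod startup to complete"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Clear public-url and set it again to the URL given as the first argument.
reset_gatekeeper_url() {
    juju config oidc-gatekeeper public-url=""
    juju config oidc-gatekeeper public-url="$1"
}
```

A possible invocation is `gatekeeper_stuck && reset_gatekeeper_url "$PUBLIC_URL"`.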

Clean up resources

To clean up the EKS cluster, refer to the guide mentioned here. To clean up Juju, run the following:

juju unregister kubeflow-controller
juju remove-cloud kubeflow --client

Hi, two small suggestions to the guide:

Install juju. Normally, you would install juju using snap. For the use with EKS though, install Juju from binary using the latest 3.4.x package for your machine. This is because Juju 3.4 cannot add public clouds when installed from snap (juju bug) due to strict confinement of the snap.

Instead of this, can we just suggest using /snap/juju/current/bin/juju, as we do in the GKE guide? I have checked and it works for add-k8s in this case as well. I believe it is easier for the user.

juju config dex-auth public-url=$PUBLIC_URL
juju config oidc-gatekeeper public-url=$PUBLIC_URL

The oidc-gatekeeper part is not needed in recent versions due to this fix. The user will see:
ERROR parsing settings for application: unknown option "public-url".

Maybe we can comment that it might not be needed.

  1. Updated, thank you for the tip. This workaround is indeed cleaner.
  2. What you describe is accurate, but only for Kubeflow 1.9. This feature isn’t present in 1.8, which is the version this guide currently targets. We'd need to update both this guide and the AKS one. I'll raise an issue about this, thanks for catching it.