Install on EKS

This guide describes how to install Charmed Kubeflow (CKF) on AWS Elastic Kubernetes Service (EKS).

You will spin up an EKS cluster on AWS cloud using the Amazon EKS Command Line Interface (CLI), eksctl, on your local machine. Then, you will interact with the cluster and deploy CKF using kubectl and Juju.

Requirements

If you use IAM credentials for eksctl authentication, make sure they meet these minimum IAM policies.
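Before creating the cluster, it can help to confirm that the command-line tools this guide relies on are available on your PATH. The following is a minimal sketch; the helper name missing_tools is hypothetical and not part of any of the tools mentioned here.

```shell
# Print the name of every tool passed as an argument that is not on PATH.
# Prints nothing when all of them are found.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}
```

For example, running `missing_tools eksctl kubectl` prints nothing when both tools are installed, and prints the name of each missing tool otherwise.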

Deploy EKS cluster

First, clone the following repository containing the YAML file used to create the EKS cluster:

git clone https://github.com/canonical/kubeflow-examples.git
cd kubeflow-examples/eks-cluster-setup

Configure the deployment through the YAML file, cluster.yaml. Its default configuration meets the minimum requirements for deploying CKF:

  • region: the cluster is deployed by default to the eu-central-1 region. Edit metadata.region and availabilityZones according to your needs.
  • ssh key: edit managedNodeGroups[0].ssh.publicKeyName with your key pair name to enable SSH access into the new EC2 instances.
  • instance type: the cluster is deployed with EC2 instances of type t2.2xlarge for worker nodes, according to the managedNodeGroups[0].instanceType field. See Instance types for more information.
  • k8s version: the cluster uses Kubernetes (K8s) 1.24 by default. See Supported versions for more details about compatibility between CKF, K8s and Juju.
  • worker nodes: the cluster has two worker nodes. Edit maxSize and minSize under managedNodeGroups[0] according to your needs.
  • volume size: each worker node has a gp2/gp3 disk of 100 GB. Edit managedNodeGroups[0].volumeSize for a different configuration.
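To review the settings listed above before deploying, you could print the matching lines of the configuration file. This is a minimal sketch: the helper name show_cluster_settings is hypothetical, and the version key is an assumption based on the eksctl configuration schema rather than on this guide.

```shell
# Print the lines of an eksctl config file that set the fields discussed above,
# each prefixed with its line number. Defaults to cluster.yaml in the
# current directory.
show_cluster_settings() {
  config_file="${1:-cluster.yaml}"
  grep -nE '^[[:space:]-]*(region|availabilityZones|publicKeyName|instanceType|version|minSize|maxSize|volumeSize):' "$config_file"
}
```

Run `show_cluster_settings` from the eks-cluster-setup directory, then edit the reported lines in cluster.yaml as needed.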

You can now deploy the cluster as follows:

eksctl create cluster -f cluster.yaml

Deployment can take up to 20 minutes.

Note that the deployment incurs charges for every hour the cluster is running.

Verify access to the cluster

Check that you can access the cluster as follows:

kubectl get nodes

See Creating a kubeconfig file if the command above does not return the expected node output.
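Beyond listing the nodes, you may want to confirm that every worker node is in the Ready state. The sketch below assumes the default column layout of kubectl get nodes (name first, status second); the helper name not_ready_nodes is hypothetical.

```shell
# Read `kubectl get nodes --no-headers` output on stdin and print the name of
# every node whose STATUS column does not start with "Ready".
# A status such as "Ready,SchedulingDisabled" is treated as ready here.
not_ready_nodes() {
  awk 'NF && $2 !~ /^Ready/ { print $1 }'
}
```

Use it as `kubectl get nodes --no-headers | not_ready_nodes`. Empty output means all nodes report Ready.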

Set up Juju

  1. Install Juju:
sudo snap install juju --channel=3.4/stable
  2. Add your EKS cluster as a cloud to Juju:
/snap/juju/current/bin/juju add-k8s eks --client
  3. Bootstrap a Juju controller:
/snap/juju/current/bin/juju bootstrap eks eks-controller

The full path /snap/juju/current/bin/juju is used here as a workaround for a Juju bug: when installed as a strictly confined snap, Juju 3.4 cannot add public clouds.

See Get started with Juju for more details.
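To avoid retyping the full path in every command, you could wrap it in a small shell function. This is only a convenience sketch; the names JUJU_BIN and juju_eks are hypothetical and not part of Juju.

```shell
# Path used by this guide as a workaround; override JUJU_BIN if Juju is
# installed elsewhere on your machine.
JUJU_BIN="${JUJU_BIN:-/snap/juju/current/bin/juju}"

# Run Juju through the configured path, passing all arguments unchanged.
juju_eks() {
  "$JUJU_BIN" "$@"
}
```

With this in place, `juju_eks add-k8s eks --client` and `juju_eks bootstrap eks eks-controller` are equivalent to the commands above, and `juju_eks controllers` lists the controllers known to the client so you can confirm that eks-controller was created.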

Deploy CKF

To deploy CKF and access its dashboard, follow the steps provided in the general installation guide from creating the kubeflow model section.

Clean up resources

See Delete a cluster for information about removing the EKS cluster and related resources.

The procedure above does not always delete the volumes that have been created during the cluster deployment. In that case, you can delete them manually.
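One way to find leftover volumes is to list the EBS volumes that are no longer attached to any instance. The sketch below assumes the AWS CLI is installed and configured, which this guide does not otherwise require; the helper name list_unattached_volumes is hypothetical.

```shell
# List EBS volumes in the given region that are in the "available" state,
# meaning they are not attached to any instance.
# Assumption: the AWS CLI is installed and has credentials for the account.
list_unattached_volumes() {
  region="${1:?usage: list_unattached_volumes <region>}"
  aws ec2 describe-volumes \
    --region "$region" \
    --filters Name=status,Values=available \
    --query 'Volumes[].[VolumeId,Size,CreateTime]' \
    --output text
}
```

For example, `list_unattached_volumes eu-central-1` prints one line per unattached volume. The list may include volumes unrelated to this cluster, so review it before removing anything with `aws ec2 delete-volume --volume-id <volume-id>`.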

To clean up Juju resources, run the following commands:

juju unregister eks-controller
juju remove-cloud eks --client

Hi, two small suggestions for the guide:

Install juju. Normally, you would install juju using snap. For the use with EKS though, install Juju from binary using the latest 3.4.x package for your machine. This is because Juju 3.4 cannot add public clouds when installed from snap (juju bug) due to strict confinement of the snap.

Instead of this, can we just suggest using /snap/juju/current/bin/juju, as we do in the GKE guide? I have checked, and it works for add-k8s in this case as well. I believe it is easier for the user.

juju config dex-auth public-url=$PUBLIC_URL
juju config oidc-gatekeeper public-url=$PUBLIC_URL

The oidc-gatekeeper part is not needed in recent versions because of this fix. The user will see:
ERROR parsing settings for application: unknown option "public-url".

Maybe we can add a note that it might not be needed.

  1. Updated, thank you for the tip. This workaround is indeed cleaner.
  2. What you describe is accurate, but only for Kubeflow 1.9. This feature isn't present in 1.8, which the guide currently targets. We'd need to update both this guide and the AKS one. I'll raise an issue about this, thanks for catching it.