Create an EKS cluster for use with an MLOps platform

Welcome to the Create an EKS Cluster guide. This how-to guide takes you through the steps of creating an Amazon Elastic Kubernetes Service (EKS) cluster with a configuration appropriate for deploying an MLOps platform such as Kubeflow.

Requirements:

  • An AWS account with credentials that allow creating EKS clusters and related resources
  • AWS CLI
  • eksctl
  • kubectl
  • (Optional) An EC2 key pair in the target region, for SSH access to worker nodes

Steps

Install and set up AWS CLI

First, install the AWS CLI on your local machine and then set it up. You can use any of the authentication methods available for the AWS CLI.

For example, you can use IAM user credentials. If you decide to follow this path, the IAM user created should have the minimum IAM policies required for eksctl to work. For more details on this, see the eksctl documentation.
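
As a minimal sketch, after creating an access key for the IAM user, you can configure the AWS CLI interactively. The values shown in the comments are placeholders:

aws configure
# AWS Access Key ID [None]: <your-access-key-id>
# AWS Secret Access Key [None]: <your-secret-access-key>
# Default region name [None]: eu-central-1
# Default output format [None]: json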

SSH key pair

This is not a hard requirement, but you will usually want to SSH into the instances created by this tutorial for debugging and other tasks. For this, you will need a key pair in the region where the instances will be created. Here is the official documentation for this. You can either create a new key pair or import one you already have.
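
As a sketch, assuming the eu-central-1 region used later in this guide and a hypothetical key name, you could create or import a key pair with the AWS CLI:

# Create a new key pair and save the private key locally
aws ec2 create-key-pair --region eu-central-1 --key-name eks-tutorial-key \
  --query 'KeyMaterial' --output text > eks-tutorial-key.pem
chmod 400 eks-tutorial-key.pem

# Or import an existing public key instead
aws ec2 import-key-pair --region eu-central-1 --key-name eks-tutorial-key \
  --public-key-material fileb://~/.ssh/id_rsa.pub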

Install eksctl

Install eksctl; see here for instructions.
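
One common installation method on Linux, based on the eksctl project's release tarballs (assuming an amd64 machine), looks like this:

curl --silent --location "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
eksctl version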

Install kubectl

Install kubectl; see here for instructions. You should be able to follow this guide with any version of kubectl, but note that we are using version 1.24.x, since the latest Kubernetes version supported by Kubeflow is 1.24.
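
For example, on Linux you can download a 1.24-series binary directly; the patch version below is illustrative:

curl -LO "https://dl.k8s.io/release/v1.24.17/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
kubectl version --client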

Deploy EKS cluster

To keep the process of creating an EKS cluster as simple as possible, we will use a YAML configuration file.

Do not forget that this deployment will incur charges for every hour the cluster is running.

Clone the following repository containing the EKS yaml file.

git clone https://github.com/canonical/kubeflow-examples.git
cd kubeflow-examples/eks-cluster-setup

Before proceeding with the deployment, you may need to configure some of the fields in the cluster.yaml file in the above directory. This file was created with the minimum requirements for deploying Charmed Kubeflow/MLflow in mind. The fields you are most likely to edit are listed below; a sketch of the file follows the list.

  • region: The cluster will be deployed by default to the eu-central-1 region. Feel free to edit metadata.region and availabilityZones according to your needs.
  • ssh key: As mentioned above, edit the managedNodeGroups[0].ssh.publicKeyName field with your key pair name in order to be able to SSH into the new EC2 instances.
  • instance type: The cluster will be deployed with EC2 instances of type t2.2xlarge for worker nodes, according to the managedNodeGroups[0].instanceType field. This type should be sufficient for an MLOps platform, but it has been observed that in the case of the MLflow and Kubeflow integration, t3.2xlarge is required due to its higher network capabilities. See here for more details on instance types.
  • k8s version: This cluster will use Kubernetes version 1.24 by default. Make sure to edit this according to your needs. For Charmed Kubeflow, see the supported versions and use the latest supported version according to the bundle you're deploying.
  • worker nodes: This cluster will have 2 worker nodes. Feel free to edit the maxSize and minSize under managedNodeGroups[0] according to your needs.
  • volume size: Each worker node will have a gp2/gp3 disk of 100GB. Feel free to edit managedNodeGroups[0].volumeSize.
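
For orientation, a minimal eksctl ClusterConfig touching the fields above might look like the following. The values are illustrative, and the actual cluster.yaml in the repository may differ:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: kubeflow-eks         # illustrative cluster name
  region: eu-central-1       # metadata.region
  version: "1.24"            # k8s version

managedNodeGroups:
  - name: workers
    instanceType: t3.2xlarge   # instance type
    minSize: 2                 # worker nodes
    maxSize: 2
    volumeSize: 100            # volume size, in GB
    ssh:
      publicKeyName: eks-tutorial-key   # ssh key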

Now deploy the cluster with the following command:

eksctl create cluster -f cluster.yaml

This will take some time (approximately 20 minutes). The command will create a Kubernetes cluster of version 1.24 with two worker nodes, where each node has 100GB of disk space. It will also create the Amazon EBS CSI driver IAM role, which EC2 instances will use to manage EBS volumes, and will add the Amazon EBS CSI add-on to the cluster. Lastly, it will create a storage class in your cluster, which is needed for deploying an MLOps platform.
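
Once the command completes, you can sanity-check the add-on and storage class; the cluster name and region here are the illustrative values from the sketch above:

eksctl get addon --cluster kubeflow-eks --region eu-central-1
kubectl get storageclass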

Verify kubectl access to cluster

You can check your access to the cluster by running the command below, which should return a list of two nodes.

kubectl get nodes

Troubleshoot kubectl

If the eksctl create cluster command completed without errors but kubectl does not return the expected nodes, there is a chance that your kubeconfig file is not up to date; see here for instructions on updating it. Normally you shouldn't have to, since eksctl takes care of this.
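
A quick way to refresh the kubeconfig entry is with the AWS CLI, again assuming the illustrative cluster name and region used above:

aws eks update-kubeconfig --name kubeflow-eks --region eu-central-1
kubectl config current-context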


Clean up resources

If you no longer need the created EKS cluster, refer here for deletion instructions. Keep in mind that this procedure does not always delete the volumes created during the cluster deployment, so if any are present and do not contain data you want to keep, delete them manually.
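
As a sketch, deleting the cluster with eksctl and checking for leftover volumes could look like this (the volume ID in the last command is a placeholder):

eksctl delete cluster -f cluster.yaml

# List detached volumes left behind in the region
aws ec2 describe-volumes --region eu-central-1 \
  --filters Name=status,Values=available \
  --query 'Volumes[].VolumeId'

# Delete a specific leftover volume
aws ec2 delete-volume --region eu-central-1 --volume-id vol-0123456789abcdef0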