Create an AKS cluster for use with an MLOps platform

Welcome to the Create an AKS Cluster guide. This how-to guide will take you through the steps of creating an Azure Kubernetes Service (AKS) cluster with an appropriate configuration for deploying an MLOps platform such as Charmed Kubeflow.

Requirements:

  • An Azure account with a subscription where you have permissions to create resources.

Steps

Install and set up Azure CLI

First, install Azure CLI on your local machine and then sign in. You can use any of the authentication options available for Azure CLI. For example, interactive sign-in is the easiest option on a local machine, while a service principal is better suited for a CI workflow.
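For reference, the sign-in for each option could look like this (the service principal values are placeholders you would replace with your own):

# Interactive sign-in, opens a browser window
az login

# Sign-in with a service principal, better suited for CI (placeholder values)
az login --service-principal --username <app-id> --password <client-secret> --tenant <tenant-id>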

In all cases, make sure that the authentication entity you're using has been granted at least the minimum permissions required for AKS. Apart from those, you will also need access to manage Resource groups, so you will need to add the Managed Application Contributor Role.

All of those roles can be assigned in Azure’s portal via Subscriptions > Subscription name > Access control (IAM) > Add > Add role assignment.
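If you prefer the CLI over the portal, the same assignment can also be made with az role assignment create; the assignee and subscription ID below are placeholders:

az role assignment create \
  --assignee <user-or-service-principal-id> \
  --role "Managed Application Contributor Role" \
  --scope /subscriptions/<subscription-id>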

Install kubectl

Install kubectl, see here for instructions. You should have no problem following this guide with any version of kubectl, but note that we are using version 1.26.x, since the latest Kubernetes version supported by Kubeflow is 1.26.
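As one example, on Ubuntu a matching kubectl can be installed via snap; this is just one option and assumes the 1.26 channel is available:

sudo snap install kubectl --classic --channel=1.26/stable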

Deploy AKS cluster

Do not forget that this deployment will incur charges for every hour the cluster is running.

First, create a resource group under which you will deploy the AKS cluster. Grouping resources this way makes it much easier to clean them up later.

az group create --name myResourceGroup --location westeurope

Regarding location, choose whichever one best suits your needs. You can list all available locations using az account list-locations -o table.

Now spin up the cluster using the az aks create command. Before proceeding, make sure to modify any parameters as needed. The configuration below was created with the minimum requirements for deploying Charmed Kubeflow/MLflow in mind. The full list of available parameters can be found here.

az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --kubernetes-version 1.26 \
  --node-count 2 \
  --node-vm-size Standard_D8s_v3 \
  --node-osdisk-size 100 \
  --node-osdisk-type Managed \
  --os-sku Ubuntu \
  --ssh-key-value <path-to-public-key>

  • kubernetes-version: This cluster will use Kubernetes version 1.26. Make sure to edit this according to your needs. For Charmed Kubeflow, see the supported versions and use the latest supported version available according to the bundle you're deploying.
  • node-count: This cluster will have exactly 2 worker nodes, since the cluster autoscaler option is disabled by default. You can instead enable the cluster autoscaler and define max-count and min-count, as shown in the example after this list.
  • node-vm-size: The cluster will be deployed with Azure VM instances of size Standard_D8s_v3 for the worker nodes. This size should be sufficient for an MLOps platform (see CKF documentation). For more details and available sizes, see here and here.
  • node-osdisk-size: For the same reasons stated above, each node needs a 100 GB volume attached to it.
  • node-osdisk-type: Node disks of type Managed are used, since Ephemeral disks are only appropriate for applications that are tolerant of individual VM failures, which is not the case for CKF.
  • ssh-key-value: Public key path or key contents to install on the node VMs for SSH access. This is used to access individual nodes, mostly for debugging. Its default value is ~/.ssh/id_rsa.pub, so if this is where your public key resides, this option can be skipped entirely.
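
If you would rather let AKS size the node pool for you, the fixed node count can be replaced with the cluster autoscaler flags. A minimal sketch, assuming a range of 2 to 4 nodes (adapt the counts and the other parameters to your needs):

az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --kubernetes-version 1.26 \
  --node-vm-size Standard_D8s_v3 \
  --node-osdisk-size 100 \
  --node-osdisk-type Managed \
  --os-sku Ubuntu \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 4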

This will take some time. The command will create a Kubernetes cluster of version 1.26 with two worker nodes, where each node corresponds to a VM of size Standard_D8s_v3 with a 100 GB disk. It will also create the required Azure resources, which include a Virtual network, a Network security group, a Route table, a Load balancer and a Public IP address.
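If you want to inspect those resources, one way (assuming the default node resource group naming) is to look up the node resource group that AKS created and list its contents:

# Find the auto-created node resource group
az aks show --resource-group myResourceGroup --name myAKSCluster --query nodeResourceGroup -o tsv

# List the Azure resources created in it
az resource list --resource-group <node-resource-group> -o table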

Verify kubectl access to cluster

Using kubectl config get-clusters, check if the AKS cluster has been added to your kubeconfig. If you don’t see it there, use the following command to add it.

az aks get-credentials --resource-group myResourceGroup --name myAKSCluster --admin

You may need to remove --admin from the above command, depending on the type of kubeconfig that you have access to.
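For example, to fetch user (non-admin) credentials instead, and then switch to the merged context (the context name shown is the AKS default and may differ in your setup):

# User credentials instead of cluster-admin credentials
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster

# Switch to the newly added context if it is not already active
kubectl config use-context myAKSCluster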

Now check your access to the cluster by running the command below, which should return a list of two nodes.

kubectl get nodes
NAME                                STATUS   ROLES   AGE     VERSION
aks-nodepool1-40664177-vmss000000   Ready    agent   8m31s   v1.26.10
aks-nodepool1-40664177-vmss000001   Ready    agent   8m31s   v1.26.10

Clean up resources

If you no longer need the created AKS cluster, refer here for deletion instructions. Normally, all you have to do is run:

az aks delete --resource-group myResourceGroup --name myAKSCluster --yes
az group delete --name myResourceGroup --yes

You can always check if a resource group exists using

az group exists --name <resource-group-name>
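As a small convenience, the check can be combined with the deletion in a shell snippet; a sketch, assuming a POSIX-compatible shell:

# Delete the resource group only if it still exists
if [ "$(az group exists --name myResourceGroup)" = "true" ]; then
  az group delete --name myResourceGroup --yes
fi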