Welcome to the Create an AKS Cluster guide. This how-to guide will take you through the steps of creating an Azure Kubernetes Service (AKS) cluster with an appropriate configuration for deploying an MLOps platform such as Charmed Kubeflow.
Requirements:
- Local machine with Ubuntu 22.04 or later
- An Azure account (How to create an Azure account)
Steps
Install and set up Azure CLI
First, install Azure CLI on your local machine and then sign in. You can use any of the authentication options available for Azure CLI. For example, the easiest way to sign in on your local machine is the interactive one, while a service principal is better suited to a CI workflow.
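The two sign-in modes mentioned above could be sketched as follows. This is only an illustration: the function name and the AZURE_CLIENT_ID, AZURE_CLIENT_SECRET and AZURE_TENANT_ID variable names are assumptions, not something this guide or Azure CLI prescribes.

```shell
# Sign in to Azure CLI.
# Assumption: the AZURE_* variable names are illustrative; use whatever your CI provides.
azure_sign_in() {
  if [ -n "${AZURE_CLIENT_ID:-}" ] && [ -n "${AZURE_CLIENT_SECRET:-}" ] && [ -n "${AZURE_TENANT_ID:-}" ]; then
    # Non-interactive sign-in with a service principal, suited to CI workflows.
    az login --service-principal \
      --username "$AZURE_CLIENT_ID" \
      --password "$AZURE_CLIENT_SECRET" \
      --tenant "$AZURE_TENANT_ID"
  else
    # Interactive sign-in, suited to a local machine.
    az login
  fi
}
```

Calling azure_sign_in with none of the three variables set falls back to the interactive login.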
In all cases, make sure that the identity you're authenticating with has been granted at least the minimum permissions required for AKS. In addition, you will need access to manage resource groups, so you will also need to add the Managed Application Contributor Role.
All of those roles can be assigned in Azure’s portal via Subscriptions > Subscription name > Access control (IAM) > Add > Add role assignment.
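If you prefer the command line to the portal, a role assignment can also be created with az role assignment create. The sketch below assumes the role is assigned at subscription scope, mirroring the portal path; the function name and argument order are hypothetical.

```shell
# Assign a role to an identity at subscription scope.
# Arguments: <assignee> <subscription-id> <role-name>
# Assumption: subscription scope is used here; a narrower scope may suit you better.
assign_role() {
  az role assignment create \
    --assignee "$1" \
    --role "$3" \
    --scope "/subscriptions/$2"
}
```

For example: assign_role "<assignee>" "<subscription-id>" "Managed Application Contributor Role".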
Install kubectl
Install kubectl; see here for instructions. You should have no problem following the guide with any version of kubectl, but note that we are using version 1.29.x, since the latest Kubernetes version supported by Kubeflow is 1.29.
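To check which kubectl client you have installed against the 1.29 series used in this guide, something like the following could be used. The helper names are hypothetical, and the parsing assumes the JSON output of kubectl version --client contains a gitVersion field.

```shell
# Print the kubectl client version, e.g. "v1.29.4".
kubectl_client_version() {
  kubectl version --client -o json |
    sed -n 's/.*"gitVersion": *"\([^"]*\)".*/\1/p' |
    head -n 1
}

# Reduce a version such as "v1.29.4" to "1.29".
minor_of() {
  printf '%s\n' "$1" | sed -E 's/^v?([0-9]+)\.([0-9]+).*/\1.\2/'
}

# Succeed when the client minor version equals the expected one (default: 1.29).
kubectl_matches() {
  [ "$(minor_of "$(kubectl_client_version)")" = "${1:-1.29}" ]
}
```

For example: kubectl_matches || echo "kubectl is not a 1.29.x client".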
Deploy AKS cluster
Do not forget that this deployment will incur charges for every hour the cluster is running.
First, create a resource group, under which you will deploy the AKS cluster. This will be really helpful later when you may need to clean up resources.
az group create --name myResourceGroup --location westeurope
Regarding location, choose whichever one best suits your needs. You can list all available locations using az account list-locations -o table.
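Before creating the resource group, you may want to confirm that the location name you picked is valid. A minimal sketch, with a hypothetical helper name:

```shell
# Succeed if the given name is among the locations returned for the subscription.
location_exists() {
  az account list-locations --query "[].name" -o tsv | grep -qx "$1"
}
```

For example: location_exists westeurope && az group create --name myResourceGroup --location westeurope.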
Now spin up the cluster using the az aks create command. Before proceeding, though, make sure to modify any parameters as needed. The configuration below was created with the minimum requirements for deploying Charmed Kubeflow/MLflow in mind. The full list of available parameters can be found here.
az aks create \
--resource-group myResourceGroup \
--name myAKSCluster \
--kubernetes-version 1.29 \
--node-count 2 \
--node-vm-size Standard_D8s_v3 \
--node-osdisk-size 100 \
--node-osdisk-type Managed \
--os-sku Ubuntu \
--ssh-key-value <path-to-public-key>
- kubernetes-version: As configured, this cluster will use Kubernetes version 1.29. Make sure to edit this according to your needs. For Charmed Kubeflow, see the supported versions and use the latest supported version available for the bundle you're deploying.
- node-count: This cluster will have exactly 2 worker nodes, given that the cluster autoscaler is disabled by default. You can instead enable the cluster autoscaler and define max-count and min-count.
- node-vm-size: The cluster will be deployed with Azure VM instances of size Standard_D8s_v3 for worker nodes. This size should be sufficient for an MLOps platform (see CKF documentation). For more details and available sizes, see here and here.
- node-osdisk-size: For the same reasons stated above, each node needs a 100 GB volume attached to it.
- node-osdisk-type: Node disks of type Managed are used, since Ephemeral ones are better suited to applications that tolerate individual VM failures and are therefore not suitable for CKF.
- ssh-key-value: Public key path or key contents to install on node VMs for SSH access. This is used to access individual nodes, mostly for debugging. Its default value is ~/.ssh/id_rsa.pub, so if this is where your public key resides, this option can be skipped entirely.
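As a sketch of the autoscaler alternative mentioned for node-count, the same command could be wrapped as below. The function name, the 2 to 4 node range and the argument order are assumptions for illustration only; adjust them to your workload.

```shell
# Same cluster as above, but with the cluster autoscaler enabled.
# Arguments (all optional): <min-count> <max-count> <path-to-public-key>
# Assumption: the default range of 2 to 4 nodes is illustrative only.
create_autoscaled_cluster() {
  az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --kubernetes-version 1.29 \
    --node-count 2 \
    --enable-cluster-autoscaler \
    --min-count "${1:-2}" \
    --max-count "${2:-4}" \
    --node-vm-size Standard_D8s_v3 \
    --node-osdisk-size 100 \
    --node-osdisk-type Managed \
    --os-sku Ubuntu \
    --ssh-key-value "${3:-$HOME/.ssh/id_rsa.pub}"
}
```

With the autoscaler enabled, node-count is the initial number of nodes.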
This will take some time. The command will create a Kubernetes cluster of version 1.29 with two worker nodes, where each node corresponds to a VM of size Standard_D8s_v3 with a 100 GB disk. It will also create the required Azure resources, which include a Virtual network, a Network security group, a Route table, a Load balancer and a Public IP address.
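To confirm that provisioning has finished and to look at the resources that were created alongside the cluster, a sketch along these lines may help. The helper names are hypothetical, and it assumes the supporting resources live in the cluster's node resource group as reported by az aks show.

```shell
# Print the cluster's provisioning state, e.g. "Succeeded".
cluster_state() {
  az aks show \
    --resource-group "${1:-myResourceGroup}" \
    --name "${2:-myAKSCluster}" \
    --query provisioningState -o tsv
}

# List the resources in the cluster's node resource group.
# Assumption: the network resources mentioned above are placed in that group.
list_node_resources() {
  node_rg=$(az aks show \
    --resource-group "${1:-myResourceGroup}" \
    --name "${2:-myAKSCluster}" \
    --query nodeResourceGroup -o tsv)
  az resource list --resource-group "$node_rg" -o table
}
```

Both helpers default to the resource group and cluster names used in this guide.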
Verify kubectl access to cluster
Using kubectl config get-clusters, check whether the AKS cluster has been added to your kubeconfig. If you don’t see it there, use the following command to add it.
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster --admin
You may need to remove --admin from the above command, depending on the type of kubeconfig that you have access to.
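The check-then-fetch logic described above could be scripted roughly as follows. The function name is hypothetical, and the sketch assumes the kubeconfig cluster entry carries the same name as the AKS cluster.

```shell
# Fetch credentials only when the cluster is missing from the kubeconfig.
# Arguments (optional): <resource-group> <cluster-name>
# Assumption: the kubeconfig cluster entry is named after the AKS cluster.
ensure_kubeconfig() {
  if kubectl config get-clusters | grep -qx "${2:-myAKSCluster}"; then
    echo "already present"
  else
    # Try admin credentials first, then fall back to user credentials.
    az aks get-credentials --resource-group "${1:-myResourceGroup}" --name "${2:-myAKSCluster}" --admin ||
      az aks get-credentials --resource-group "${1:-myResourceGroup}" --name "${2:-myAKSCluster}"
  fi
}
```

The fallback covers the case where your identity is not allowed to fetch admin credentials.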
Now check your access to the cluster by running the command below, which should return a list of two nodes.
kubectl get nodes
NAME                                STATUS   ROLES   AGE     VERSION
aks-nodepool1-18441560-vmss000000   Ready    agent   1m20s   v1.29.4
aks-nodepool1-40664177-vmss000001   Ready    agent   1m20s   v1.29.4
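If you want to check this from a script rather than by eye, the number of Ready nodes can be counted. A minimal sketch with a hypothetical helper name:

```shell
# Count the nodes whose STATUS column is exactly "Ready".
ready_node_count() {
  kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}
```

For the cluster created in this guide, the count should be 2 once both nodes have joined.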
Clean up resources
If you no longer need the created AKS cluster, refer here for deletion instructions. Normally, all you have to do is run:
az aks delete --resource-group myResourceGroup --name myAKSCluster --yes
az group delete --name myResourceGroup --yes
You can always check whether a resource group exists using:
az group exists --name <resource-group-name>
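The existence check and the deletion can be combined so that cleanup is safe to run more than once. A sketch with a hypothetical helper name, relying on az group exists printing true or false:

```shell
# Delete a resource group only if it exists.
# Argument: <resource-group-name>
delete_group_if_exists() {
  if [ "$(az group exists --name "$1")" = "true" ]; then
    az group delete --name "$1" --yes
  else
    echo "resource group $1 not found; nothing to delete"
  fi
}
```

For example: delete_group_if_exists myResourceGroup.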