Get Started with Charmed MLflow v2

Welcome to the tutorial on Charmed MLflow v2! MLflow is an open-source platform for managing machine learning workflows. It has four primary functions: experiment tracking, model registry, model management and code reproducibility.

So wait, what does “Charmed MLflow” mean? Is it the same thing as MLflow? Yes and no. MLflow is a complex application, consisting of many components running together and communicating with each other. Charmed MLflow is a charm bundle that allows us to deploy MLflow quickly and easily. Don’t worry too much about what a “charm bundle” is right now. The key thing is that it makes deploying MLflow very convenient: we’ll get MLflow up and running with just a few commands on the command line!

In this tutorial, we’re going to explore Charmed MLflow in a practical way. Using the Juju CLI, we’ll deploy MLflow to a local MicroK8s cloud.

Prerequisites:

We are assuming that you are running this tutorial on a local machine with the following specs:

  • Runs Ubuntu 22.04 or later
  • Has at least 50GB free disk space

Contents:

  1. Install and prepare MicroK8s
  2. Install Juju
  3. Deploy MLflow bundle
  4. Access MLflow
  5. Object storage credentials

Install and prepare MicroK8s

Let’s install MicroK8s. MicroK8s is installed from a snap package. The published snap maintains different channels for different releases of Kubernetes.
sudo snap install microk8s --classic --channel=1.24/stable

MicroK8s creates a group called microk8s so that you don’t have to use sudo for every command. To make it more convenient to run commands, add the current user to this group:

sudo usermod -a -G microk8s $USER
newgrp microk8s

It is also useful to make sure the user has the proper access and ownership of any kubectl configuration files:

sudo chown -f -R $USER ~/.kube
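
If you want to double-check that the group change took effect in your current shell, you can list your groups (just a sanity check; if microk8s is missing from the output, log out and back in):

groups | grep microk8s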

Enable the following MicroK8s add-ons to configure your Kubernetes cluster with the extra services needed to run Charmed MLflow:

microk8s enable dns hostpath-storage ingress metallb:10.64.140.43-10.64.140.49

Here, we have added: a DNS service, so the applications can find each other; storage; an ingress controller, so we can access the MLflow components; and the MetalLB load balancer. You can see that we passed some extra detail when enabling MetalLB, in this case the address pool for it to use.

See more: MicroK8s | How to use addons
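
If you’d like to confirm the add-ons came up before moving on, you can list the pods across all namespaces; the MetalLB pods, for example, normally appear in a metallb-system namespace (the exact namespace and pod names can vary between MicroK8s releases):

microk8s kubectl get pods -A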

We’ve now installed and configured MicroK8s. It will start running automatically, but can take 5 minutes or so before it’s ready for action.
Run the following command to tell MicroK8s to report its status to us when it’s ready:

microk8s status --wait-ready

Be patient - this command may not return straight away. The --wait-ready flag tells MicroK8s to wait for the Kubernetes services to initialise before returning. Once MicroK8s is ready, you will see something like the following output:

microk8s is running

Below this there will be a bunch of other information about the cluster.

Great, we have now installed and configured MicroK8s, and it’s running and ready!

Install Juju

Juju is an operation lifecycle manager (OLM) for clouds, bare metal or Kubernetes. We will be using it to deploy and manage the components which make up MLflow.

To install Juju from snap, run this command:

sudo snap install juju --classic --channel=2.9/stable
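
To confirm which Juju client the snap installed, you can check the version; any 2.9.x release is what this tutorial expects:

juju version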

Now, run the following command to deploy a Juju controller to the Kubernetes cluster we set up with MicroK8s:

juju bootstrap microk8s

Sit tight while the command completes! The controller may take a minute or two to deploy.

The controller is Juju’s agent, running on Kubernetes, which can be used to deploy and control the components of MLflow.
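
If you’re curious, you can list the controllers known to your Juju client and see the one just bootstrapped (the controller name in the output depends on your setup; for a MicroK8s bootstrap it is typically something like microk8s-localhost):

juju controllers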

Next, we’ll need to add a model to the controller. Run the following command to add a model called kubeflow:

juju add-model kubeflow

The model name here can be anything. We’re just using kubeflow because often you may want to deploy MLflow along with Kubeflow, and in that case, the model name must be kubeflow. So it’s not a bad habit to have.

The controller can work with different models, which map 1:1 to namespaces in Kubernetes. If you later deploy Kubeflow alongside MLflow, the model name must be kubeflow, due to an assumption made in the upstream Kubeflow Dashboard code.
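
Since models map to namespaces, you can see both sides of that mapping once the model exists; the kubeflow namespace should appear in Kubernetes shortly after the add-model command above:

juju models
microk8s kubectl get namespace kubeflow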

Great job: Juju has now been installed and configured for Kubeflow!

Deploy MLflow bundle

Before deploying, run these commands:

sudo sysctl fs.inotify.max_user_instances=1280
sudo sysctl fs.inotify.max_user_watches=655360

We need to run the above because, under the hood, MicroK8s uses inotify to interact with the filesystem, and in large MicroK8s deployments the default inotify limits are sometimes exceeded.
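
These sysctl changes only last until the next reboot. If you want them to persist, one common approach (not something the bundle itself requires) is to write them to a sysctl.d file and reload; the file name here is just an example:

echo 'fs.inotify.max_user_instances=1280' | sudo tee /etc/sysctl.d/90-microk8s-inotify.conf
echo 'fs.inotify.max_user_watches=655360' | sudo tee -a /etc/sysctl.d/90-microk8s-inotify.conf
sudo sysctl --system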

Let’s now use Juju to deploy Charmed MLflow. Run the following command:

juju deploy mlflow --channel=2.1/edge --trust

This deploys the latest edge version of MLflow with MinIO as object storage and MySQL as metadata store.
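
The bundle can take a while to settle. A convenient way to follow progress is to watch the Juju status until every application reports an active status (the refresh interval is just a suggestion):

juju status --watch 5s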

Access MLflow

To access MLflow, visit the following URL in your web browser:

http://localhost:31380/

This will take you to the MLflow UI.

Note: by default, Charmed MLflow creates a NodePort on port 31380, where you can access the MLflow UI.
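
If the UI does not come up on that port, you can check which services and NodePorts were actually created in the kubeflow namespace; look for the mlflow-related entries in the output (service names can differ between bundle versions):

microk8s kubectl get svc -n kubeflow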

That’s it! Charmed MLflow has been deployed locally with MicroK8s and Juju. You can now start using MLflow.

Reference: Object storage credentials

To use MLflow you need credentials for the object storage. The bundle deployed above comes with MinIO. To get the MinIO credentials, run the following command:

juju run-action mlflow-server/0 get-minio-credentials --wait

This action will output the access-key and secret-access-key.
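
If you want to log artifacts from outside the cluster, these credentials are typically passed to the MLflow client through environment variables. The values below are placeholders based on this local setup; substitute the endpoint addresses and keys that apply to your deployment:

export MLFLOW_TRACKING_URI=http://localhost:31380
export MLFLOW_S3_ENDPOINT_URL=http://<minio-address>:9000
export AWS_ACCESS_KEY_ID=<access-key>
export AWS_SECRET_ACCESS_KEY=<secret-access-key>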


Hi @michalhucko, thank you for your work. I deployed MLflow both on Kubeflow and without Kubeflow, with MicroK8s 1.24 + Juju 3.2 on Ubuntu Jammy. The deployments went well, and Kubeflow works well too. For the web interface I go through the public IP of istio-pilot, since I have lots of bundles that I deploy on my IaaS (MAAS 3.4) and have to go through the same load-balancer IP for the Kubeflow + MLflow deployment (192.168.70.133.nip.io/mlflow/#), which does not work for the moment, unlike Kubeflow. I have to add the commands that will make the link with istio-pilot. For a deployment of the mlflow bundle without Kubeflow, we must add traefik-k8s to the bundle to connect with the public IP. Thanks. Moula

JFYI: the current versions of Kubeflow and MLflow were tested with, and only support, Juju 2.9, so you may see unexpected behaviour.

@beliaev-maksim Sorry but the problem is not with the version of juju. Whether it’s version 2.9 or 3.2 the problem is the same. I even find that with juju 3.2 it goes faster on my platform. I have to use the public loadbalancer ip to connect. As with kubeflow 192.168.70.133.npi.io. Deployed with kubeflow you need the link with service/istio-ingressgateway-workload LoadBalancer 10.152.183.192:5000 ( 192.168.70.133.nip.io/mllow/#. Deployed mlflow bundle without kubeflow you have to add traefik-k8s as a gateway for the public ip 192.168.70.133… Thank’s.