Quick start guide to Kubeflow

evilnick · 10 November 2021 13:47

Note: Superseded by Get started with Charmed Kubeflow. Please see that doc instead.

Ready to try out Kubeflow? This tutorial will guide you through the steps to get Kubeflow up and running with the minimum of hassle. To keep things simple, we are going to make the following assumptions:

You are running Ubuntu 20.04 (Focal) or later.
You have at least 16GB of free memory and 50GB of disk space.
You have access to the internet for downloading the required snaps and charms.
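
The memory and disk assumptions above can be checked before starting. The following is a minimal sketch, not part of the original guide: the function names are made up for illustration, it checks the root filesystem (an assumption, as your snaps may live elsewhere), and it reports whole GiB, which is only approximately the GB figures quoted above.

```shell
# Preflight sketch: compare available memory and disk against the tutorial's minimums.
# Function names and the choice of "/" as the filesystem to check are assumptions.

# Available memory in whole GiB, read from /proc/meminfo (values there are in kB).
available_mem_gib() {
  awk '/^MemAvailable:/ {v = int($2 / 1024 / 1024)} END {print v + 0}' /proc/meminfo
}

# Free space in whole GiB on the filesystem that holds the given path.
available_disk_gib() {
  df -P -k "$1" | awk 'NR == 2 {print int($4 / 1024 / 1024)}'
}

# Succeeds when the measured value ($1) is at least the required value ($2).
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Print one line per resource: $1 = label, $2 = measured GiB, $3 = required.
report() {
  if meets_minimum "$2" "$3"; then
    echo "OK:  $1 ${2} GiB available (need $3)"
  else
    echo "LOW: $1 ${2} GiB available (need $3)"
  fi
}

report memory "$(available_mem_gib)" 16
report disk "$(available_disk_gib /)" 50
```

A "LOW" line is only a warning; the smaller kubeflow-lite bundle described later may still fit.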

The fastest, simplest way to get started with Kubeflow is to set up a local Kubernetes with MicroK8s. The low overheads required by MicroK8s make it ideal if you are trying to squeeze Kubeflow onto a laptop or virtual machine for a quick experiment or working on the move.

Contents:

Install and prepare MicroK8s
Install Juju
Deploying Kubeflow
Configuration
Accessing the Dashboard
Did something go wrong?

Install and prepare MicroK8s

The first step on our journey is to install MicroK8s. MicroK8s is installed from a snap package. The published snap maintains different channels for different releases of Kubernetes. The current 1.6 release of Kubeflow supports Kubernetes 1.22.

sudo snap install microk8s --classic --channel=1.22/stable
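
Because the channel matters, it can be worth confirming which one the installed snap is tracking. This sketch assumes the `tracking:` line that `snap info` prints for an installed snap; the helper name `tracked_channel` is made up for illustration.

```shell
# Print the channel a snap is tracking, parsed from `snap info` output on stdin.
# The helper name is hypothetical; it relies on the "tracking:" line of `snap info`.
tracked_channel() {
  awk '$1 == "tracking:" {print $2}'
}

# Only query snapd when the snap command is actually present.
if command -v snap >/dev/null 2>&1; then
  snap info microk8s | tracked_channel
fi
```

If nothing is printed, the snap is probably not installed yet.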

For MicroK8s to work without having to use sudo for every command, it creates a group called microk8s. To make it more convenient to run commands, you should add the current user to this group:

sudo usermod -a -G microk8s $USER
newgrp microk8s

It is also useful to make sure the user has the proper access and ownership of any kubectl configuration files:

sudo chown -f -R $USER ~/.kube

MicroK8s will start up as soon as it is installed. It is a completely functional Kubernetes, running with the least amount of overhead possible. However, for our purposes we will need a Kubernetes with a few more features. A lot of extra services are available as MicroK8s “add-ons” - code which is shipped with the snap and can be turned on and off as needed. We can now enable the add-ons Kubeflow needs: a DNS service, so the applications can find each other; storage; an ingress controller, so we can access Kubeflow components; and the MetalLB load balancer. These can all be enabled at the same time:

microk8s enable dns storage ingress metallb:10.64.140.43-10.64.140.49

You can see that we added some detail when enabling MetalLB, in this case the address pool to use. Many of the add-ons have extra configuration options, which can be found in the MicroK8s documentation.

It can take some minutes for MicroK8s to install and set up these additional features. Before we do anything else, we should check that the add-ons have been enabled successfully and that MicroK8s is ready for action. We can do this by requesting the status, and supplying the --wait-ready option, which tells microk8s to finish whatever processes it is working on before returning:

microk8s status --wait-ready
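
To confirm that each add-on really is enabled once the status command returns, something like the sketch below may help. It assumes that `microk8s status -a <addon>` prints `enabled` or `disabled` and that `--timeout` is accepted alongside `--wait-ready`, as in the MicroK8s releases I have seen; the helper name `missing_addons` is made up for illustration.

```shell
# Print every named add-on that `microk8s status -a` does not report as enabled.
# The helper name is hypothetical; the add-on names are the ones enabled above.
missing_addons() {
  for addon in "$@"; do
    if [ "$(microk8s status -a "$addon" 2>/dev/null)" != "enabled" ]; then
      echo "$addon"
    fi
  done
}

# Only run against a real installation.
if command -v microk8s >/dev/null 2>&1; then
  # Wait up to five minutes for MicroK8s, then list anything still missing.
  microk8s status --wait-ready --timeout 300 >/dev/null
  missing_addons dns storage ingress metallb
fi
```

No output means all four add-ons are reported as enabled.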

Now we have a working Kubernetes ready, the next step is to install Juju.

Install Juju

Juju is an Operator Lifecycle Manager (OLM) for clouds, bare metal or Kubernetes. We will be using it to deploy and manage the components which make up Kubeflow.

As with MicroK8s, Juju is installed from a snap package:

sudo snap install juju --classic

As Juju already has a built-in knowledge of MicroK8s and how it works, there is no additional setup or configuration needed. All we need to do is run the command to deploy a Juju controller to the Kubernetes we set up with MicroK8s:

juju bootstrap microk8s
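
Bootstrapping depends on the storage add-on enabled earlier. If it fails with an error such as `preferred storage "microk8s.io/hostpath" not available` (reported in the replies below), a check along these lines may show whether the storage class is in place yet. This is a sketch: the helper name is made up, and it assumes the add-on registers a storage class whose provisioner is `microk8s.io/hostpath`, as the error message suggests.

```shell
# Succeeds when the space-separated provisioner list on stdin contains microk8s.io/hostpath.
# The helper name is hypothetical.
has_hostpath_provisioner() {
  tr ' ' '\n' | grep -qxF 'microk8s.io/hostpath'
}

# Only run against a real installation.
if command -v microk8s >/dev/null 2>&1; then
  if microk8s kubectl get storageclass -o jsonpath='{.items[*].provisioner}' | has_hostpath_provisioner; then
    echo "hostpath storage is available"
  else
    echo "hostpath storage not found; run 'microk8s enable storage' and 'microk8s status --wait-ready', then retry"
  fi
fi
```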

The controller is Juju’s agent, running on Kubernetes, which can be used to deploy and control the components of Kubeflow.

The controller can work with different models, which map to namespaces in Kubernetes. It is recommended to set up a specific model for Kubeflow:

juju add-model kubeflow

Model name must be kubeflow: Due to an assumption made in the upstream Kubeflow Dashboard code, Kubeflow must be deployed in the Kubernetes namespace kubeflow and so we have to use the model name kubeflow here.

That’s it for installing Juju!

Deploying Kubeflow

Charmed Kubeflow is really a collection of charms. Each of these charms deploys and controls one application which goes to make up Kubeflow. You can actually just install the components you want, by individually deploying the charms and relating them to each other to build up Kubeflow. For convenience though, there are three bundles available. The bundles are really a recipe for a particular deployment of Kubeflow, configuring and relating the applications so you end up with a working deployment with the minimum of effort.

The full Kubeflow bundle will require a lot of resources (at least 2 CPUs, 16GB of free RAM and 50GB of disk space), so unless you know that’s what you want and have the resources to match, we recommend starting with the ‘kubeflow-lite’ bundle (the contents of the bundles are shown in the reference documentation).

juju deploy kubeflow-lite --trust

Juju will now fetch the applications and begin deploying them to the MicroK8s Kubernetes. This process can take several minutes. You can track the progress by running:

watch -c juju status --color

This will show a list of the applications and their current status. Don’t be surprised if a few show error messages to begin with - a lot of the components rely on the operation of others, so it can take some time before everything is ready and talking to one another.

While that is going on, there are two pieces of post-install configuration which can usefully be done at this point.

Configuration

For authentication and allowing access to the dashboard service, some components will need to be configured with the URL to be allowed. This depends on the underlying network provider, but for the known case of running on a local MicroK8s, we also know what the URL will be. It can be configured with Juju using the following commands:

juju config dex-auth public-url=http://10.64.140.43.nip.io
juju config oidc-gatekeeper public-url=http://10.64.140.43.nip.io

Finding the URL: If you have a different setup for MicroK8s, or you are adapting this tutorial for a different Kubernetes, you can find the URL required by examining the IP address of the istio-ingressgateway service. For example, you can determine this information using kubectl:

microk8s kubectl -n kubeflow get svc istio-ingressgateway-workload -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
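
The lookup and the two configuration commands can be combined so the URL is not typed by hand. This is only a sketch under the same assumptions as the tutorial (a load balancer IP reachable through nip.io); the helper name `nip_url` is made up for illustration.

```shell
# Build the nip.io URL used in this tutorial from a load balancer IP address.
# The helper name is hypothetical.
nip_url() {
  echo "http://$1.nip.io"
}

# Only touch the deployment when both tools are installed.
if command -v microk8s >/dev/null 2>&1 && command -v juju >/dev/null 2>&1; then
  ip=$(microk8s kubectl -n kubeflow get svc istio-ingressgateway-workload \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ -n "$ip" ]; then
    url=$(nip_url "$ip")
    juju config dex-auth public-url="$url"
    juju config oidc-gatekeeper public-url="$url"
    echo "public-url set to $url"
  else
    echo "no load balancer IP assigned yet; try again once the deployment has settled"
  fi
fi
```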

To enable simple authentication, and set a username and password for your Kubeflow deployment, run the following commands:

juju config dex-auth static-username=admin
juju config dex-auth static-password=admin

Feel free to use a different (more secure!) password if you wish.
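
One way to follow that advice is to generate a random password instead of choosing one. The sketch below is an illustration, not part of the original guide: `gen_password` is a made-up helper that reads from /dev/urandom and keeps only letters and digits.

```shell
# Print a random alphanumeric password; $1 is the length (default 24).
# The helper name is hypothetical.
gen_password() {
  LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c "${1:-24}"
}

# Only configure dex-auth when juju is installed.
if command -v juju >/dev/null 2>&1; then
  pw=$(gen_password)
  juju config dex-auth static-password="$pw"
  # Shown once so it can be stored somewhere safe.
  echo "dex-auth static-password: $pw"
fi
```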

Accessing the Dashboard

The URL for the Kubeflow dashboard is the same as the one determined earlier for the configuration steps - in the case of a default MicroK8s install, it’s: http://10.64.140.43.nip.io

From a browser on your local machine, this can be reached just by copying and pasting the URL. You should then see the dex login screen, where you should enter the username (it does say email address, but whatever string you entered to configure it will work fine) and your password from the configuration step.

You should now see the Kubeflow “Welcome” page:

Click on the “Start Setup” button. On the next screen you will be asked to create a namespace. This is just a way of keeping all the files and settings from one project in a single, easy-to-access place. You can choose any name you like…

Once you click on the “Finish” button, the Dashboard will be displayed!

More information on accessing the dashboard can be found in this guide.

Congratulations! You have just installed Kubeflow! You probably can’t wait to get started with exciting ML experiments, but be sure to also check out the Kubeflow basics tutorial to see an example of setting up pipelines and more.

Did something go wrong?

This section summarizes known issues and workarounds.

Kubeflow Dashboard can’t be accessed

Sometimes after accessing the URL specified in the Configuration section (juju config dex-auth public-url) the dashboard is not reachable (no response in the browser). This issue might be caused by a missing gateway resource in the cluster. You can list gateway resources in the cluster with microk8s kubectl get gateway -A. If the response is No resources found you can force the charm to create it the following way:

juju run --unit istio-pilot/0 -- "export JUJU_DISPATCH_PATH=hooks/config-changed; ./dispatch"
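
The check and the workaround can be tied together so the hook only runs when no gateway exists. This is a sketch using the commands above; `gateway_count` is a made-up helper, and `--no-headers` is used so that only resource rows are counted.

```shell
# Count the gateway resources across all namespaces (0 when none are found).
# The helper name is hypothetical.
gateway_count() {
  microk8s kubectl get gateway -A --no-headers 2>/dev/null | wc -l
}

# Only touch the deployment when both tools are installed.
if command -v microk8s >/dev/null 2>&1 && command -v juju >/dev/null 2>&1; then
  if [ "$(gateway_count)" -eq 0 ]; then
    echo "no gateway found; asking istio-pilot to re-run config-changed"
    juju run --unit istio-pilot/0 -- "export JUJU_DISPATCH_PATH=hooks/config-changed; ./dispatch"
  else
    echo "gateway resource present"
  fi
fi
```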

If you are running it on a VM instance in the public cloud, please go to the “Access dashboard section” from here.

Applications in an error state

Sometimes some applications in your Kubeflow deployment can be in an error state. You should see this with the juju status command. When this happens you can manually check the state of the pods in the cluster by running microk8s kubectl get po -n kubeflow. Pods are expected to be in Running status. If some pods are in CrashLoopBackOff you can further inspect the pod by checking the logs with microk8s kubectl logs -n kubeflow <name-of-the-pod>. If you see error messages like this one: “error”:“too many open files”, you can execute the following commands on your host machine and the applications will slowly turn to active:

sudo sysctl fs.inotify.max_user_instances=1280
sudo sysctl fs.inotify.max_user_watches=655360
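
Values set with sysctl on the command line apply only until the next reboot. To keep them, the same two settings can be written to a file under /etc/sysctl.d. This is a sketch: the file name 99-kubeflow-inotify.conf and the helper name are arbitrary choices, and the apply step is left as comments because it needs root.

```shell
# Print the two inotify settings from above in sysctl.d file format.
# The helper name is hypothetical.
inotify_conf() {
  printf 'fs.inotify.max_user_instances=%s\n' 1280
  printf 'fs.inotify.max_user_watches=%s\n' 655360
}

# Preview the file content.
inotify_conf

# To persist and reload (requires root; the file name is an arbitrary choice):
#   inotify_conf | sudo tee /etc/sysctl.d/99-kubeflow-inotify.conf
#   sudo sysctl --system
```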

This behavior has been previously observed on pods of katib-controller, kubeflow-profiles, kfp-api and kfp-persistence.
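
When many pods are listed, it can help to filter for the unhealthy ones and print the tail of their logs in one go. The sketch below assumes the default column layout of kubectl get po (NAME, READY, STATUS, ...); the helper name `unhealthy_pods` is made up for illustration.

```shell
# Read `kubectl get po` output on stdin and print the names of pods whose
# STATUS column is neither Running nor Completed. The helper name is hypothetical.
unhealthy_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" {print $1}'
}

# Only run against a real installation.
if command -v microk8s >/dev/null 2>&1; then
  for pod in $(microk8s kubectl get po -n kubeflow | unhealthy_pods); do
    echo "== $pod =="
    # Last 20 log lines from every container in the pod.
    microk8s kubectl logs -n kubeflow "$pod" --all-containers=true --tail=20
  done
fi
```

Looking for "too many open files" in this output is a quick way to tell whether the inotify limits above are the cause.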

ppasotti · 27 January 2022 14:10

Before I was able to see the dex login screen, I had to add

nameserver 8.8.8.8
nameserver 208.67.222.222
nameserver 4.2.2.2

on top of /etc/resolv.conf

evilnick · 27 January 2022 16:14

I don’t know what your network setup is, but there are no ‘unusual’ requirements for MicroK8s/Kubeflow.

amo-mycena · 18 February 2022 10:45

Hi, @evilnick
I am using microk8s 1.21/stable, and microk8s.juju (--version 2.9.10-ubuntu-amd64).
I am stuck at the command “juju bootstrap microk8s”.

ERROR preferred storage “microk8s.io/hostpath” not available

Is there any config for juju bootstrap or something I missed out?
Hope to hear your suggestions.

robgibbon · 18 February 2022 10:48

Hello, You’ll need to run microk8s enable storage dns first, but you should install the latest juju snap using sudo snap install juju --classic and use that, not the juju in microk8s.

amo-mycena · 18 February 2022 10:59

Hi, @robgibbon
I have run those commands.
I have tried both microk8s.juju and juju (from the snap).
But I am still stuck at

ERROR preferred storage “microk8s.io/hostpath” not available

Is there anything I missed?

Thanks for your reply.

robgibbon · 18 February 2022 11:18

Hello again, Sometimes it can take a while until MicroK8s is ready after you run microk8s enable storage. Did you run microk8s status --wait-ready before bootstrapping juju? Maybe try just waiting a bit and then run juju bootstrap microk8s myk8s again after a few minutes.

amo-mycena · 18 February 2022 11:36

Thanks for your reply, @robgibbon. All the prerequisite add-ons are enabled, but I still have the problem.

pttr · 24 February 2022 13:16

I’m having problems with this tutorial, I tried to install it exactly with these steps but it seems some of the pods just crash all the time.

dex-auth res:oci-image@a74f783 error 1 dex-auth charmhub 2.28/stable 78 kubernetes creating or updating custom resource definitions: ensuring custom resource definition “REMOVED LINK” with version “v1beta1”: CustomResourceDefinition apiextensions k8s io “authcodes.dex.coreos.com” is invalid: spec.versions[0] schema.openAPIV3Schema: Required value: schemas are required

kubeflow-dashboard res:oci-image@858a90f error 1 kubeflow-dashboard charmhub stable 64 kubernetes creating or updating custom resources: getting custom resources: attempt count exceeded: getting custom resource definition “profiles kubeflow org”: custom resource definition “profiles kubeflow org” not found

kubeflow-profiles res:profile-image@f4450cf error 1 kubeflow-profiles charmhub stable 57 kubernetes creating or updating custom resource definitions: ensuring custom resource definition “serviceroles rbac istio io” with version “v1beta1”: CustomResourceDefinition apiextensions k8s io “serviceroles rbac istio io” is invalid: spec versions[0] schema openAPIV3Schema: Required value: schemas are required

istio-ingressgateway/0* error idle 10.1.0.68 15020/TCP,80/TCP,443/TCP,15029/TCP,15030/TCP,15031/TCP,15032/TCP,15443/TCP,15011/TCP,8060/TCP,853/TCP crash loop backoff: back-off 5m0s restarting failed container=istio-proxy pod=istio-ingressgateway-846b8b8b9-mxfm6_kubeflow(21db34af-5a0c-485a-a784-4b79fbad7a31)

Could you please double check your distribution?

P.S. your forum sucks, it thinks those errors contain links so removed dots.

dominik.f · 24 February 2022 13:49

Hello @pttr are you running microk8s in the 1.21/stable channel? Kubeflow currently doesn’t support any later versions. You can check with snap info microk8s.

Edit: I’m sorry I was just able to reproduce this with our instructions, there might be something broken on our side, we will try and fix this as soon as possible.

dominik.f · 25 February 2022 13:22

@pttr after further investigation we have discovered that this is an issue with the latest juju version. To bypass this for now, during bootstrap you can do the following: juju bootstrap microk8s --agent-version="2.9.22"

alfax1962 · 4 March 2022 16:13

I have Kubeflow (following your quick guide on Ubuntu 20.04) stuck on this problem: registry.jujucharms.com/kubeflow-charmers/kfp-viz/oci-image@sha256:c90a5818043da47448c4230953b265a66877bd143e4bdd991f762cf47e2a16d6 is not uploading. Opening the URL directly in the browser returns a 404 error, and the address does not respond to ping.

ca-scribner · 4 March 2022 22:13

@alfax1962 thanks for the message. I just ran through everything from scratch again and the only thing I needed that was outside the tutorial was to patch a role for the istio-ingressgateway charm using:

kubectl patch role -n kubeflow istio-ingressgateway-operator -p '{"apiVersion":"rbac.authorization.k8s.io/v1","kind":"Role","metadata":{"name":"istio-ingressgateway-operator"},"rules":[{"apiGroups":["*"],"resources":["*"],"verbs":["*"]}]}'

Did you use the --agent-version="2.9.22" as @dominik.f mentioned? Apart from that I’m not sure what else might be going wrong. If you could provide juju status and juju debug-log info for the failing charm there might be something helpful there. I’d also suggest doing a sudo snap remove microk8s --purge and trying again - perhaps something was left in microk8s that interacted with this?

alfax1962 · 5 March 2022 12:40

Thank you @ca-scribner very much for your prompt reply. All was solved. The problem was my network: the site didn’t answer in the timing required by kubernetes. After some hours the pod initialized correctly. Many regards

ca-scribner · 7 March 2022 14:18

You’re welcome! Glad to hear it

emcp · 22 April 2022 17:10

The controller can work with different models, which map to namespaces in Kubernetes. It is recommended to set up a specific model for Kubeflow:

this is not in fact the case. if you pick any name besides kubeflow for the model… the kubeflow desktop unit errors out… just heads up

ca-scribner · 22 April 2022 17:27

Yeah sorry I thought we had that model name issue covered in this guide, but must have been an old one. Atm there’s a hard-coded assumption in the upstream kubeflow dashboard code that expects kubeflow to be deployed in the k8s namespace kubeflow.

emcp · 23 April 2022 08:25

ah, no worries…

Any idea on how or where to access spark in the full bundle ? Posted a ticket to the github here

https://github.com/canonical/bundle-kubeflow/issues/453

I am a total k8s newb so perhaps it’s in the full version but not called out explicitly in the application names or ?

ca-scribner · 25 April 2022 17:19

We all start as newbs

I see @dominik.f replied on the issue, but I also subscribed to it so if his suggestion doesn’t work out reply and we can try to sort it out

emcp · 25 April 2022 19:07

thank you Andrew, I have actually hit a bug it seems… so I need to tear down the controller and start from scratch… once that’s done I will retry

the bug is described here Bug #1968105 “Juju+microk8s: very weird behaviour” : Bugs : juju

edit: I’ve reconnected my client / controller and done the juju deploy spark-k8s

from there I am a bit lost… going to try to just … load a basic pyspark session… there’s not much documentation on the charmhub about this tho

Edit2: hmmm I seem to have notebooks just scheduling but never getting completed… my juju status shows an error on dex-auth/2

hook failed: "ingress-relation-broken"

Im going to just tear down and start from scratch… and then at the end add juju deploy spark-k8s and retry

Edit3: well after stopping and restarted the notebook it completed… I then tried a basic hello world NB with pyspark… I am assuming I need to set some… environment variables and point now to the spark-k8s application/unit in juju?