Quick start guide to Kubeflow

Hello, you’ll need to run microk8s enable storage dns first. You should also install the latest juju snap using sudo snap install juju --classic and use that, not the juju bundled in microk8s.

Hi @robgibbon,
I have run those commands, and I have tried both microk8s.juju and juju (from the snap), but I am still stuck at:

ERROR preferred storage “microk8s.io/hostpath” not available

Is there anything I missed?

Thanks for your reply.

Hello again. Sometimes it can take a while until MicroK8s is ready after you run microk8s enable storage. Did you run microk8s status --wait-ready before bootstrapping juju? Maybe try waiting a few minutes and then run juju bootstrap microk8s myk8s again.
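
The "wait a bit and try again" advice can be scripted. Below is a minimal sketch of a retry helper; the name retry_until and its arguments are hypothetical and not part of MicroK8s or Juju. The example at the bottom assumes the hostpath storage class is registered under the name microk8s-hostpath, which may differ on your install.

```shell
#!/bin/sh
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits 0, at most ATTEMPTS times,
# sleeping DELAY_SECONDS between tries. Returns 0 on success, 1 otherwise.
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only sleep when another attempt is still to come.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example use (commented out so sourcing this file has no side effects):
#   microk8s status --wait-ready
#   # Assumption: the storage class is called microk8s-hostpath.
#   retry_until 10 30 microk8s kubectl get storageclass microk8s-hostpath
#   juju bootstrap microk8s myk8s
```

Polling for the storage class before bootstrapping, rather than retrying the bootstrap itself, avoids leaving a half-created controller behind if an early attempt fails.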

Thanks for your reply, @robgibbon. All the prerequisite addons are enabled, but I still have the problem.

I’m having problems with this tutorial. I followed these steps exactly, but some of the pods crash all the time.

dex-auth res:oci-image@a74f783 error 1 dex-auth charmhub 2.28/stable 78 kubernetes creating or updating custom resource definitions: ensuring custom resource definition “REMOVED LINK” with version “v1beta1”: CustomResourceDefinition apiextensions k8s io “authcodes.dex.coreos.com” is invalid: spec.versions[0] schema.openAPIV3Schema: Required value: schemas are required

kubeflow-dashboard res:oci-image@858a90f error 1 kubeflow-dashboard charmhub stable 64 kubernetes creating or updating custom resources: getting custom resources: attempt count exceeded: getting custom resource definition “profiles kubeflow org”: custom resource definition “profiles kubeflow org” not found

kubeflow-profiles res:profile-image@f4450cf error 1 kubeflow-profiles charmhub stable 57 kubernetes creating or updating custom resource definitions: ensuring custom resource definition “serviceroles rbac istio io” with version “v1beta1”: CustomResourceDefinition apiextensions k8s io “serviceroles rbac istio io” is invalid: spec versions[0] schema openAPIV3Schema: Required value: schemas are required

istio-ingressgateway/0* error idle 10.1.0.68 15020/TCP,80/TCP,443/TCP,15029/TCP,15030/TCP,15031/TCP,15032/TCP,15443/TCP,15011/TCP,8060/TCP,853/TCP crash loop backoff: back-off 5m0s restarting failed container=istio-proxy pod=istio-ingressgateway-846b8b8b9-mxfm6_kubeflow(21db34af-5a0c-485a-a784-4b79fbad7a31)

Could you please double check your distribution?

P.S. your forum sucks, it thinks those errors contain links so removed dots.

Hello @pttr are you running microk8s in the 1.21/stable channel? Kubeflow currently doesn’t support any later versions. You can check with snap info microk8s.
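
For anyone who wants to script the channel check, here is a rough sketch. It assumes snap info prints a line starting with "tracking:" for an installed snap; the function names tracking_channel and is_121_track are made up for illustration, and the 1.21 limit is only what was stated at the time of this reply.

```shell
#!/bin/sh
# tracking_channel: reads `snap info` output on stdin and prints the value
# of the "tracking:" line (for example "1.21/stable").
tracking_channel() {
    awk '$1 == "tracking:" { print $2; exit }'
}

# is_121_track CHANNEL: succeeds only if CHANNEL is on the 1.21 track.
is_121_track() {
    case "$1" in
        1.21|1.21/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (commented out so sourcing this file has no side effects):
#   channel=$(snap info microk8s | tracking_channel)
#   is_121_track "$channel" || echo "microk8s is tracking $channel, not 1.21"
```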

Edit: I’m sorry I was just able to reproduce this with our instructions, there might be something broken on our side, we will try and fix this as soon as possible.

@pttr after further investigation we have discovered that this is an issue with the latest juju version. To bypass this for now, during bootstrap you can do the following: juju bootstrap microk8s --agent-version="2.9.22"

My Kubeflow deployment (following your quick guide on Ubuntu 20.04) is stuck on this problem: registry.jujucharms.com/kubeflow-charmers/kfp-viz/oci-image@sha256:c90a5818043da47448c4230953b265a66877bd143e4bdd991f762cf47e2a16d6 is not loading. Opening the URL directly in the browser returns a 404 error, and the address does not respond to ping.

@alfax1962 thanks for the message. I just ran through everything from scratch again and the only thing I needed that was outside the tutorial was to patch a role for the istio-ingressgateway charm using:

kubectl patch role -n kubeflow istio-ingressgateway-operator -p '{"apiVersion":"rbac.authorization.k8s.io/v1","kind":"Role","metadata":{"name":"istio-ingressgateway-operator"},"rules":[{"apiGroups":["*"],"resources":["*"],"verbs":["*"]}]}' 

Did you use the --agent-version="2.9.22" as @dominik.f mentioned? Apart from that I’m not sure what else might be going wrong. If you could provide juju status and juju debug-log info for the failing charm there might be something helpful there. I’d also suggest doing a sudo snap remove microk8s --purge and trying again - perhaps something was left in microk8s that interacted with this?
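
If it helps when gathering that information, the two commands can be wrapped in a small function. This is only a sketch: collect_diagnostics is a hypothetical name, and filtering the log with grep is a simplification (it matches any line that mentions the application name).

```shell
#!/bin/sh
# collect_diagnostics MODEL APP
# Hypothetical helper: prints `juju status` for MODEL, followed by the
# debug-log lines that mention APP.
collect_diagnostics() {
    model=$1
    app=$2
    echo "== juju status ($model) =="
    juju status -m "$model"
    echo "== debug-log lines mentioning $app =="
    # --replay starts from the beginning of the log; --no-tail exits at the
    # end instead of following new messages. `|| true` keeps the function
    # from failing when grep finds no matching lines.
    juju debug-log -m "$model" --replay --no-tail | grep -- "$app" || true
}

# Example use (commented out so sourcing this file has no side effects):
#   collect_diagnostics kubeflow kfp-viz > kfp-viz-diagnostics.txt
```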

Thank you very much, @ca-scribner, for your prompt reply. Everything is solved. The problem was my network: the site didn’t respond within the time Kubernetes required. After some hours the pod initialized correctly. Many regards

You’re welcome! Glad to hear it

The controller can work with different models, which map to namespaces in Kubernetes. It is recommended to set up a specific model for Kubeflow:

This is not in fact the case. If you pick any name besides kubeflow for the model, the kubeflow dashboard unit errors out. Just a heads up.

Yeah, sorry, I thought we had that model name issue covered in this guide, but it must have been an old one. At the moment there’s a hard-coded assumption in the upstream Kubeflow dashboard code that expects Kubeflow to be deployed in the k8s namespace kubeflow.
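
Until that assumption goes away, a setup script could guard against other model names. A minimal sketch, where ensure_kubeflow_model_name is a hypothetical helper and not a Juju command:

```shell
#!/bin/sh
# ensure_kubeflow_model_name NAME
# Hypothetical guard: succeeds only when NAME is exactly "kubeflow", since
# the Juju model name becomes the Kubernetes namespace.
ensure_kubeflow_model_name() {
    if [ "$1" != "kubeflow" ]; then
        echo "model name '$1' rejected: the dashboard expects namespace 'kubeflow'" >&2
        return 1
    fi
    return 0
}

# Example use (commented out so sourcing this file has no side effects):
#   ensure_kubeflow_model_name kubeflow && juju add-model kubeflow
```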

ah, no worries…

Any idea how or where to access Spark in the full bundle? I posted a ticket on GitHub here:

https://github.com/canonical/bundle-kubeflow/issues/453

I am a total k8s newb, so perhaps it’s in the full version but not called out explicitly in the application names?

We all start as newbs :)

I see @dominik.f replied on the issue, but I also subscribed to it so if his suggestion doesn’t work out reply and we can try to sort it out

thank you Andrew, I have actually hit a bug it seems… so I need to tear down the controller and start from scratch… once that’s done I will retry

The bug is described here: Bug #1968105, “Juju+microk8s: very weird behaviour” (Bugs : juju)

edit: I’ve reconnected my client / controller and done the juju deploy spark-k8s

from there I am a bit lost… going to try to just … load a basic pyspark session… there’s not much documentation on the charmhub about this tho

Edit2: hmmm I seem to have notebooks just scheduling but never getting completed… my juju status shows an error on dex-auth/2

hook failed: "ingress-relation-broken"

I’m going to just tear down and start from scratch… and then at the end add juju deploy spark-k8s and retry

Edit3: well, after stopping and restarting the notebook it completed… I then tried a basic hello world notebook with pyspark… I am assuming I need to set some environment variables and point to the spark-k8s application/unit in juju?

I’ve been trying to port-forward or create a tunnel to access this dashboard since I installed Kubeflow on my server. With the SOCKS tunnel I get an error about the hostname not having an IP associated with it, and I don’t know how to deal with it.

Hey, try this tutorial for setting up remote access to Kubeflow - Set up remote access | Charmed Kubeflow - hopefully it will help get you going

Rob

@robgibbon

Hey, is there a tutorial on how to use OpenEBS as the default storage class in Kubeflow?

I am looking forward to your reply. :)

I have been trying to install Kubeflow 1.6 on MicroK8s 1.22 for a few weeks without success. I followed this quick start guide, and now all containers are up and running, but I cannot access the web UI at http://10.64.140.43.nip.io.

Further investigation showed that there is no gateway configured.

microk8s kubectl get gateway -A

so I tried this fix

cat <<EOF | kubectl create -f -
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
 labels:
   app.istio-pilot.io/is-workload-entity: "true"
   app.juju.is/created-by: istio-pilot
 name: kubeflow-gateway
 namespace: kubeflow
 resourceVersion: "2203"
 spec:
 selector:
   istio: ingressgateway
 servers:
 - hosts:
   - '*'
   port:
 name: http
 number: 80
 protocol: HTTP
EOF

but I got this error

error: error validating "STDIN": error validating data: [ValidationError(Gateway.metadata): unknown field "selector" in io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta, ValidationError(Gateway.metadata): unknown field "servers" in io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta]; if you choose to ignore these errors, turn validation off with --validate=false

My installation:

Model     Controller          Cloud/Region        Version  SLA          Timestamp
kubeflow  microk8s-localhost  microk8s/localhost  2.9.35   unsupported  23:22:56+02:00

App                        Version                    Status  Scale  Charm                    Channel         Rev  Address         Exposed  Message
admission-webhook          res:oci-image@84a4d7d      active      1  admission-webhook        1.6/stable       50  10.152.183.210  no       
argo-controller            res:oci-image@669ebd5      active      1  argo-controller          3.3/stable       99                  no       
dex-auth                                              active      1  dex-auth                 2.31/stable     129  10.152.183.230  no       
istio-ingressgateway                                  active      1  istio-gateway            1.11/stable     114  10.152.183.183  no       
istio-pilot                                           active      1  istio-pilot              1.11/stable     131  10.152.183.252  no       
jupyter-controller         res:oci-image@8f4ec33      active      1  jupyter-controller       1.6/stable      138                  no       
jupyter-ui                 res:oci-image@cde6632      active      1  jupyter-ui               1.6/stable       99  10.152.183.198  no       
kfp-api                    res:oci-image@1b44753      active      1  kfp-api                  2.0/stable       81  10.152.183.36   no       
kfp-db                     mariadb/server:10.3        active      1  charmed-osm-mariadb-k8s  latest/stable    35  10.152.183.47   no       ready
kfp-persistence            res:oci-image@31f08ad      active      1  kfp-persistence          2.0/stable       76                  no       
kfp-profile-controller     res:oci-image@d86ecff      active      1  kfp-profile-controller   2.0/stable       61  10.152.183.142  no       
kfp-schedwf                res:oci-image@51ffc60      active      1  kfp-schedwf              2.0/stable       80                  no       
kfp-ui                     res:oci-image@55148fd      active      1  kfp-ui                   2.0/stable       80  10.152.183.236  no       
kfp-viewer                 res:oci-image@7190aa3      active      1  kfp-viewer               2.0/stable       79                  no       
kfp-viz                    res:oci-image@67e8b09      active      1  kfp-viz                  2.0/stable       74  10.152.183.212  no       
kubeflow-dashboard         res:oci-image@6fe6eec      active      1  kubeflow-dashboard       1.6/stable      154  10.152.183.245  no       
kubeflow-profiles          res:profile-image@0a46ffc  active      1  kubeflow-profiles        1.6/stable       82  10.152.183.168  no       
kubeflow-roles                                        active      1  kubeflow-roles           1.6/stable       31  10.152.183.193  no       
kubeflow-volumes           res:oci-image@cc5177a      active      1  kubeflow-volumes         1.6/stable       64  10.152.183.141  no       
metacontroller-operator                               active      1  metacontroller-operator  2.0/stable       48  10.152.183.178  no       
minio                      res:oci-image@1755999      active      1  minio                    ckf-1.6/stable   99  10.152.183.2    no       
oidc-gatekeeper            res:oci-image@32de216      active      1  oidc-gatekeeper          ckf-1.6/stable   76  10.152.183.79   no       
seldon-controller-manager  res:oci-image@eb811b6      active      1  seldon-core              1.14/stable      92  10.152.183.253  no       
training-operator                                     active      1  training-operator        1.5/stable       65  10.152.183.211  no       

Unit                          Workload  Agent  Address      Ports              Message
admission-webhook/0*          active    idle   10.1.85.150  4443/TCP           
argo-controller/0*            active    idle   10.1.85.186                     
dex-auth/0*                   active    idle   10.1.85.146                     
istio-ingressgateway/0*       active    idle   10.1.85.148                     
istio-pilot/0*                active    idle   10.1.85.149                     
jupyter-controller/0*         active    idle   10.1.85.175                     
jupyter-ui/0*                 active    idle   10.1.85.178  5000/TCP           
kfp-api/0*                    active    idle   10.1.85.188  8888/TCP,8887/TCP  
kfp-db/0*                     active    idle   10.1.85.160  3306/TCP           ready
kfp-persistence/0*            active    idle   10.1.85.187                     
kfp-profile-controller/0*     active    idle   10.1.85.184  80/TCP             
kfp-schedwf/0*                active    idle   10.1.85.163                     
kfp-ui/0*                     active    idle   10.1.85.189  3000/TCP           
kfp-viewer/0*                 active    idle   10.1.85.169                     
kfp-viz/0*                    active    idle   10.1.85.182  8888/TCP           
kubeflow-dashboard/0*         active    idle   10.1.85.172  8082/TCP           
kubeflow-profiles/0*          active    idle   10.1.85.167  8080/TCP,8081/TCP  
kubeflow-roles/0*             active    idle   10.1.85.152                     
kubeflow-volumes/0*           active    idle   10.1.85.171  5000/TCP           
metacontroller-operator/0*    active    idle   10.1.85.153                     
minio/0*                      active    idle   10.1.85.166  9000/TCP,9001/TCP  
oidc-gatekeeper/0*            active    idle   10.1.85.185  8080/TCP           
seldon-controller-manager/0*  active    idle   10.1.85.174  8080/TCP,4443/TCP  
training-operator/0*          active    idle   10.1.85.155        

It seems this fix is not working with microk8s Version 1.22.

Does anyone have an idea how to get kubeflow up and running?
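
A note on the validation error above: it reports selector and servers as unknown fields of metadata, which suggests the spec block was indented so that it was parsed as part of metadata, so it may be an indentation problem rather than a MicroK8s 1.22 one. The sketch below prints the same manifest with spec at the top level and the port fields nested under port. The resourceVersion line is left out because that value is assigned by the server and is not accepted on create. The function name gateway_manifest is hypothetical, and whether creating this Gateway by hand actually makes the dashboard reachable is not something this thread establishes.

```shell
#!/bin/sh
# gateway_manifest: prints the Gateway manifest from the post above with
# corrected indentation (spec is a top-level key, not a child of metadata).
gateway_manifest() {
    cat <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: kubeflow-gateway
  namespace: kubeflow
  labels:
    app.istio-pilot.io/is-workload-entity: "true"
    app.juju.is/created-by: istio-pilot
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP
EOF
}

# Example use (commented out so sourcing this file has no side effects):
#   gateway_manifest | microk8s kubectl create -f -
#   microk8s kubectl get gateway -A
```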