It does not work:
[quote="nohaihab, post:1, topic:7819"]
microk8s enable dns storage ingress metallb:10.64.140.43-10.64.140.49
[/quote]
DEPRECATION WARNING: 'storage' is deprecated and will soon be removed. Please use 'hostpath-storage' instead.
storage is deprecated and should be hostpath-storage, as the warning tells you when you run the enable command:
DEPRECATION WARNING: 'storage' is deprecated and will soon be removed. Please use 'hostpath-storage' instead.
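So the quoted command becomes:

```shell
# Same command as in the quoted post, with the deprecated 'storage'
# addon replaced by 'hostpath-storage' (run on the MicroK8s host):
microk8s enable dns hostpath-storage ingress metallb:10.64.140.43-10.64.140.49
```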
It's also good to specify dns:{your upstream DNS server} if Google's DNS is not reachable; otherwise it won't work.
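For example (untested, and the address is a stand-in — substitute your own router or ISP resolver), the dns addon accepts upstream forwarders as an argument:

```shell
# Forward CoreDNS to your own upstream resolver instead of the Google
# defaults (8.8.8.8, 8.8.4.4). 192.168.12.1 is an illustrative address.
microk8s enable dns:192.168.12.1
```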
Any details about how to do this? I have the same issue: istio-pilot is waiting for an IP address.
Hey! What do you get when you run this:
microk8s kubectl -n kubeflow get svc istio-ingressgateway-workload -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
After installation I cannot access the dashboard at http://10.64.140.43.nip.io: "This site can't be reached."
microk8s kubectl get gateway -A
NAMESPACE        NAME                     AGE
knative-serving  knative-ingress-gateway  107m
knative-serving  knative-local-gateway    107m
juju config dex-auth public-url=http://10.64.140.43.nip.io
WARNING the configuration setting "public-url" already has the value "http://10.64.140.43.nip.io"
Istio looks fine:
istio-ingressgateway-workload  LoadBalancer  10.152.183.197  10.64.140.43  80:30302/TCP,443:30118/TCP  97m
My laptop is running on two IPs: the wired interface is on 10.64.140.38 and the Wi-Fi is on 192.168.12.149.
So I got this 10.64.140.43. Since my Wi-Fi interface is on 192.168.12.149, I connected my wired LAN to another router. I assume that since the laptop and the service IP are on the same block, I should be able to access it?
I followed the instructions and reinstalled many times with clean microk8s and Juju. The following errors seem to be common across all my installations:
> App Version Status Scale Charm Channel Rev Address Exposed Message
> katib-controller res:oci-image@111495a waiting 1 katib-controller 0.15/stable 206 10.152.183.117 no
> kfp-api res:oci-image@e08e41d waiting 1 kfp-api 2.0/stable 298 10.152.183.4 no
> kfp-persistence res:oci-image@516e6b8 waiting 1 kfp-persistence 2.0/stable 294 no
> tensorboard-controller waiting 1 tensorboard-controller 1.7/stable 156 no Waiting for gateway info relation
> Unit Workload Agent Address Ports Message
> katib-controller/0* error idle 10.1.216.91 443/TCP,8080/TCP crash loop backoff: back-off 5m0s restarting failed container=katib-controller pod=katib-controller-54846dbdbf-krk6z_...
> kfp-api/0* error idle 10.1.216.92 8888/TCP,8887/TCP crash loop backoff: back-off 5m0s restarting failed container=ml-pipeline-api-server pod=kfp-api-5cd4db4554-n76qf_kub...
> kfp-persistence/0* error idle 10.1.216.3 crash loop backoff: back-off 5m0s restarting failed container=ml-pipeline-persistenceagent pod=kfp-persistence-5bbb9d...
> tensorboard-controller/0* waiting idle Waiting for gateway info relation
Also, the machine has a wireless card connected to the internet with a 192.168.12.xxx address, and a LAN card on the 10.40.140.xxx block. I also added 10.40.140.43.nip.io to the hosts file after seeing an nslookup failure.
Hey, did you try this?
An issue you might have is that the tensorboard-controller component might be stuck with a status of waiting and the message "Waiting for gateway relation". To fix this, run:
juju run --unit istio-pilot/0 -- "export JUJU_DISPATCH_PATH=hooks/config-changed; ./dispatch"
This is a known issue; see the tensorboard-controller GitHub issue for more info.
Hey,
So maybe there's a conflict with your LAN. The tutorial assumes the Kubeflow dashboard will be accessible at http://10.64.140.43.nip.io. Considering that your laptop is running on two IP addresses and one of them is 10.64.140.38, there's a possibility that you are already on a local area network (LAN) that uses the 10.64.140.0/24 range, which includes the 10.64.140.43 address mentioned in the tutorial.
To resolve this, you can try the following steps:
- Confirm the IP address range being used by your local network, and check whether it overlaps with 10.64.140.43. If there's a conflict, it can prevent you from accessing the Kubeflow dashboard.
- If a conflict exists, modify the IP address range specified in the tutorial's configuration. For example, say you want to change the address to 10.64.141.43 to avoid the conflict. Run the following command instead of the original microk8s enable command:
microk8s enable dns hostpath-storage ingress metallb:10.64.141.43-10.64.141.49
This will set up the private IP address 10.64.141.43 as accessible within your VM environment.
Note: I haven't tested this specific configuration myself; it's just an idea.
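The overlap check in the first step can be sketched with python3 (which the tutorial already requires); the 10.64.140.0/24 subnet below is an example — substitute the range your own LAN actually uses (see `ip addr`):

```shell
# Prints "conflict" if the MetalLB address sits inside your LAN's subnet.
python3 -c "
import ipaddress
lan = ipaddress.ip_network('10.64.140.0/24')   # example LAN range
metallb_ip = ipaddress.ip_address('10.64.140.43')
print('conflict' if metallb_ip in lan else 'no conflict')
"
```

With the example values above it prints `conflict`, which is exactly the situation described in this thread.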
> App Version Status Scale Charm Channel Rev Address Exposed Message
> kfp-api res:oci-image@e08e41d waiting 1 kfp-api 2.0/stable 298 10.152.183.106 no
> kfp-persistence res:oci-image@516e6b8 waiting 1 kfp-persistence 2.0/stable 294 no
> Unit Workload Agent Address Ports Message
> kfp-api/0* error idle 10.1.216.122 8888/TCP,8887/TCP crash loop backoff: back-off 5m0s restarting failed container=ml-pipeline-api-server pod=kfp-api-6658f6984b-dd8mp_kub...
> kfp-persistence/0* error idle 10.1.216.123 crash loop backoff: back-off 5m0s restarting failed container=ml-pipeline-persistenceagent pod=kfp-persistence-5d7987...
The errors are pretty consistent across my installation attempts.
Also, when I finally get to the dashboard, it gets stuck at 'creating namespace'. After the namespace is created it stays on the same page, and when you go back and click 'Finish' again it tells you the namespace already exists.
Could be an issue with microk8s and the default inotify limits. Did you run this?
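For reference, the fix usually suggested raises the inotify limits like this (these are the values commonly given in the MicroK8s docs — I haven't verified them against your setup):

```shell
# Raise inotify limits for the current boot (values from MicroK8s docs);
# add the same settings to /etc/sysctl.conf to make them persistent.
sudo sysctl fs.inotify.max_user_instances=1280
sudo sysctl fs.inotify.max_user_watches=655360
```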
Thanks! That solved the problem.
Error with juju deploy:
Located charm "tensorboards-web-app" in charm-hub, channel 1.7/stable
Located charm "training-operator" in charm-hub, channel 1.6/stable
ERROR lost connection to pod
ERROR lost connection to pod
ERROR cannot deploy bundle: cannot resolve charm or bundle "jupyter-ui": connection is shut down
I tried several times, and the ERROR happened at a different stage each time. I am doing this on a 3-node microk8s cluster.
Hi Andrew, can you run some commands to inspect / diagnose / debug what's going on, and let us know what you find?
Here are just some ideas:
- What are the results of microk8s status and juju status?
- Use kubectl to get more info about what's going on: get / describe / logs
- Check the network connectivity between the microk8s nodes
- Try SSHing into the pods and see what happens
Also describe in as much detail steps to reproduce your exact setup, including the machine specs (RAM, OS etc.) you’re using, what cloud provider if in the cloud, how you setup the microk8s cluster etc.
Try to get as much info as you can for us, and then ping us back here with all the info. Then we’ll take it from there.
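Concretely, a first diagnostic pass might look like this (the namespace and the `<failing-pod>` placeholder are illustrative — substitute the names your own juju status / kubectl output shows):

```shell
# Overall cluster and model health
microk8s status --wait-ready
juju status

# Dig into a failing workload
microk8s kubectl get pods -A
microk8s kubectl -n kubeflow describe pod <failing-pod>
microk8s kubectl -n kubeflow logs <failing-pod> --previous
```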
I tried on a clean new Ubuntu install (1 node only), following the documents with microk8s and this. Here is the screen output:
andrew@G9:~$ juju bootstrap microk8s
Creating Juju controller "microk8s-localhost" on microk8s/localhost
Bootstrap to Kubernetes cluster identified as microk8s/localhost
Fetching Juju Dashboard 0.8.1
Creating k8s resources for controller "controller-microk8s-localhost"
Starting controller pod
Bootstrap agent now started
Contacting Juju controller at 10.152.183.148 to verify accessibility...
ERROR lost connection to pod
Bootstrap complete, controller "microk8s-localhost" is now available in namespace "controller-microk8s-localhost"
Now you can run juju add-model to create a new model to deploy k8s workloads.
For the previous install I ignored this ERROR and continued with add-model; I think this is the root of the problem. The network settings:
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 scope host lo
> valid_lft forever preferred_lft forever
> inet6 ::1/128 scope host
> valid_lft forever preferred_lft forever
> 2: enp3s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
> link/ether ec:aa:a0:18:45:bf brd ff:ff:ff:ff:ff:ff
> 3: wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> link/ether 9c:b6:d0:0c:eb:35 brd ff:ff:ff:ff:ff:ff
> inet 192.168.12.100/24 brd 192.168.12.255 scope global dynamic noprefixroute wlp2s0
> valid_lft 41777sec preferred_lft 41777sec
> inet6 2607:fb91:87e:825a:856:6482:8976:6010/128 scope global dynamic noprefixroute
> valid_lft 2180sec preferred_lft 830sec
> inet6 2607:fb91:87e:825a:1ed6:460:ed44:1905/64 scope global temporary dynamic
> valid_lft 86149sec preferred_lft 14149sec
> inet6 2607:fb91:87e:825a:9b37:9d88:b462:b6fa/64 scope global dynamic mngtmpaddr noprefixroute
> valid_lft 86149sec preferred_lft 14149sec
> inet6 fe80::6c92:966f:6ff9:8f06/64 scope link noprefixroute
> valid_lft forever preferred_lft forever
> 6: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
> link/ether 66:c8:18:67:df:7a brd ff:ff:ff:ff:ff:ff
> inet 10.1.243.128/32 scope global vxlan.calico
> valid_lft forever preferred_lft forever
> inet6 fe80::64c8:18ff:fe67:df7a/64 scope link
> valid_lft forever preferred_lft forever
> 7: calica2d35e61c3@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
> link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-5110dfe6-5021-3929-7c13-826d6ff19ee8
> inet6 fe80::ecee:eeff:feee:eeee/64 scope link
> valid_lft forever preferred_lft forever
> 8: cali32780c3cbff@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
> link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-dbaaddb1-bc74-3af9-4ecb-13bc54c630de
> inet6 fe80::ecee:eeff:feee:eeee/64 scope link
> valid_lft forever preferred_lft forever
> 9: calicb43c72bc0b@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
> link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-c2fb5bc9-a1a0-5f1c-b3c1-16e679ecc7f9
> inet6 fe80::ecee:eeff:feee:eeee/64 scope link
> valid_lft forever preferred_lft forever
> 10: calic40d2f9fe91@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
> link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-94b4a477-f3fd-f732-bfd1-e1f85c8ace54
> inet6 fe80::ecee:eeff:feee:eeee/64 scope link
> valid_lft forever preferred_lft forever
> 11: calicd215964915@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
> link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-dc847b39-d6d1-4672-a176-99b4fc226b48
> inet6 fe80::ecee:eeff:feee:eeee/64 scope link
> valid_lft forever preferred_lft forever
> 13: cali2866693e52b@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
> link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-84a8ae58-9fbe-be94-7ada-6c63ac16b0e8
> inet6 fe80::ecee:eeff:feee:eeee/64 scope link
> valid_lft forever preferred_lft forever
> 14: calibe02fed43bb@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
> link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-a1aa2784-d203-f334-443a-83d33fd7c3d1
> inet6 fe80::ecee:eeff:feee:eeee/64 scope link
> valid_lft forever preferred_lft forever
The only time I ever saw Juju hang was because I didn't have enough disk space.
What are the specs of the machine you're running Juju / microk8s on? Does it satisfy all the requirements from the tutorial?
- Runs Ubuntu 20.04 (focal) or later.
- Has at least 4 cores, 32GB RAM and 50GB of disk space available.
- Is connected to the internet for downloading the required snaps and charms.
- Has python3 installed.
Hey @kkhe56, looks like something is going on with the MySQL databases.
With a quick look the root problems seem to be katib-db and kfp-db, which are using the mysql-k8s charms. The other charms then are blocked because kfp-api is waiting for the database.
We'll need to dig further into your environment to understand why the DB charms are in an "unknown" state. Could you open an issue at https://github.com/canonical/bundle-kubeflow, where the team can follow up and ask for more technical details?
I think that this can now all be replaced by just running juju bootstrap microk8s, as Juju can automatically connect itself to microk8s.
The 3.1/stable channel is now only available with strict confinement, and will print a warning if you try to install it with --classic:
itrue@kubeflow:~$ sudo snap install --classic --channel 3.1/stable juju
Warning: flag --classic ignored for strictly confined snap juju
juju (3.1/stable) 3.1.7 from Canonical✓ installed