Controller for vsphere-cloud only uses ESXi host it was bootstrapped on

Hm… I have read the documentation and I wonder how that would work.

A vSphere cloud is linked to a vCenter host and datacenter object, not to a specific ESXi host. So if I add an additional cloud definition with the same vCenter hosts, this does not change the situation if I use the same Juju controller. If I add a new controller, then I would still be limited to one application per controller, as per CMR scenario 2. This would not be the intended way to deploy an app, as I would lose resiliency in case of a host failure.

Or am I missing something?

Ok - it sounds like I was wrong about this, sorry. The vSphere instance I have access to for testing only has one host, so I wasn't able to try it.

What are the hosts named in the vSphere UI? When finding the resource group to create a VM in, we match the name of the host in the inventory, not the IP address or the DNS name (unless that happens to also be the inventory name). So I wouldn't expect the commands specifying zones=x.y.z.24 to do what you want.

In general, you should be able to create machines on both hosts with only one controller, no cross-model relations needed.

The hosts in vSphere are named by their IP addresses. They are now part of the cluster Cluster01, but previously there was no cluster and specifying zones='IP_ADDRESS' did not work. Could it be that the "." characters in the IP address are interpreted as something other than literal characters when matching the name?
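To illustrate the concern: if the name matching used a regular expression rather than a literal string comparison, a "." would match any character. A quick shell sketch of the difference (only an illustration of the question, not how Juju actually matches names):

```shell
# In a regex, '.' matches ANY character, so an unrelated name still matches:
echo "10x1x2x24" | grep -q "^10.1.2.24$" && echo "regex: matches"

# A literal string comparison treats the dots as plain characters:
[ "10x1x2x24" = "10.1.2.24" ] || echo "literal: no match"
```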

I will try to reconfigure that once I find the login for the DNS VM running in the same vCenter. :slight_smile:

Hi @elvinas - you were right, we weren’t respecting the root-disk-source constraint when creating VMs.

I’ve put up a PR to fix it - it should be released in 2.6.10 very soon.
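For anyone following along, a sketch of how the constraint is passed (myapp is a placeholder; the datastore name is just the example from later in this thread):

```shell
# Per-application, at deploy time
juju deploy myapp --constraints "root-disk-source=srvt028_Local-01"

# Or as a default for all new machines in the model
juju set-model-constraints "root-disk-source=srvt028_Local-01"
```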

Thanks. Maybe it is already available in some 2.7 beta? We have another issue with 2.6: it creates multiple network interfaces, and then we get another set of problems. If I am not mistaken this was the issue: Bug #1800940 “Juju bootstrap vmware vsphere not working with vsa...” : Bugs : juju

Although I am not sure that will solve my problem. Somehow I suspect that Juju will not find the host once it starts respecting root-disk-source :slight_smile:

Hi @babbageclunk, was the fix applied to version 2.7-beta1? Today I tried to redeploy the environment, and at first I failed due to the wrong "VM Network" being selected instead of the distributed-switch one. So I rebuilt everything, including the controller. The version is still the same 2.7-beta1, but now "juju status" shows errors while bootstrapping the controller if root-disk-source is not specified.

As soon as I specified root-disk-source for all VMs and shuffled the VMs between the two nodes, Juju created them on the expected hosts. So it looks like the issue was fixed in 2.7-beta1.

The strange thing is that Juju now selects the wrong network. I found a bug from 2017 which mentioned a hardcoded "VM Network"; that issue was supposedly fixed.

Hi @elvinas - yes, the root-disk-source fix was merged into the 2.7 branch on Friday so if you were using a new build for testing then it would have been there.

Have you set the primary-network (and possibly external-network) values in the model configuration? If not, Juju might pick the wrong one.
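Something like this, with the primary network name from your bootstrap config (the external-network value here is just a placeholder):

```shell
# Apply to the current model
juju model-config primary-network="DSwitch-VM Network"
juju model-config external-network="External-Net"
```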


OK, thanks for the confirmation. Although it would be nice if juju reported a build number or something, so that one could tell that something has changed.

Regarding the primary network: yes, I have specified it. And I see "DSwitch-VM Network" in the bootstrap config file:

debadmin@ep-jumpbox:~$ grep -i netw .local/share/juju/*
.local/share/juju/bootstrap-config.yaml: container-networking-method: ""
.local/share/juju/bootstrap-config.yaml: disable-network-management: false
.local/share/juju/bootstrap-config.yaml: primary-network: DSwitch-VM Network

However the VMs end up in "VM Network". Last week it seemed to have worked and the application was installing. Now deployment continues as expected once I manually migrate the VMs from "VM Network" to "DSwitch-VM Network".

This was supposed to be fixed more than a year ago: Bug #1619812 “juju bootstrap node or deploy service with vSphere...” : Bugs : juju

Or do we have a case of:

99 little bugs in the code,
99 bugs in the code,
1 bug fixed…

Compile again,
100 little bugs in the code?
:smiley:

Today I retried with the following juju versions:

juju (2.6/edge) 2.6.10+2.6-9f8a13f
edge: 2.7-beta1+develop-4819cdc 2019-10-17

Controller is bootstrapped via:

juju bootstrap  VIPOC/datacenter01 poc-controller \
    --config primary-network="DSwitch-VM Network" \
    --config datastore="srvt028_Local-01" 

Behavior is the same. The controller boots on the correct network, but the VMs end up in "VM Network" instead of "DSwitch-VM Network".

In addition, deployment with 2.7-beta1+develop-4819cdc stalls with all hosts showing "agent initializing" and the logs full of the following, repeated for each cluster node:

unit-etcd-1: 14:24:09 DEBUG juju.worker.dependency "uniter" manifold worker stopped: subnet "x.y.z.0/24" not found
unit-etcd-1: 14:24:09 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: subnet "x.y.z.0/24" not found

An earlier 2.7 build (3xxxxxx) did not have those errors.

Hi @elvinas, sorry I missed your response! I think the problem might be that specifying --config applies to the controller model but not to any new models you create after that, so those are falling back to the default network. Could you try bootstrapping but setting the network and datastore with --model-default rather than --config?
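For example, a sketch based on your earlier bootstrap command, with the same names:

```shell
juju bootstrap VIPOC/datacenter01 poc-controller \
    --model-default primary-network="DSwitch-VM Network" \
    --model-default datastore="srvt028_Local-01"
```

--model-default values apply to every model created on the controller, while --config only applies to the controller model itself.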

The logging about missing subnets from the 2.7 edge version might be due to ongoing work that a couple of other team members are doing in that area - I’ll make sure they know about the problem. In the meantime it’s probably better to use the 2.6/edge snap.


Hi @babbageclunk

Thanks, adding --model-default and specifying the intended network for the VMs did help! I added just the primary network, to avoid tying the VMs to a single host. So for the moment the case is closed and I can continue development.

Thanks.


Oh fantastic - glad to hear it!

@elvinas

I have 100% the same issue you had. I think the Juju documentation should spell out the datastore limitation, if it really is a limitation.

I have vCenter, Enterprise Plus, all bells and whistles; the only thing is that each ESX host has multiple "local" datastores, with no shared datastores across ESX hosts. Certainly it is not a common production setup, but since the cloud connects directly to vCenter, it would be nice for Juju to make use of all hosts and all available resources in the vSphere Datacenter pool.

N.


@nicowalker I do not have access to the other environment myself (I am a contractor there), but according to team members, Juju in that environment is able to properly allocate VMs on different ESXi hosts. An initial comparison shows a difference in host and datastore naming. We will be reinstalling the ESXi environment and will try renaming the hosts.

The naming scheme in the working environment is as follows:
ESXi hosts in vCenter are named with short hostnames: node1, node2
Datastores are named "hostname"-"datastore name": node1-Local-N, node2-Local-N.

Hi,

I have the same config as you, 4 ESXi hosts with no shared datastores, only local ones, and the problem I have is the following:
I bootstrap the controller to DataStore04: perfect.
Then I try to deploy a Kubernetes charm and set the datastore model config to DataStore04, but when Juju tries to deploy the machines, for some of them it tries another host or cluster, giving the following error: could not find datastore "DataStore04", datastore(s) accessible: "DataStore01", "DataStore01-SSD". Juju retries creating the machine, but after 10 attempts it fails and the machine stays in a down state.
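For reference, these are the two places I have tried to pin the datastore (names as in my setup; this is what I attempted, not a confirmed fix):

```shell
# Model-wide default datastore
juju model-config datastore=DataStore04

# Per-machine, as a constraint
juju deploy kubernetes-worker --constraints "root-disk-source=DataStore04"
```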

I have modified the charm to force root-disk-source in the constraints for each machine, with no results, and I am on juju version 2.7.0.

When deploying a charm, the --to zone=XXXX modifier is not allowed; it would be great to be able to deploy a charm to a cluster or host of my choice inside vCenter, but it is not possible.

How have you solved this?


Same here. I tried everything to get the deployment onto a specific host via bundle YAML constraints:

zones=CLUSTERNAME/HOSTFQDN
zones=HOSTFQDN
zones=CLUSTERNAME/HOST
zones=HOST

Nothing worked. Every time I get:
ERROR cannot deploy bundle: cannot create machine for holding etcd unit: cannot add a new machine: availability zone "CLUSTERNAME/HOSTFQDN" not found

Edit: I tried it with “VM/Host Groups” inside the cluster, but it still fails.


Hi @panda - are you trying to deploy to a specific host inside a cluster? I don’t think the Juju zones constraint provides a way to specify a specific host inside a cluster - anything after the first slash is interpreted as a resource pool, not a host.

If you’re trying to deploy to a host or cluster, the name used should be whatever is displayed in the vSphere web client for the host/cluster, not the FQDN (unless they happen to be the same).
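So with your names, the constraint would look something like this (CLUSTERNAME and POOLNAME are placeholders for whatever the vSphere web client displays):

```shell
# Target a cluster (or standalone host) by its inventory display name
juju add-machine --constraints zones=CLUSTERNAME

# Anything after the first slash names a resource pool, not a host
juju add-machine --constraints zones=CLUSTERNAME/POOLNAME
```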

Yes, I am trying to use a specific host inside a cluster, and yes, it is the same (display name = DNS name).

I don’t think we have a way to specify that in Juju at the moment, sorry!
Could you give me some more details about your situation? Why do you need to deploy to a specific host in the cluster?

One of the reasons is that Juju does not deploy equally across all available hosts inside a cluster. E.g. if I have a 5-node cluster and deploy 10 instances of kubernetes-worker, I would assume that every host would get two workers. But I have hosts with 3 workers and hosts with 0 workers, and it is quite random which host gets how many.

Another reason is to get more control over where an instance is deployed. From the example above, one of the 5 hosts has less RAM and only one CPU socket instead of two, so it would be useful to place fewer machines on it than on the other four. Being able to specify hosts during deployment would help a lot. The same applies when a cluster contains more hosts than Juju should use.

BR