Changes to juju cloud commands - multi-cloud controllers

wallyworld · 28 March 2019 03:47

We are working to deliver support for multi-cloud controllers. It’s an ongoing effort, and will take time to get the user experience polished. But multi-cloud controllers will be a thing. You can for example bootstrap to MAAS, deploy openstack, add that openstack cloud to the same existing controller, and deploy stuff to that second cloud. All without having to waste a controller node in the openstack cloud.

Part of that effort is to fix the experience around how the various Juju CLI cloud commands operate. We’re also working to fix credential management (especially adding or updating credentials on a controller). But for now, the cloud command changes and multi-cloud controller early access will be available soon in the 2.6 beta1 edge snap.

The key change is that like most/all other Juju commands, the cloud commands will operate (by default) on a running controller. So, just like add-model, these commands:

list-clouds
show-cloud
add-cloud
remove-cloud
update-cloud

will use the current controller, or accept a -c or --controller argument to use a different one.

For the times where you may be preparing to bootstrap to a new cloud, and you want to first create the cloud definition locally, you can use the --local argument.
eg juju add-cloud -f myclouds.yaml mycloud --local

Currently, interactive add-cloud is always local.
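The selection rule described above (use the current controller by default, accept -c/--controller to pick another, or --local to work on the client's own cloud definitions) can be sketched roughly as follows. This is only an illustration in Python, not Juju's actual implementation: `Target` and `resolve_target` are hypothetical names, and both the rejection of --local combined with --controller and the error raised when no controller exists are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    kind: str                  # "local" or "controller"
    controller: Optional[str]  # controller name when kind == "controller"


def resolve_target(current_controller: Optional[str],
                   controller_flag: Optional[str] = None,
                   local: bool = False) -> Target:
    """Decide where a cloud command operates (hypothetical helper)."""
    if local and controller_flag:
        # Assumption: the two options are treated as mutually exclusive.
        raise ValueError("--local and --controller cannot be combined")
    if local:
        # --local: operate on the client's own cloud definitions.
        return Target("local", None)
    if controller_flag:
        # -c / --controller: operate on the named controller.
        return Target("controller", controller_flag)
    if current_controller:
        # Default: operate on the current controller, like add-model.
        return Target("controller", current_controller)
    # Assumption: with no controller at all, the user must ask for --local.
    raise LookupError("no current controller; use --local or -c")
```

For example, `resolve_target("maas-ctrl")` selects the current controller, while `resolve_target(None, local=True)` selects the local definitions.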

Possible Enhancement

One possible optimisation concerns the case where Juju has been installed for the first time. You want to bootstrap and need to see what clouds are available. You now need to run:
juju clouds --local

It could be that the juju clouds command becomes smart and, if there are no controllers available, just shows the list of clouds available for bootstrap. The messaging would be improved to reflect this, eg

$ juju clouds
There are no controllers running. You can bootstrap a new controller using one of these clouds:
Cloud           Regions  Default            Type        Description
aws                  15  us-east-1          ec2         Amazon Web Services
aws-china             2  cn-north-1         ec2         Amazon China
aws-gov               1  us-gov-west-1      ec2         Amazon (USA Government)
azure                27  centralus          azure       Microsoft Azure
azure-china           2  chinaeast          azure       Microsoft Azure China
cloudsigma           12  dub                cloudsigma  CloudSigma Cloud
google               18  us-east1           gce         Google Cloud Platform
joyent                6  us-east-1          joyent      Joyent Cloud
oracle                4  us-phoenix-1       oci         Oracle Cloud Infrastructure
oracle-classic        5  uscom-central-1    oracle      Oracle Cloud Infrastructure Classic
rackspace             6  dfw                rackspace   Rackspace Cloud
localhost             1  localhost          lxd         LXD Container Hypervisor

There’s an argument this could be confusing, ie list-clouds behaves differently if there’s a controller running or not. But with the appropriate messaging it could give the best of both worlds.
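A minimal sketch of that fallback, assuming the command already knows the list of controllers and both sets of clouds. The function name `clouds_to_show` and its parameters are hypothetical. The heading text is the one proposed above, and no heading is invented for the controller case.

```python
from typing import List, Optional, Tuple

NO_CONTROLLER_HEADING = (
    "There are no controllers running. "
    "You can bootstrap a new controller using one of these clouds:"
)


def clouds_to_show(controllers: List[str],
                   local_clouds: List[str],
                   controller_clouds: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return (heading, clouds) for a 'smart' clouds listing (hypothetical)."""
    if not controllers:
        # No controller yet: show what can be bootstrapped, and say why.
        return NO_CONTROLLER_HEADING, local_clouds
    # A controller exists: behave like the other controller commands.
    return None, controller_clouds
```

The explicit heading is what addresses the "confusing" objection: the output itself states which of the two behaviours the user is seeing.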

There’s time to tweak things prior to the 2.6 release depending on feedback, good or bad.

jamesbeedy · 28 March 2019 16:31

^fantastic.

The primary bump that most people in my org are hitting with cloud/credential mgmt right now is that there seems to be this disconnect for them between adding and updating a credential. Once the user makes the distinction between local vs controller credentials I think things start to click.

We should harp on connectivity/routable hops here. For example, if you have a juju controller behind a NAT, you won’t be able to deploy agents to aws as the agents aren’t going to be able to talk back to the controller. To me this is a “get the point across to the user really well that if they want their controller to manage multiple clouds that it has to have a routable connection from the cloud substrate” type problem. I didn’t understand how we would accommodate user facing messaging on this, but I’m sure we will figure it out.

wallyworld · 28 March 2019 21:04

Most users don’t need to deal with local. There’s already a running controller. They just want to add their credential to it so they can start adding models. The current workflow of having to even understand the distinction of local vs controller, needing to add locally first and then upload to a controller, is very confusing for people based on all the feedback we have had over the years. Having stuff local is necessary for bootstrapping, but after that not so relevant.

jamesbeedy wrote, quoting wallyworld:

You can for example bootstrap to MAAS, deploy openstack, add that openstack cloud to the same existing controller, and deploy stuff to that second cloud. All without having to waste a controller node in the openstack cloud.

We should harp on connectivity/routable hops here. For example, if you have a juju controller behind a NAT, you won’t be able to deploy agents to aws as the agents aren’t going to be able to talk back to the controller. To me this is a “get the point across to the user really well that if they want their controller to manage multiple clouds that it has to have a routable connection from the cloud substrate” type problem. I didn’t understand how we would accommodate user facing messaging on this, but I’m sure we will figure it out.

Indeed. This is recognised as an issue and we have not yet come up with a good answer on how to detect ahead of time whether adding a given cloud to a controller is feasible. As you say, agents need routability back to the controller, but how to detect if that’s going to be possible is the issue.

fguitton · 30 March 2019 10:44

Multi-cloud support for controllers is fantastic news!

We deploy our management plane in a LXD cluster and would want to bootstrap Juju there to then use MAAS and deploy Openstack, all residing on cross-reachable network segments. Once the beta is published we will be happy to do some testing there!

I realise this thread focuses on the command line changes, but I was wondering if model migration will be supported to bring back models deployed on a remote controller and allow consolidation. Currently the documentation states it needs to be on the same cloud (https://docs.jujucharms.com/devel/en/models-migrate).

I could not find the related item on Launchpad. Is there a place where the status of this can be ‘watched’?

Thank you very much for all this great work! Best regards,

wallyworld · 1 April 2019 07:23

Yeah, currently model migration has to be between clouds of the same type. That’s because there can be cloud specific model config (like vpc-id for AWS for example). There may be other reasons but that’s one I’m aware of.

We’ll announce progress here on this forum. What we have in the 2.6 beta1 edge snap is usable, but there’s a fair bit of polish needed around multi-user and credential management. If you just want to try it out with the out of the box admin user (who has permission to everything and doesn’t need an additional credential uploaded to the controller), bootstrapping to LXD and adding MAAS and Openstack clouds to that controller, we’d love to hear about your experience. And if you find issues, please do raise a bug in Launchpad.

jameinel · 2 April 2019 04:19

I don’t think he wants to migrate to a different cloud. He just already has controllers for the different clouds and wants to migrate back to a single controller. eg ‘juju bootstrap lxd, juju bootstrap maas; juju migrate maas:foo lxd:’

If we support ‘add-model’ with multiple clouds there is no reason to feel we couldn’t support migrate. It’s just a case of “we may not have implemented support for it yet”.

jameinel · 2 April 2019 04:27

I’d also like to note that it isn’t just routability. That is certainly a necessary step. But there are many assumptions throughout the code about latency to the controller. There is an expectation that the controller is fairly local to the agents. We do things like caching agent binaries and resources and charm archives in the controller.

There is concern about people having stuff “work” but perform poorly. And while it is because they are not using the tool according to our expectations, the result is that Juju looks bad. (For example, juju bootstrap lxd on your laptop with a public IP and then add-model on AWS. It works for now, but stops when you roam to another network, and it consumes all of your upload bandwidth when you deploy the Hadoop charm.)

The best I’ve come up with is checking the round trip time connecting to the cloud’s API from the controller and warning if that performance seems poor. You could feasibly do that during add-model, and it would give the user quick feedback about the viability of what they are doing.
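As a rough illustration of that idea (not anything Juju ships), the check could time a TCP connection from the controller to the cloud's API endpoint and warn above some threshold. The function names, the 100 ms threshold, and the warning wording below are all assumptions made for the sketch.

```python
import socket
import time
from typing import Optional


def connect_rtt_seconds(host: str, port: int,
                        timeout: float = 5.0) -> Optional[float]:
    """Time a TCP connect to host:port; return None if it cannot connect."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return None


def latency_warning(rtt: Optional[float],
                    threshold: float = 0.1) -> Optional[str]:
    """Return a warning string, or None when latency looks acceptable.

    The default threshold (100 ms) is an arbitrary, assumed value.
    """
    if rtt is None:
        return "cloud API endpoint is not reachable from the controller"
    if rtt > threshold:
        return (f"round trip to the cloud API took {rtt * 1000:.0f} ms; "
                "agents in this cloud may perform poorly")
    return None
```

Note that this measures the path from the controller to the cloud API, which is only a proxy for what matters at runtime: agents in that cloud reaching back to the controller.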

fguitton · 5 April 2019 11:00

That’s correct! In practice we have our core management tooling in an LXD cluster on a small subset of ‘manually manageable’ machines and our MAAS is essentially running in the same data center substrate at this point. Losing a physical machine for the sole purpose of running a Juju controller is suboptimal in our case. Being able to deploy the controller in the LXD cluster and have it manage the MAAS from there is great!

I can actually report that it’s been successful. I installed from 2.6-beta1 yesterday and managed to bootstrap the controller in LXD and deploy to MAAS successfully. There seem to be issues where configuration changes to the application do not trigger the config-changed hooks on the agents in some cases. I couldn’t figure out exactly why yet. I will try to keep you posted with further details later or file a bug report on Launchpad.

Thank you very much for your wonderful work!

jameinel · 8 April 2019 05:35

Please do. I know we recently changed config-changed behavior. In the past, we didn’t track the previous config, so if we thought it might have changed, we would fire the hook. We now track a hash of the config, and if the hash doesn’t change, we don’t fire the event. So setting foo=foo (not actually changing it) might previously have caused a config-changed to fire, but now we see that as a no-op and don’t run the hook.

It may be that there is a bug there, or it may be something that doesn’t actually change the content (setting a value to the default value), that used to trigger a change but doesn’t anymore.
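The hash-tracking behaviour described above can be illustrated with a small sketch. This is not Juju's code: `config_hash` and `ConfigWatcher` are hypothetical names, and hashing a key-sorted JSON rendering with SHA-256, as well as firing on the very first config seen, are assumptions chosen for the example.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def config_hash(config: Dict[str, Any]) -> str:
    """Hash a config dict in a way that ignores key order."""
    payload = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ConfigWatcher:
    """Remember the last config hash and report only real changes."""

    def __init__(self) -> None:
        self._last_hash: Optional[str] = None

    def should_fire(self, config: Dict[str, Any]) -> bool:
        new_hash = config_hash(config)
        if new_hash == self._last_hash:
            # Same content as before (e.g. foo=foo set again): a no-op.
            return False
        self._last_hash = new_hash
        return True
```

With this scheme, setting a value to what it already is produces the same hash, so the hook is skipped. That matches the foo=foo case and the "set to the default value" case mentioned above.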

seffyroff · 14 April 2019 20:10

This is absolutely going to make my life so much easier, and is wonderful WONDERFUL news. Thanks so much for making this a thing, you beautiful people.

My current workflow of spinning up a localhost controller, then deploying a maas controller to it, then deploying an lxd-cluster controller to that, then deploying lxd-cluster nodes, then deploying a k8s controller to those, whilst giving me what I want, is somewhat inefficient and has come out as a result of bumping into Juju’s (soundly thought out) constraints that enforce best practices.

The simple improvement of being able to place a controller once, and avoid consuming a machine where machines are a finite resource (MAAS), is gold.

wallyworld · 14 April 2019 22:41

The multi-cloud feature is now available to try in the just-released 2.6 beta1. Please give it a try and report any issues so we can fix them before the final release goes out.

seffyroff · 19 April 2019 06:34

I’m sort of getting my head around this, and coming to the realization that I’m going to quickly need to up my routing knowledge. Assuming I’m deploying multiple models to different clouds and that they’ll require CMR to enable comms, and that sometimes there’ll be a number of firewalls in the way of those comms, is there an established best practice to get the job done? I expect it depends greatly on the type of workload and the amount of security that needs to be traversed.

As it’s a model I’ve spent a lot of time playing with, I’m looking first at MAAS. My ideal deployment scenario is with the MAAS region and postgres services deployed to a vendorcloud (Azure in my case) and the MAAS rack controllers deployed to LXD containers hosted by base service machines outside the MAAS management umbrella, but on the same network (to avoid the chicken/egg MAAS deployment scenario), which for simplicity I’d have as a remote LXD controller to manage deployment to the base service machines in a central location.

So, I guess the deployment goes like this:
1: Base service machine (Machine #1) is deployed manually, with stock Bionic server image. Latest LXD installed via snap.
2: Juju Controller is deployed to Machine 1.
3: LXD Cloud for Machine 1 is added to Juju Controller.
4: Azure Cloud is added to Machine 1.
5: MAAS-Region and Postgres charms are deployed to LXD containers in a new model in Azure Cloud (Model 1)
6: MAAS-Rack Charm is deployed to LXD container on Machine 1 (Model 2)
7: CMR between Model 1 and 2 allows MAAS Rack<>Region comms
8: Add MAAS Cloud to Machine 1.
9: Remaining machines in location are Commissioned by MAAS.
10: Regular workloads are deployed to MAAS cloud either at bare metal level or as LXD containers.

Somewhere before #7 I do some network magic to allow the CMR to take. Possibly by using Wireguard or Zerotier.

Right?

fguitton · 19 April 2019 10:15

Dear John, there might indeed be some issues here. I am currently playing with an HA setup of 2.6-beta1 and values are not propagated or committed correctly. I have just opened Bug #1825500 “[2.6-beta1] Configuration changes do not propagate...” (Bugs : Canonical Juju) about it. Again, thank you for your great support.