Migrating models

Model migration is the movement of a model from one controller to another. The same configuration of machines, units, and their relations will be replicated on the destination controller, while your applications continue uninterrupted. Note that a controller model cannot be migrated.

Model migration can be used to effectively upgrade a controller: models are migrated to a newly created controller running a more recent Juju version.

Migration is equally useful for load balancing. If a controller hosting multiple models reaches capacity, you can move the busiest models to a new controller, reducing load without affecting your applications.

For migration to work:

  • The source and destination controllers need to be running on the same cloud substrate.
  • Destination controllers in different regions or VPCs need direct connectivity to the source controller.
  • The version of Juju running on the destination controller needs to be the same or newer than the version on the source controller.
  • A model intended to be migrated must have all of its users set up on the destination controller (see the example below). The operation will be aborted, and an advisory message displayed, if this is not the case.
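
These are Juju user accounts (managed with juju add-user) rather than operating-system users. As a minimal sketch, assuming a hypothetical model user named 'alice' and a destination controller registered locally as 'dst':

juju add-user alice -c dst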

Usage

To start a migration, the destination controller must be in the client’s local configuration cache. See the Clouds page for details on how to do this.

While the migration process is robust, a backup of the source controller before performing a migration is recommended. See Controller backups for assistance.
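
On Juju 2.x a backup can typically be taken with the create-backup command, run against the controller model (exact syntax and availability depend on your Juju version):

juju create-backup -m controller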

To migrate a model on the current controller to a destination controller:

juju migrate <model-name> <destination-controller>

A model with the same name as the migrated model cannot exist on the destination controller.
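
For example, assuming a model named 'demo' on the current controller and a destination controller registered locally as 'dst' (both hypothetical names):

juju migrate demo dst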

You can monitor progress from the output of the status command run against the source model. You may want to use a command such as watch to automatically refresh the status output, rather than manually running status each time:

watch --color -n 1 juju status --color

A ‘Notes’ column is appended to the model overview line at the top of the status output. The migration will step through various states, from ‘starting’ to ‘successful’.

The ‘status’ section in the output from the show-model command also includes details on the current or most recently run migration. It adds extra information too, such as the migration start time, and is a good place to start if you need to determine why a migration has failed.
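
For example, for a hypothetical model named 'demo':

juju show-model demo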

This section will look similar to the following after starting a migration:

  status:
    current: available
    since: 23 hours ago
    migration: uploading model binaries into destination controller
    migration-start: 21 seconds ago

Migration time depends on the complexity of the model, the resources it uses, and the capabilities of the backing cloud.

If a failure occurs during the migration process, the model will be reverted, in its original state, to the source controller.

Verification

When the migration has completed successfully, the model will no longer reside on the source controller. It, and its applications, machines and units, will be running on the destination controller.

Inspect the migrated model with the status command:

juju status -m <destination-controller>:<model>

As of v2.6.0 (for both the client and the source controller), if the model is accessed via the old/source controller, the operator is guided to the new controller:

juju status -m <source-controller>:<model>

In this case, the following responses are possible:

a) if the controller is known to the client:

ERROR Model "migrate" has been migrated to controller "dst".
To access it run 'juju switch dst:migrate'.

b) if the controller is unknown to the client:

ERROR Model "migrate" has been migrated to another controller.
To access it run one of the following commands (you can replace the -c argument with your own preferred controller name):
 'juju login 10.65.47.40:17070 -c dst'

New controller fingerprint [93:73:88:C9:9A:AA:EC:C8:85:AE:D1:33:E5:92:CE:95:0F:B6:00:82:21:CB:8C:A0:42:16:29:77:CF:6D:B6:D4]

See Controller logins for background information.

I would like to migrate my Juju controller from the physical machine it's currently running on into an LXD container running on my MAAS controller, which I have already successfully created, complete with a bridged network interface. I have been told the best way to achieve this is by migrating the Juju model.

I’ve not attempted the migration yet as I know I haven’t completed the prerequisites, two of which I don’t feel are explained very clearly at present, namely:

“For migration to work:

A model intended to be migrated must have all of its users set up on the destination controller. The operation will be aborted, and an advisory message displayed, if this is not the case.”

Maybe it will explain this better when I run the migration command, but is this referring to Linux/Ubuntu users, Juju users, or both?

The real problem I have currently is:

“To start a migration, the destination controller must be in the client’s local configuration cache. See the Clouds page for details on how to do this.”

The client’s local configuration cache? What client, which cache? I presume it means the Juju cache. It gives a link to the Clouds page, but the only command on that page relating to the (Juju) cache is this:

juju update-cloud --local oracle -f oracle.yaml

How do I add my new, destination controller to the Juju cache on the source machine?

Thanks

To answer one of my questions: it seems the way to add a new controller to the Juju cache is to use juju register. The doc should state this instead of linking to the Clouds page.
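
For anyone else hitting this, registering the destination controller with the local client looks something like the following (the registration string comes from the output of juju add-user on the destination controller; alternatively, juju login with the controller's API address works, as in the error message quoted earlier):

juju register <registration-string>
juju login 10.65.47.40:17070 -c dst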

I believe documentation and some reference numbers around tuning agent-ratelimit-max and agent-ratelimit-rate would be beneficial, as these model migrations hit issues when the models are large. Running the model migration without modifying these settings will lead to a failed migration due to timeouts of agents as they all try to connect to the target controller.

Currently we have an HA Xenial Juju 2.8.9 controller cluster and are upgrading to an HA Bionic 2.8.9 controller cluster. Each controller has 8 CPUs and 28 GB of RAM. We’ve been trying different scenarios in our lab, and currently what works for us is: on 167 units + 56 machines or containers, we set agent-ratelimit-max=500 and agent-ratelimit-rate=10s.
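
For anyone reproducing this: these are controller config keys, so on a running 2.x controller they should be settable with juju controller-config, along the lines of:

juju controller-config agent-ratelimit-max
juju controller-config agent-ratelimit-max=500 agent-ratelimit-rate=10s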

We are making a few assumptions.

  1. The total agent count = number of units + subordinate units + # of machines

  2. Each Juju agent has 2 connections to the controller

  3. Our controllers at 8 GBs of RAM are a possible reason for the failures we are hitting and increasing the size might help.

Our next test will look at setting the agent-ratelimit-rate to 250ms (default) and seeing if the migration succeeds.

We appreciate your help and guidance on this issue.

The migration was successful with 250ms, and the model stabilized in half the time compared to setting agent-ratelimit-rate to 10s.

The agent rate limiting uses a ‘token bucket’ style, which is:

  • I have a bucket that can hold tokens.
  • I hand one out any time an agent connects.
  • I have a maximum number of tokens I can hold in my bucket (agent-ratelimit-max).
  • And I have a rate at which I refill the bucket (1 token every agent-ratelimit-rate).

So the ratelimit-max sets what the peak load on the system can be (assuming things have been idle), and the ratelimit-rate sets how quickly you let that continue to happen.

I actually think you would want a relatively low agent-ratelimit-max, but a faster agent-ratelimit-rate (a shorter refill interval). So something like:

agent-ratelimit-max: 100

agent-ratelimit-rate: 50ms

That allows 50 agents to connect (2 connections per agent) immediately, and then slows them down to 10 more agents per second.
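
To make that arithmetic concrete, here is a minimal token-bucket sketch in Go (illustrative only, not Juju's actual implementation), showing how many of a 500-connection burst would be admitted with agent-ratelimit-max=100 and agent-ratelimit-rate=50ms:

  package main

  import (
      "fmt"
      "time"
  )

  // bucket is a toy model of the throttle described above: it holds up to
  // max tokens and gains one token every rate interval.
  type bucket struct {
      tokens, max int
      rate        time.Duration
      last        time.Time
  }

  // admit refills the bucket for the elapsed time, then spends a token if
  // one is available (true = connection allowed, false = throttled).
  func (b *bucket) admit(now time.Time) bool {
      refill := int(now.Sub(b.last) / b.rate)
      if refill > 0 {
          b.tokens += refill
          if b.tokens > b.max {
              b.tokens = b.max
          }
          b.last = b.last.Add(time.Duration(refill) * b.rate)
      }
      if b.tokens > 0 {
          b.tokens--
          return true
      }
      return false
  }

  func main() {
      // agent-ratelimit-max=100, agent-ratelimit-rate=50ms
      b := &bucket{tokens: 100, max: 100, rate: 50 * time.Millisecond, last: time.Now()}
      allowed := 0
      for i := 0; i < 500; i++ { // 500 connections arriving in one burst
          if b.admit(time.Now()) {
              allowed++
          }
      }
      fmt.Printf("allowed %d of 500 connections in the initial burst\n", allowed)
  }

With those settings the burst admits roughly 100 connections (about 50 agents at 2 connections each) straight away, and everything else has to wait for the bucket to refill at 20 tokens per second.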

Setting the max to 500 means that all of them will try immediately (iow, disable throttling for first attempt). If any of them fail, they would then be throttled to only 4 per second.

If you are able to handle the load of 500 concurrently, then you could turn up the ratelimit-rate even faster (10ms would be 100 connections/sec), and that would still give a better overall experience (I would guess).