How to troubleshoot a model upgrade

See also:

This section provides strategies and techniques to assist with broken model upgrades. See the main Upgrading models page for background information.

Case #1 - An agent does not restart (config)

It may occur that an agent does not restart upon upgrade. One thing that may help is the inspection and modification of its agent.conf file. Comparing it with its file before upgrading can be very useful.

Installing a different or modified configuration file will require a restart of the daemon. For example, for a machine with an ID of ‘2’:

juju ssh 2 'ls -lh /etc/systemd/system/juju*'

This will return something similar to:

-rwxr-xr-x 1 root root 326 Jun 29 19:02 /etc/systemd/system/jujud-machine-2-exec-start.sh
-rw-r--r-- 1 root root 284 Jun 29 19:02 /etc/systemd/system/jujud-machine-2.service

Therefore, if the agent for machine ‘2’ is not coming up you can connect to the machine in this way:

juju ssh 2

Then modify or restore the agent file (/var/lib/juju/agents/machine-2/agent.conf), and while still connected to the machine, restart the agent:

sudo systemctl restart jujud-machine-2

Case #2 - An agent does not restart (hook)

The restart of an agent, due to invoking upgrade-model or by manual means (as above) may cause a hook for that particular unit/machine to be called. That can sometimes lead to hook failures. Connect to that unit using the debug-hooks command, see what is wrong, and retry the hook using the resolved command:

juju debug-hooks etcd/2

In a different terminal retry the failed hook:

juju resolved etcd/2

See more: How to debug a charm > Debug a single failing hook

Case #3 - An agent is too old

When the running agent software that is more than 1 patch point behind the targeted upgrade version the upgrade process will abort.

One very common reason for “agent version skew” is that during a previous upgrade the agent could not be contacted and, therefore, was not upgraded along with the rest of the agents.

For example, the following error message will be printed when attempting to upgrade from 2.2.1 to 2.2.2 when an agent is still running, say, 2.2.0:

ERROR some agents have not upgraded to the current model version 2.2.1:
machine-0, unit-ubuntu-0

To overcome this situation you may force the upgrade by ignoring the agent version check:

juju upgrade-model --ignore-agent-versions

Case #4 - Dealing with an upgrade failure

If an attempted upgrade results in failure it may prove difficult to return to a working setup and you may be compelled to start anew. Doing so will make the old controller completely inert and you should consider it a data loss situation.

Begin by removing the controller with juju destroy-controller or juju kill-controller. If this is insufficient you may need to ask Juju to simply “forget” about the controller. This is done with the unregister command.

Once the above is completed, a new controller can be created.

What is the solution for “Case #3” ? How can I force the upgrade of an agent that for some reason has been left behind ? The same also applies for charm-units.

Links to Upgrading models and Debugging charm hooks are broken.

I think they should be Upgrading models and Debug a single failing hook

Fixed, thanks!