Introduction
Duration: 0:01
This article is a compilation of CLI tricks useful for recovering Juju deployments after services or hosts reboot or crash, or when network components become unreachable.
Requirements
There is no specific setup procedure for this tutorial because it contains commands that are applicable in various situations in any Juju deployment. Most of the commands use just standard CLI utilities like grep, sed or awk.
What to do if you get stuck:
The best place to ask for help is the Charmhub Discourse forum.
If you prefer chatting, visit us on IRC.
Tricks on broken units, machines or models
Duration: N/A
Direct reachability:
For the commands in this section to work, the targeted units or machines must be directly reachable via juju ssh. If, for example, LXDs are using a network that is reachable only from the LXD host machine, you will encounter connectivity errors.
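A quick way to check reachability up front is to run a trivial command on the target over juju ssh (the unit name here is just an example):
juju ssh kubernetes-worker/0 uptime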
Restarting units and machines based on their state
To restart units that are in a particular state (e.g. lost or down), use the following command and adjust the first part, state=<TARGET_STATE>, to match the state of the units that you want to restart. The command, as it is, will restart the units in state down.
state=down; juju status --format oneline | grep "agent:${state}" | egrep -o '[a-z0-9-]+/[0-9]+' | sort -u | xargs -P4 -I@ bash -c 'unit=@; juju ssh $unit "sudo service jujud-unit-${unit/\//-} restart"'
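To preview which units would be restarted, you can run just the selection part of the same pipeline first:
state=down; juju status --format oneline | grep "agent:${state}" | egrep -o '[a-z0-9-]+/[0-9]+' | sort -u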
To restart machines in a particular state (including LXDs), use the following command and adjust the first part, state=<TARGET_STATE>, to match the state of the machines that you want to restart. The command, as it is, will restart the machines in state down.
state=down; juju machines | sed "1 d" | awk -v state=${state} '{if ($2 == state) print $1}' | xargs -P4 -I@ bash -c 'machine=@; svc=${machine//\//-}; juju ssh ${machine%:} "sudo service jujud-machine-${svc}" restart'
Restarting all units and machines
To restart all units, regardless of their state, run the following command.
juju status --format oneline | egrep -o '[a-z0-9-]+/[0-9]+' | sort -u | xargs -P4 -I@ bash -c 'unit=@; juju ssh $unit "sudo service jujud-unit-${unit/\//-} restart"'
To restart all machines, regardless of their state, run the following command.
juju machines | sed "1 d" | awk '{print $1}' | xargs -P4 -I@ bash -c 'machine=@; svc=${machine//\//-}; juju ssh ${machine%:} "sudo service jujud-machine-${svc}" restart'
Retrying failed hooks
If there are multiple units with failed hooks, you can retry all of them with juju resolved --all.
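For example, to retry all failed hooks at once, or only a single unit's failed hook (the unit name below is just an example), run:
juju resolved --all
juju resolved kubernetes-worker/0
If you want to mark a hook as resolved without re-running it, juju resolved also accepts a --no-retry flag.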
Juju relations
Duration: N/A
Inspecting relation data
You can use the juju show-unit command to see details about the unit's relations, including relation data. For example, to inspect the cni relation of the unit kubernetes-master/0:
juju show-unit kubernetes-master/0 --endpoint cni
Manually setting relation data
To manually set data on a relation, we must first find its relation ID. Let's say we want to change the value of cni-conf-file on the cni relation between units kubernetes-worker/0 and flannel/1. We run this command to determine the relation ID.
juju run --unit kubernetes-master/0 'relation-ids cni'
cni:11
Then we can set the new value by running
juju run --unit flannel/1 "relation-set -r cni:11 cni-conf-file=10-flannel.conflist"
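To verify that the new value was stored, you can inspect the relation data again from the other end of the relation with juju show-unit, as described in "Inspecting relation data" above:
juju show-unit kubernetes-worker/0 --endpoint cni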
Triggering relation-changed hook
The easiest way to trigger a relation-changed hook is to change data in the relation using a dummy variable that will be ignored by the other end of the relation. See the section above, "Manually setting relation data", for how to run relation-set.
Reactive charms: For reactive charms, use "juju run --unit foo/0 charms.reactive" to set/unset the flags and wait for the update-status hook to run.
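For example, to list the current flags of a unit, or to set and clear a flag manually (the flag name here is purely illustrative; older charms.reactive versions use set_state/remove_state instead of set_flag/clear_flag):
juju run --unit foo/0 'charms.reactive get_flags'
juju run --unit foo/0 'charms.reactive set_flag example.flag'
juju run --unit foo/0 'charms.reactive clear_flag example.flag'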
Debugging
Duration: N/A
Running juju debug-hooks on Juju backed by Kubernetes
The current stable Juju release (2.8) does not support running the juju debug-hooks command on models/units backed by Kubernetes. This feature will become available in the 2.9 release (see the Release Notes). You can, however, achieve similar debugging capabilities using the following trick.
Let's say we have a gunicorn unit with a failing db-relation-joined hook and we want to find out what's causing the failure.
Placeholders:
In the following commands, we will use <JUJU_MODEL> and <JUJU_UNIT> as placeholders for the actual model and unit names; replace them with appropriate values.
Get a shell on the operator pod using microk8s.kubectl:
microk8s.kubectl exec -it -n <JUJU_MODEL> pod/<JUJU_UNIT>-operator-0 bash
Replace the faulty hook with a placeholder script that will dump the hook's environment variables. Note the sleep in this script: it gives us one hour to do our debugging. If you suspect you'll need longer, put in an appropriate value. Don't be afraid to go big; we will kill the sleep when we are done debugging.
cd agents/unit-<JUJU_UNIT>/charm/hooks
mv db-relation-joined db-relation-joined.old
cat > db-relation-joined <<EOF
#!/bin/bash
env
sleep 3600
exit 1
EOF
chmod 755 db-relation-joined
Log out of the operator pod and run juju resolved. This will re-run the failed hook, which is now replaced with our placeholder script, and we will see all the dumped environment variables in juju debug-log.
juju resolved <JUJU_UNIT>
juju debug-log
The output in juju debug-log will look something like this:
application-gunicorn: 13:30:16 DEBUG unit.gunicorn/19.db-relation-joined JUJU_UNIT_NAME=gunicorn/19
application-gunicorn: 13:30:16 DEBUG unit.gunicorn/19.db-relation-joined JUJU_AGENT_SOCKET_ADDRESS=@/var/lib/juju/agents/unit-gunicorn-19/agent.socket
Copy every line that's produced by the db-relation-joined hook. Log back into the unit and export the dumped variables. This will simulate the environment in which Juju hooks typically run, with all the necessary variables set.
microk8s.kubectl exec -it -n <JUJU_MODEL> pod/<JUJU_UNIT>-operator-0 bash
for i in $(cat|awk '{print $5}'); do export $i; done
[paste previously copied lines from the debug-log]
^d
cd $PWD
Now recreate a proper hook file and run it to debug it.
mkdir hooks/tmp/
ln -s ../../src/charm.py hooks/tmp/$JUJU_HOOK_NAME
hooks/tmp/$JUJU_HOOK_NAME
The output of the hook can look like this:
Traceback (most recent call last):
  File "lib/ops/model.py", line 697, in _run
    result = run(args, check=True, **kwargs)
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '('relation-set', '-r', '1', 'database=mydbname', '--app=True')' returned non-zero exit status 1.
When you are done with debugging, kill the sleep from our placeholder script.
pkill -x sleep
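Finally, restore the original hook that we moved aside earlier (the path is relative to agents/unit-<JUJU_UNIT>/charm), then run juju resolved <JUJU_UNIT> again so the real hook gets re-executed:
mv hooks/db-relation-joined.old hooks/db-relation-joined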
Inspecting state of reactive charms
It's simple to check the states of a reactive charm. States (or flags) are used by the reactive framework to transition the components of a unit (relations, configurations, …) from a not-ready to a ready status. Inspecting them can help you troubleshoot a possible bug in the charm. In the example below we will inspect the states of the unit flannel/1.
juju run --unit flannel/1 'charms.reactive -p get_states'
{'cni.configured': None,
'cni.connected': None,
'cni.is-master': None,
'endpoint.cni.changed.cni-conf-file': None,
'endpoint.cni.changed.egress-subnets': None,
'endpoint.cni.changed.ingress-address': None,
'endpoint.cni.changed.is_master': None,
'endpoint.cni.changed.private-address': None,
'endpoint.cni.joined': None,
'etcd.available': {'conversations': ['reactive.conversations.etcd.global'],
'relation': 'etcd'},
'etcd.connected': {'conversations': ['reactive.conversations.etcd.global'],
'relation': 'etcd'},
'etcd.tls.available': {'conversations': ['reactive.conversations.etcd.global'],
'relation': 'etcd'},
'flannel.binaries.installed': None,
'flannel.cni.available': None,
'flannel.etcd.credentials.installed': None,
'flannel.network.configured': None,
'flannel.service.installed': None,
'flannel.service.started': None,
'flannel.version.set': None}
Accessing Juju internal database
Duration: N/A
Caution: Changing values in the Juju database can corrupt your Juju installation. Proceed with caution and double-check before making any changes.
Sometimes it may be necessary to access Juju's database directly, although this should be considered a last-resort solution. To log into Juju's database, you must first log into the Juju controller machine.
juju ssh -m controller 0
Once there, you can log into Juju's mongo database.
mongo --sslAllowInvalidCertificates --authenticationDatabase admin --ssl -u $(sudo awk '/tag/ {print $2}' /var/lib/juju/agents/machine-?/agent.conf) -p $(sudo awk '/statepassword/ {print $2}' /var/lib/juju/agents/machine-?/agent.conf) localhost:37017/juju
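Once logged in, standard mongo shell commands work for read-only exploration; for example, list the collections or peek at a single document (collection layouts can differ between Juju versions):
show collections
db.machines.findOne()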
Manual change of OpenStack endpoint certificates
If you are running a Juju controller on top of OpenStack and for some reason the CA certificates on the endpoints change, you can no longer access the endpoints (Juju reports "certificate signed by unknown authority"). The only way to fix this is to update Juju's MongoDB directly.
Start by logging into the controller node and then into the mongo interactive shell. Once you are in the mongo shell, update the certificate field for your cloud in the clouds collection.
juju:PRIMARY> db.clouds.update({"_id": "openstack_cloud"}, {$set: {"ca-certificates": [""]}})
The certificate must be a PEM-formatted one-liner with newlines replaced by \n.
-----BEGIN CERTIFICATE-----\n<CERTIFICATE_CONTENT>\n\n-----END CERTIFICATE-----\n
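One way to produce such a one-liner from a certificate file is with awk; here we assume the CA certificate is saved locally as ca.pem (the file name is just an example):
awk 'NF {printf "%s\\n", $0}' ca.pem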
Debugging stuck transactions
If a transaction is stuck and you want to figure out what's going on, look at the "cleanups" collection.
juju:PRIMARY> db.cleanups.find().pretty()
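To get just the number of pending cleanup documents, a simple count is enough:
db.cleanups.count()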
Dumping database content
Although juju create-backup exists, you can dump the raw Juju database by logging into the controller and running the following:
datestamp=`date +"%Y-%m-%d_%H-%M-%S"`
conf=/var/lib/juju/agents/machine-*/agent.conf
user=`sudo grep tag $conf | cut -d' ' -f2`
password=`sudo grep statepassword $conf | cut -d' ' -f2`
mongodump -h 127.0.0.1:37017 --username $user --password $password --authenticationDatabase admin --db juju -o mongodump --ssl --sslAllowInvalidCertificates
tar -zcf "juju-mongodump-${datestamp}.tar.gz" mongodump