The upgrade failure happens when an earlier deploy or upgrade failed to connect to the charm store and left behind a “placeholder” record, which has a charm definition but no “meta” information.
To fix the records in the controllers and avoid the nil pointer panic, follow the steps below.
SSH into each controller machine and run:
systemctl stop jujud-machine-*
This should stop the controllers from trying to run the upgrade steps while the database is being updated.
Get access to Mongo:
agent=$(cd /var/lib/juju/agents; echo machine-*)
pw=$(sudo grep statepassword /var/lib/juju/agents/${agent}/agent.conf | cut '-d ' -sf2)
mongo --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/juju
If you are in an HA controller, you will want to determine which machine is the Mongo primary. On the primary, the prompt will be:
juju:PRIMARY>
On a machine that is not the primary, the prompt should be:
juju:SECONDARY>
You can also run
rs.status()
and look in the "members" array for the entry whose “stateStr” is “PRIMARY”, e.g.:
"name" : "10.5.24.54:37017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
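As a non-interactive alternative to reading the prompt, the check can be scripted. The following is a minimal sketch and not part of the tested procedure: the helper name is_primary is hypothetical, it assumes the agent and pw variables set earlier, and it relies on the legacy mongo shell's --quiet flag and db.isMaster() helper.

```shell
# Hypothetical helper: succeeds only when the local mongod reports itself as primary.
# Assumes $agent and $pw were set as shown earlier.
is_primary() {
    result=$(mongo --quiet --ssl -u "${agent}" -p "${pw}" \
        --authenticationDatabase admin \
        --sslAllowInvalidHostnames --sslAllowInvalidCertificates \
        localhost:37017/juju --eval 'db.isMaster().ismaster' | tail -n 1)
    # The last line printed is expected to be the expression value: true or false.
    [ "$result" = "true" ]
}
```

For example, `is_primary && echo "this machine is the primary"` prints the message only on the primary.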
From there you can run:
db.charms.find({meta: null}).count()
This shows how many records will be affected. You can then exit mongo and run:
mongo --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/juju --eval 'db.charms.find({}).pretty()' > all_records.txt
This writes a complete list of all charm records to the ‘all_records.txt’ file. Then run:
mongo --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/juju --eval 'db.charms.find({"meta": null}).pretty()' > null_records.txt
This writes just the records that have a null meta field to ‘null_records.txt’.
Then apply the fix:
mongo --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/juju --eval 'db.charms.update({meta: null}, { $set: {"meta": {}} }, false, true)'
This updates every record that has a nil meta so that it has an empty meta instead, which avoids the nil pointer dereference.
You should see a line like:
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
The nModified value should match the count() from earlier.
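That comparison can also be scripted. This is a minimal sketch, assuming the output of the update command was captured in a shell variable. The function name nmodified_from and the variable names are hypothetical, and the parsing assumes the WriteResult line format shown above.

```shell
# Hypothetical helper: extract the nModified value from a WriteResult line.
nmodified_from() {
    printf '%s\n' "$1" | sed -n 's/.*"nModified" : \([0-9][0-9]*\).*/\1/p'
}

# Example using the sample line above; $expected stands in for the earlier count() result.
expected=1
output='WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })'
if [ "$(nmodified_from "$output")" = "$expected" ]; then
    echo "all placeholder records were updated"
else
    echo "mismatch: re-check the charms collection" >&2
fi
```

If the two numbers differ, it is worth re-running the count() query before restarting the agents.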
Once you have run the database updates, you can then do:
systemctl start jujud-machine-*
Do this on all of the controllers. Each one should then run the upgrade and proceed as normal.
We have tested this workaround on two different controllers, and have also seen that it still allows users to issue a “juju upgrade-charm” for one of the previously failed upgrades.