Controllers missing after upgrade-controller

jameinel · 17 September 2020 15:54

This upgrade failure happens when an earlier deploy or upgrade failed to connect to the charm store and left behind a “placeholder” charm record: one that has a charm definition but no “meta” information.

To fix these records in the controller database and avoid the nil pointer panic, do the following.

On every controller machine, SSH in and run:

sudo systemctl stop 'jujud-machine-*'

This should stop the controller agents from trying to run the upgrade steps while you update the database.
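If you have several controllers, a small loop saves repeating the SSH step by hand. This is a minimal sketch, assuming you can SSH to each controller as the `ubuntu` user and run `sudo` without a password prompt; the function name `controllers_systemctl` and the placeholder addresses are hypothetical, not part of Juju.

```shell
# Hypothetical helper (not part of Juju): run one systemctl action against the
# jujud machine agent on each controller address given as an argument.
# Assumes SSH access as "ubuntu" and passwordless sudo on every controller.
controllers_systemctl() {
  action=$1
  shift
  for host in "$@"; do
    echo "== ${host}: systemctl ${action} jujud-machine-*"
    # The single quotes stop the remote shell from expanding the glob,
    # so systemctl receives the unit pattern itself.
    ssh "ubuntu@${host}" "sudo systemctl ${action} 'jujud-machine-*'" || return 1
  done
}

# Example (replace the placeholders with your controller addresses):
#   controllers_systemctl stop CONTROLLER_1 CONTROLLER_2 CONTROLLER_3
```

The loop returns at the first host where `ssh` fails, so a controller that was not stopped is noticed before anyone touches the database. The same call with `start` instead of `stop` covers restarting the agents at the end.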

Get access to Mongo:

agent=$(cd /var/lib/juju/agents; echo machine-*)
pw=$(sudo grep statepassword /var/lib/juju/agents/${agent}/agent.conf | cut '-d ' -sf2)
mongo --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/juju
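Because this long `mongo` invocation is repeated in several later steps, it can help to wrap it in a shell function. This is a sketch, assuming `${agent}` and `${pw}` have already been set by the two commands above; `juju_mongo` is a hypothetical name, not something Juju provides.

```shell
# Hypothetical wrapper (not part of Juju) around the connection command above.
# Assumes ${agent} and ${pw} are already set in the current shell.
# Extra arguments, e.g. --eval '...', are appended after the database address,
# the same position the later commands in this post use.
juju_mongo() {
  mongo --ssl -u "${agent}" -p "${pw}" --authenticationDatabase admin \
    --sslAllowInvalidHostnames --sslAllowInvalidCertificates \
    localhost:37017/juju "$@"
}

# Examples:
#   juju_mongo                                                  # interactive shell
#   juju_mongo --eval 'db.charms.find({meta: null}).count()'
```

Quoting `"${agent}"` and `"${pw}"` keeps the command working even if the password contains characters the shell would otherwise split or expand.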

If you have an HA controller, you will want to determine which machine is the Mongo primary. On the primary, the prompt is:

juju:PRIMARY>

On a secondary, the prompt is:

juju:SECONDARY>

You can also run:

rs.status()

and look in the “members” array for the entry whose “stateStr” is “PRIMARY”, e.g.:

   "name" : "10.5.24.54:37017",
   "health" : 1,
   "state" : 1,
   "stateStr" : "PRIMARY",
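To check from a script instead of reading the prompt, you can ask the local node whether it is the primary. This is a sketch that assumes the legacy `mongo` shell's `db.isMaster()` helper and the `${agent}`/`${pw}` variables set earlier; `is_mongo_primary` is a hypothetical name.

```shell
# Hypothetical helper (not part of Juju): succeeds only if the local mongod
# reports itself as the replica-set primary.
# Assumes ${agent} and ${pw} are set as shown earlier.
is_mongo_primary() {
  result=$(mongo --quiet --ssl -u "${agent}" -p "${pw}" --authenticationDatabase admin \
    --sslAllowInvalidHostnames --sslAllowInvalidCertificates \
    localhost:37017/juju --eval 'print(db.isMaster().ismaster)')
  # Only look at the last line, in case the shell prints warnings first.
  [ "$(printf '%s\n' "${result}" | tail -n 1)" = "true" ]
}

# Example:
#   if is_mongo_primary; then echo "this is the primary"; fi
```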

On the primary, run:

db.charms.find({meta: null}).count()

This shows how many records will be affected. Then exit mongo and run:

mongo --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/juju --eval 'db.charms.find({}).forEach(printjson)' > all_records.txt

This saves a complete list of all charm records to ‘all_records.txt’. Then run:

mongo --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/juju --eval 'db.charms.find({"meta": null}).forEach(printjson)' > null_records.txt

This saves just the records whose ‘meta’ field is null to ‘null_records.txt’.
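Before changing anything, it can be useful to see which charms the placeholder records refer to. A minimal sketch, assuming each record in the dump shows its charm URL on a line of the form `"url" : "cs:..."`; `list_charm_urls` is a hypothetical name.

```shell
# Hypothetical helper (not part of Juju): print the charm URL of every record
# in a dump such as null_records.txt.
# Assumes each record has a line like:   "url" : "cs:mysql-1",
list_charm_urls() {
  sed -n 's/^[[:space:]]*"url" : "\([^"]*\)".*/\1/p' "$1"
}

# Example:
#   list_charm_urls null_records.txt
```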

Now apply the fix:

mongo --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/juju --eval 'db.charms.update({meta: null}, { $set: {"meta": {}} }, false, true)'

This changes every record with a nil meta to one with an empty meta, avoiding the nil pointer dereference. You should see a line like:

WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

where “nModified” matches the count() from earlier.
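If you capture the update's output, the comparison with the earlier count() can be scripted. A sketch, assuming the `WriteResult(...)` line has the format shown above; `nmodified_matches` is a hypothetical name.

```shell
# Hypothetical helper (not part of Juju): succeeds if the nModified value in a
# WriteResult line equals the count from db.charms.find({meta: null}).count().
# Usage: nmodified_matches EXPECTED_COUNT "WriteResult(...) line"
nmodified_matches() {
  expected=$1
  actual=$(printf '%s\n' "$2" | sed -n 's/.*"nModified" : \([0-9][0-9]*\).*/\1/p')
  # Fail if nModified could not be found, or if it differs from the count.
  [ -n "${actual}" ] && [ "${actual}" -eq "${expected}" ]
}

# Example:
#   nmodified_matches 1 'WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })' && echo "counts match"
```

Re-running db.charms.find({meta: null}).count() afterwards should return 0, since the update sets an empty meta on every matching record.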

Once you have run the database updates, run:

sudo systemctl start 'jujud-machine-*'

on all of the controllers, and the upgrade should proceed as normal.

We have tested this workaround on two different controllers, and have also seen that users can still run “juju upgrade-charm” for one of the previously failed upgrades.