-
Do you nuke your models when something goes wrong?
Yes. It’s easier to deal with things this way.
There is discipline to be learned here though. These days I treat everything as ephemeral. I fat-fingered the removal of a dead controller node last week, which left me in a pickle because I had workloads on models that were then inaccessible. Long story short, I bootstrapped a new set of controllers, and because my models are stored in Git as bundles I was able to deploy a replica environment. I then backed up the data and restored it into the replica environment. Finally, I unregistered the old broken controller and blew away the straggling resources it had left behind.
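For anyone wanting to script that rebuild, here is a rough sketch of the order of operations rather than exactly what I ran. It only builds the list of Juju CLI calls and prints them, so nothing is executed. The cloud name, controller names, model names, and bundle paths are all made up for illustration, and the `enable-ha` step assumes "a new set of controllers" means an HA controller. Backing up and restoring application data is workload-specific, so it is left as a comment.

```python
import shlex


def rebuild_plan(cloud, controller, models, old_controller=None, ha=True):
    """Return the Juju CLI calls, in order, for rebuilding from bundles kept in Git.

    `models` maps a model name to the bundle file stored in Git for it.
    All names passed in are hypothetical placeholders.
    """
    # Stand up the replacement controller first.
    plan = [["juju", "bootstrap", cloud, controller]]
    if ha:
        # Assumption: the replacement should be highly available.
        plan.append(["juju", "enable-ha", "-c", controller])
    for model, bundle in models.items():
        # Recreate each model, then deploy its bundle into it.
        plan.append(["juju", "add-model", "-c", controller, model])
        plan.append(["juju", "deploy", "-m", f"{controller}:{model}", bundle])
    # Application data backup/restore happens here and depends on the workload.
    if old_controller:
        # Only forgets the controller locally; leftover cloud resources
        # still have to be cleaned up by hand.
        plan.append(["juju", "unregister", old_controller])
    return plan


# Dry run: print the commands instead of executing them.
for cmd in rebuild_plan(
    "mycloud",
    "ctrl-new",
    {"web": "bundles/web.yaml"},
    old_controller="ctrl-old",
):
    print(shlex.join(cmd))
```

Keeping it as a dry run means you can read the plan before committing to it, which is handy given that `juju unregister` asks for confirmation and the cleanup afterwards is manual anyway.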
I bet a lot of environments are just too big to do this kind of thing. But the ‘throw away when broken’ mentality has saved me a lot of time recently.
-
How do you handle upgrading Juju models?
Tried this in the lab and it ended in tears. I’m personally too scared, so I’ll leave the bandage on this wound for another day.
-
How do you manage when something enters an error state (like you described)?
Either try to coax it to where it needs to be by restarting units or restarting services.
Other times poke it with a stick by removing relations and adding them again, or simply scale up, wait until it’s stable, and then Shoot The Other Node In The Head (STONITH).
-
How do you manage operating system upgrades?
LOL