Having some trouble with a new (PoC, MAAS + Juju) OpenStack installation using the mysql-innodb-cluster charm.
There are three database servers with fixed IPv4 addresses. The hardware RAID on one of them failed, causing the OS to be remounted read-only. A new leader was elected, as the failed server was the leader at the time. The Juju agent on that machine was lost and never recovered; the other two servers lived on. I forcibly removed the unit and machine from Juju, after which the charm went into a blocked state. Of course, a cluster has to have three members.
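For reference, the forced removal was roughly the following (the unit and machine numbers here are placeholders, not necessarily the ones I used):

juju remove-unit mysql/<n> --force
juju remove-machine <machine-id> --force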
The RAID card was replaced, the disks were wiped, the server was returned to Ready in MAAS, and being naïve I thought I could do an "add-unit" and the server would be added to the cluster, overriding the old server at that address.
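The add-unit itself was nothing more exotic than (application name as it appears in the unit names further down):

juju add-unit mysql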
The add-unit did deploy, but now the charm is in an error state because the new unit is not in a cluster. Running the "cluster-status" action on one of the other units reveals that the address is still in use. OK, so the next step, I thought, would be to remove the instance with that address from the cluster.
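For reference, that check was done with the same run-action syntax as the remove-instance call below:

juju run-action --wait mysql/1 cluster-status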
Using the action "juju run-action --wait mysql/1 remove-instance --string-args address=172.30.50.10" returns the following:
UnitId: mysql/1
id: "86"
message: Remove instance failed
results:
  output: |+
    Logger: Tried to log to an uninitialized logger.
    Traceback (most recent call last):
      File "<string>", line 3, in <module>
    SystemError: TypeError: Cluster.remove_instance: Option 'force' is expected to be of type Bool, but is Null
  traceback: |
    Traceback (most recent call last):
      File "/var/lib/juju/agents/unit-mysql-1/charm/actions/remove-instance", line 299, in remove_instance
        output = instance.remove_instance(address, force=force)
      File "/var/lib/juju/agents/unit-mysql-1/charm/lib/charm/openstack/mysql_innodb_cluster.py", line 813, in remove_instance
        raise e
      File "/var/lib/juju/agents/unit-mysql-1/charm/lib/charm/openstack/mysql_innodb_cluster.py", line 801, in remove_instance
        output = self.run_mysqlsh_script(_script).decode("UTF-8")
      File "/var/lib/juju/agents/unit-mysql-1/charm/lib/charm/openstack/mysql_innodb_cluster.py", line 1436, in run_mysqlsh_script
        return subprocess.check_output(cmd, stderr=subprocess.PIPE)
      File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
        return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
      File "/usr/lib/python3.8/subprocess.py", line 512, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['/snap/bin/mysqlsh', '--no-wizard', '--python', '-f', '/root/snap/mysql-shell/common/tmp_tidatm8.py']' returned non-zero exit status 1.
status: failed
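My reading of that traceback: when force is not supplied, the action seems to render a Python None into the temporary mysqlsh script, and the AdminAPI rejects that. A minimal sketch of what I assume the generated script ends up doing (the port and the exact call shape are guesses based on the traceback, not taken from the charm source):

# Runs inside "mysqlsh --python"; dba is the built-in AdminAPI object.
cluster = dba.get_cluster()
# force arrives as None instead of a bool, hence
# "Option 'force' is expected to be of type Bool, but is Null".
cluster.remove_instance("172.30.50.10:3306", {"force": None})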
Passing a .yaml file via --params, with a valid YAML boolean for force (true), gives the same result:
params:
  address: "172.30.50.10"
  force: true
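For completeness, that was invoked along the lines of (the file name is just what I used locally):

juju run-action --wait mysql/1 remove-instance --params params.yaml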
Then I figured I could just do it by hand, but that requires knowing how to connect to the cluster… Where the mysql charm stores the password in /var/lib/mysql/mysql.passwd, I haven't found an equivalent for this charm. The temporary file tmp_tidatm8.py is, of course, temporary, so I can't check the values used in it.
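For what it's worth, the by-hand route I had in mind would be something like the sketch below, run against one of the surviving units, if I could find the cluster admin credentials. The user name and connection target are pure guesses on my part; I haven't found them documented for this charm:

# mysqlsh --python clusteruser@<surviving-unit-ip>
# Then, in the shell, remove the stale member via the AdminAPI:
cluster = dba.get_cluster()
cluster.remove_instance("172.30.50.10:3306", {"force": True})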
I'm a bit stuck here. While my ham-fisted forced removal of the unit and machine in Juju can't have helped, I'm looking for a way to restore the cluster rather than doing an entire redeploy.
Can anyone give some pointers about what to do better next time, and what to do now?