Failed to connect to MySQL but vault is unsealed and mysql is ready (openstack base)

After a power outage and using
juju run-action --wait mysql-innodb-cluster/leader reboot-cluster-from-complete-outage
and unsealing the vault, some units are still unable to connect to mysql-innodb-cluster.

cinder-mysql-router/0*     blocked   idle            192.168.221.20                     Failed to connect to MySQL
glance-mysql-router/0*     blocked   idle            192.168.221.25                     Failed to connect to MySQL
neutron-mysql-router/0*    blocked   idle            192.168.221.23                     Failed to connect to MySQL
dashboard-mysql-router/0*  blocked   idle            192.168.221.21                     Failed to connect to MySQL
placement-mysql-router/0*  blocked   idle            192.168.221.26                     Failed to connect to MySQL
mysql-innodb-cluster/0*      active    idle   0/lxd/3  192.168.221.99                     Unit is ready: Mode: R/O
mysql-innodb-cluster/1       active    idle   1/lxd/2  192.168.221.5                      Unit is ready: Mode: R/W
mysql-innodb-cluster/2       active    idle   2/lxd/2  192.168.221.8                      Unit is ready: Mode: R/O
vault/0*                     active    idle   0/lxd/6  192.168.221.11  8200/tcp           Unit is ready (active: true, mlock: disabled)
  vault-mysql-router/0*      active    idle            192.168.221.11                     Unit is ready

I am attaching the juju status output:

jujustatus.txt.pdf (22.8 KB)

I’ve never seen this situation before. Keystone and Vault were able to connect to the database, so it appears that the MySQL side is OK and listening. Is there any special routing or firewalling in place that may have been affected by the power event?

You can check the logs:

juju debug-log --replay --no-tail --include glance-mysql-router/0
juju debug-log --replay --no-tail --include mysql-innodb-cluster/0
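
You could also confirm that the cluster itself reports as healthy. The cluster-status action below comes from the mysql-innodb-cluster charm; its output format may vary by charm revision:

juju run-action --wait mysql-innodb-cluster/leader cluster-status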

As a last resort you can try removing and re-adding the mysql-router relations. For example (use --force if it doesn’t go away):

juju remove-relation glance-mysql-router:db-router mysql-innodb-cluster:db-router

Wait for it to be gone. Then:

juju add-relation glance-mysql-router:db-router mysql-innodb-cluster:db-router

You may want to tail the MySQL server logs while you’re doing the above:

juju debug-log --include mysql-innodb-cluster/0

Thanks for your reply. There are no firewalls or routing rules, and dashboard and cinder live on the same machine, yet dashboard is working and cinder is not.
I have already recovered placement and dashboard by rebooting their units, but cinder, glance, neutron and nova are still failing even after rebooting the units.
I am attaching the output from juju status and the two juju debug-log commands you suggested. I think the problem is here:

unit-nova-mysql-router-0: 11:44:56 ERROR unit.nova-mysql-router/0.juju-log Unable to find implementation for relation: requires of juju-info
unit-nova-mysql-router-0: 11:44:57 ERROR unit.nova-mysql-router/0.juju-log Failed to connect to database due to '(2003, "Can't connect to MySQL server on '127.0.0.1:3306' (111)")'
unit-glance-mysql-router-0: 11:44:57 ERROR unit.glance-mysql-router/0.juju-log Unable to find implementation for relation: requires of juju-info
unit-glance-mysql-router-0: 11:44:58 ERROR unit.glance-mysql-router/0.juju-log Failed to connect to database due to '(2003, "Can't connect to MySQL server on '127.0.0.1:3306' (111)")'
unit-ntp-1: 11:45:45 ERROR unit.ntp/1.juju-log Unable to find implementation for relation: requires of ntp
unit-ntp-1: 11:45:45 ERROR unit.ntp/1.juju-log Unable to find implementation for relation: provides of ntp
unit-neutron-mysql-router-0: 11:45:56 ERROR unit.neutron-mysql-router/0.juju-log Unable to find implementation for relation: requires of juju-info
unit-neutron-mysql-router-0: 11:45:57 ERROR unit.neutron-mysql-router/0.juju-log Failed to connect to database due to '(2003, "Can't connect to MySQL server on '127.0.0.1:3306' (111)")'
unit-keystone-mysql-router-0: 11:45:58 ERROR unit.keystone-mysql-router/0.juju-log Unable to find implementation for relation: requires of juju-info
unit-neutron-api-plugin-ovn-0: 11:45:58 ERROR unit.neutron-api-plugin-ovn/0.juju-log Unable to find implementation for relation: requires of juju-info

Regards
Mario

glance.txt__F-job_154.pdf (168.3 KB) jujustatus.txt__able-job_153.pdf (26.4 KB) mysql-innodb.txt__le-job_152.pdf (330.7 KB)

This appears to be an actual MySQL issue, so we need to get logs from that service, not just Juju logs. Also, since the database is refusing connections, it would be best to troubleshoot/test the connection directly. Here is an upstream resource.
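
For example, from one of the affected router machines you could test basic connectivity to a cluster unit. The address below is one of the mysql-innodb-cluster units from the juju status output above, and the user is just a placeholder:

nc -vz 192.168.221.5 3306
mysql -h 192.168.221.5 -u <router-user> -p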

Should I just log in to a unit of the MySQL cluster and get the logs?

Yeah.

Here is another upstream resource I found.
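
For example, assuming the default MySQL error log location inside the units:

juju ssh mysql-innodb-cluster/0
sudo less /var/log/mysql/error.log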

The file /var/log/mysql/error.log is empty on the mysql-innodb-cluster/* units.

Is it possible that the logs have rotated and that you should be looking at the previous file (error.log.1.gz)?
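
Or the server may be logging to the systemd journal instead of the file; mysql.service is the default unit name on Ubuntu, though your setup may differ:

sudo journalctl -u mysql --no-pager | tail -n 50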

In mysql-innodb-cluster/0, /var/log/mysql contains only two files, error.log and error.log.1, and both are empty.

I tried removing and adding the relation, and now I get:

cinder-mysql-router/0* waiting idle 192.168.221.20 'db-router' incomplete, Waiting for proxied DB creation from cluster

Since you tried a Juju operation perhaps there is more information in the unit logs?
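
For example, on the machine hosting the cinder router; the path below follows Juju's usual /var/log/juju layout for unit agent logs:

juju ssh cinder-mysql-router/0
sudo less /var/log/juju/unit-cinder-mysql-router-0.log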

Maybe the underlying service databases (e.g. for Cinder) experienced some trouble during the outage. Try performing simple SQL queries on the databases.
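
A minimal sketch, assuming the Cinder database is simply named cinder and using whichever user/password works from the cluster leader (names and credentials may differ in your deployment):

mysql -h 192.168.221.5 -u <user> -p
mysql> SHOW DATABASES;
mysql> SHOW TABLES FROM cinder;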

Hi Mario, did you resolve this?

I have found that I can connect with the mysql client from machines that MySQL allows, using the password obtained from the leader:

juju run --unit mysql-innodb-cluster/leader leader-get

Doing juju ssh into the mysql-router machine, I was able to connect to MySQL:

mysql -h 10.0.101.16 -u mysqlrouteruser -p

So passwords seem to be working, but I don’t know how to resolve the connection issue:

2022-03-09 15:58:59 ERROR unit.cinder-mysql-router/0.juju-log server.go:327 Failed to connect to database due to '(2003, "Can't connect to MySQL server on '127.0.0.1:3306' (111)")'
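
Since the error is against 127.0.0.1:3306, a generic way to check whether anything is actually listening on that local port on the router machine (nothing charm-specific here):

sudo ss -lntp | grep 3306
systemctl list-units | grep -i mysql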

Juan, I did not solve it. I killed that cloud and started again :frowning:

Ah, so sad. I have killed a lot of clouds before as well. An outage event in Charmed OpenStack is very hard to recover from; it is really painful and should be easier.

BTW, I was able to recover by removing/adding relations, but one of the mysql-innodb-cluster units is not able to join the cluster and I’m getting:

mysqlsh.Error: Shell Error (51314): Dba.get_cluster: This function is not available through a session to a standalone instance (metadata exists, instance belongs to that metadata, but GR is not active)

And that is what I’m currently struggling with.
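
If it helps anyone else hitting this, the charm exposes a rejoin action; the rejoin-instance name comes from the mysql-innodb-cluster charm's action list, the address is a placeholder for the stuck unit's IP, and this is a sketch rather than a verified fix:

juju run-action --wait mysql-innodb-cluster/leader rejoin-instance address=<stuck-unit-ip>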

I am now using a cloud with high availability; hopefully I won’t have problems this time (fingers crossed).

I have the same problem. How can it be resolved without reinstalling OpenStack?


I get the same problem, even though I destroyed the model, rebuilt it, and redeployed the applications many times. It still fails. I am stuck deploying Vault, with the errors "Vault service not running…" and "Failed to connect to MySQL". I sshed into the Vault container and got this log:

2022-05-17 10:45:16 INFO unit.vault-mysql-router/1.juju-log server.go:327 Writing /var/lib/mysql/vault-mysql-router/mysqlrouter.conf
2022-05-17 10:45:16 ERROR unit.vault-mysql-router/1.juju-log server.go:327 Failed to connect to database due to '(2003, "Can't connect to MySQL server on '127.0.0.1:3306' (111)")'
2022-05-17 10:45:17 INFO juju.worker.uniter.operation runhook.go:152 ran "update-status" hook (via explicit, bespoke hook script)

Any idea on this issue? I am following the guidance at https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/install-openstack.html. Thanks