Inconsistent cluster status after removing/adding nodes or changing leadership

My environment:

  • juju: 2.9.51-ubuntu-amd64
  • OS: Ubuntu 20.04.2

Recently, several nodes in my Juju cluster went down, which also brought down mysql-innodb-cluster.

I tried ‘reboot-cluster-from-complete-outage’, but two nodes could not rejoin the cluster because they still held stale node information with old IPs.
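
A minimal sketch of that reboot step as run from mysqlsh on a surviving node (prompt handling and option names vary with the shell version, so treat this as approximate):

mysql-py []> c = dba.reboot_cluster_from_complete_outage()
# Depending on the shell version this may prompt about rejoining or removing
# unreachable members; the two nodes holding old node info (old IPs) could not rejoin.
mysql-py []> c.status()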

I then added new nodes (mysql-innodb-cluster/10 and mysql-innodb-cluster/11) with the mysqlsh command ‘cluster.add_instance()’.
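
Roughly the add step (a sketch; the 'clusteradmin' account name and the clone recovery option are approximations, not the exact invocation):

mysql-py []> c = dba.get_cluster()
mysql-py []> c.add_instance('clusteradmin@10.0.32.222:3306', {'recoveryMethod': 'clone'})  # account name approximate
mysql-py []> c.add_instance('clusteradmin@10.0.32.223:3306', {'recoveryMethod': 'clone'})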

I now have two new nodes, mysql-innodb-cluster/10 (10.0.32.222) and mysql-innodb-cluster/11 (10.0.32.223), plus the old node mysql-innodb-cluster/7 (10.0.32.37).

I can now get the cluster status directly from mysqlsh:

mysql-py []> dba.get_cluster().status()
{
    "clusterName": "jujuCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "10.0.32.37:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "10.0.32.222:3306": {
                "address": "10.0.32.222:3306",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.41"
            },
            "10.0.32.223:3306": {
                "address": "10.0.32.223:3306",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.41"
            },
            "10.0.32.37:3306": {
                "address": "10.0.32.37:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.41"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "10.0.32.37:3306"
}

But I cannot get the cluster status from Juju (it seems Juju is getting the status from the wrong leader):

juju run-action mysql-innodb-cluster/leader cluster-status --wait
unit-mysql-innodb-cluster-10:
  UnitId: mysql-innodb-cluster/10
  id: "1046"
  results:
    cluster-status: "null"
  status: completed
  timing:
    completed: 2025-02-12 06:56:06 +0000 UTC
    enqueued: 2025-02-12 06:56:05 +0000 UTC
    started: 2025-02-12 06:56:06 +0000 UTC
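
For comparison, the same action can be targeted at a specific unit instead of the leader, and the unit Juju currently treats as leader can be checked directly (a sketch using standard Juju 2.9 commands; output omitted):

juju run --unit mysql-innodb-cluster/7 'is-leader'
juju run-action mysql-innodb-cluster/7 cluster-status --wait
juju run-action mysql-innodb-cluster/10 cluster-status --wait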

This is ‘performance_schema.replication_group_members’ on each node:

Node mysql-innodb-cluster/7
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 8db5a1b1-af5b-11ec-b6ea-246e96415a88 | 10.0.32.37 | 3306 | ONLINE | PRIMARY | 8.0.41 | XCom |
| group_replication_applier | cb3fc113-e734-11ef-a893-00163efb4bc5 | 10.0.32.223 | 3306 | ONLINE | SECONDARY | 8.0.41 | XCom |
| group_replication_applier | d3367c91-e728-11ef-83c6-00163e690c42 | 10.0.32.222 | 3306 | ONLINE | SECONDARY | 8.0.41 | XCom |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+

Node mysql-innodb-cluster/10
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 8db5a1b1-af5b-11ec-b6ea-246e96415a88 | 10.0.32.37 | 3306 | ONLINE | PRIMARY | 8.0.41 | XCom |
| group_replication_applier | cb3fc113-e734-11ef-a893-00163efb4bc5 | 10.0.32.223 | 3306 | ONLINE | SECONDARY | 8.0.41 | XCom |
| group_replication_applier | d3367c91-e728-11ef-83c6-00163e690c42 | 10.0.32.222 | 3306 | ONLINE | SECONDARY | 8.0.41 | XCom |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+

Node mysql-innodb-cluster/11
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 8db5a1b1-af5b-11ec-b6ea-246e96415a88 | 10.0.32.37 | 3306 | ONLINE | PRIMARY | 8.0.41 | XCom |
| group_replication_applier | cb3fc113-e734-11ef-a893-00163efb4bc5 | 10.0.32.223 | 3306 | ONLINE | SECONDARY | 8.0.41 | XCom |
| group_replication_applier | d3367c91-e728-11ef-83c6-00163e690c42 | 10.0.32.222 | 3306 | ONLINE | SECONDARY | 8.0.41 | XCom |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
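
To check whether the InnoDB Cluster metadata itself still carries entries with the old IPs, the metadata schema can be queried on any member (a sketch; column names may differ slightly between metadata schema versions):

mysql> SELECT instance_name, address FROM mysql_innodb_cluster_metadata.instances;  -- column names approximate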

The juju status:

Model      Controller       Cloud/Region    Version  SLA          Timestamp
openstack  maas-controller  mymaas/default  2.9.5    unsupported  07:30:08Z

App                     Version          Status   Scale  Charm                   Channel     Rev  Exposed  Message
ceph-mon                15.2.17          active       3  ceph-mon                stable       53  no       Unit is ready and clustered
ceph-osd                15.2.17          active       3  ceph-osd                stable      308  no       Unit is ready (1 OSD)
ceph-radosgw            15.2.17          active       1  ceph-radosgw            stable      294  no       Unit is ready
cinder                  17.1.0           active       1  cinder                  stable      308  no       Unit is ready
cinder-ceph             17.1.0           active       1  cinder-ceph             stable      260  no       Unit is ready
cinder-mysql-router     8.0.41           blocked      1  mysql-router            stable        6  no       'db-router' missing
dashboard-mysql-router  8.0.41           blocked      1  mysql-router            stable        6  no       'db-router' missing
glance                  21.1.0           active       1  glance                  stable      305  no       Unit is ready
glance-mysql-router     8.0.41           blocked      1  mysql-router            8.0/stable   26  no       'db-router' missing
keystone                18.1.0+git20...  active       1  keystone                stable      321  no       Application Ready
keystone-mysql-router   8.0.41           blocked      1  mysql-router            stable        6  no       'db-router' missing
mysql-innodb-cluster    8.0.41           waiting    1/3  mysql-innodb-cluster    stable        5  no       Instance not yet in the cluster
neutron-api             17.4.1           blocked      1  neutron-api             stable      292  no       Services not running that should be: apache2
neutron-api-plugin-ovn  17.4.1           waiting      1  neutron-api-plugin-ovn  stable        4  no       'certificates' awaiting server certificate data
neutron-mysql-router    8.0.41           blocked      1  mysql-router            stable        6  no       'db-router' missing
nova-cloud-controller   22.4.0           active       1  nova-cloud-controller   stable      352  no       Unit is ready
nova-compute            22.2.0           active       4  nova-compute            stable      325  no       Unit is ready
nova-mysql-router       8.0.41           blocked      1  mysql-router            8.0/stable  257  no       'db-router' missing
ntp                     3.5              active       4  ntp                     stable       44  no       chrony: Ready
openstack-dashboard     18.6.2           active       1  openstack-dashboard     stable      311  no       Unit is ready
ovn-central             20.03.2          waiting      3  ovn-central             stable        5  no       'certificates' awaiting server certificate data
ovn-chassis             20.03.2          waiting      4  ovn-chassis             stable       10  no       'certificates' awaiting server certificate data
placement               3.0.1            blocked      1  placement               stable       32  no       Services not running that should be: apache2
placement-mysql-router  8.0.41           blocked      1  mysql-router            8.0/stable   26  no       'db-router' missing
rabbitmq-server         3.8.3            active       1  rabbitmq-server         stable      108  no       Unit is ready
vault                                    waiting      1  vault                   stable       44  no       'shared-db' incomplete
vault-mysql-router      8.0.41           waiting      1  mysql-router            stable       26  no       'db-router' incomplete, MySQL Router not yet bootstrapped

Unit                          Workload  Agent   Machine   Public address  Ports              Message
ceph-mon/0*                   active    idle    3/lxd/0   10.0.32.215                        Unit is ready and clustered
ceph-mon/1                    active    idle    4/lxd/0   10.0.32.216                        Unit is ready and clustered
ceph-mon/2                    active    idle    5/lxd/0   10.0.32.217                        Unit is ready and clustered
ceph-osd/0                    active    idle    3         10.0.32.38                         Unit is ready (1 OSD)
ceph-osd/1                    active    idle    4         10.0.32.39                         Unit is ready (1 OSD)
ceph-osd/2*                   active    idle    5         10.0.32.40                         Unit is ready (1 OSD)
ceph-radosgw/0*               active    idle    3/lxd/1   10.0.32.214     80/tcp             Unit is ready
cinder/0*                     active    idle    1/lxd/0   10.0.32.208     8776/tcp           Unit is ready
  cinder-ceph/0*              active    idle              10.0.32.208                        Unit is ready
  cinder-mysql-router/0*      blocked   idle              10.0.32.208                        'db-router' missing
glance/11*                    active    idle    1/lxd/7   10.0.32.24      9292/tcp           Unit is ready
  glance-mysql-router/25*     blocked   idle              10.0.32.24                         'db-router' missing
keystone/2*                   error     idle    0/lxd/25  10.0.32.220     5000/tcp           hook failed: "identity-service-relation-changed"
  keystone-mysql-router/2*    blocked   idle              10.0.32.220                        'db-router' missing
mysql-innodb-cluster/7        active    failed  14        10.0.32.37                         Unit is ready: Mode: R/O
mysql-innodb-cluster/10*      waiting   idle    0/lxd/29  10.0.32.222                        Instance not yet in the cluster
mysql-innodb-cluster/11       waiting   failed  1/lxd/16  10.0.32.223                        Instance not yet in the cluster
neutron-api/13*               blocked   idle    1/lxd/11  10.0.32.26      9696/tcp           Services not running that should be: apache2
  neutron-api-plugin-ovn/2*   waiting   idle              10.0.32.26                         'certificates' awaiting server certificate data
  neutron-mysql-router/2*     blocked   idle              10.0.32.26                         'db-router' missing
nova-cloud-controller/8*      active    idle    0/lxd/27  10.0.32.253     8774/tcp,8775/tcp  Unit is ready
  nova-mysql-router/43*       blocked   idle              10.0.32.253                        'db-router' missing
nova-compute/0                active    idle    6         10.0.32.34                         Unit is ready
  ntp/3*                      active    idle              10.0.32.34      123/udp            chrony: Ready
  ovn-chassis/3               waiting   idle              10.0.32.34                         'certificates' awaiting server certificate data
nova-compute/1                active    idle    7         10.0.32.31                         Unit is ready
  ntp/0                       active    idle              10.0.32.31      123/udp            chrony: Ready
  ovn-chassis/0               waiting   idle              10.0.32.31                         'certificates' awaiting server certificate data
nova-compute/2                active    idle    8         10.0.32.33                         Unit is ready
  ntp/1                       active    idle              10.0.32.33      123/udp            chrony: Ready
  ovn-chassis/1*              waiting   idle              10.0.32.33                         'certificates' awaiting server certificate data
nova-compute/3*               active    idle    9         10.0.32.32                         Unit is ready
  ntp/2                       active    idle              10.0.32.32      123/udp            chrony: Ready
  ovn-chassis/2               waiting   idle              10.0.32.32                         'certificates' awaiting server certificate data
openstack-dashboard/0*        active    idle    1/lxd/3   10.0.32.206     80/tcp,443/tcp     Unit is ready
  dashboard-mysql-router/0*   blocked   idle              10.0.32.206                        'db-router' missing
ovn-central/0*                waiting   idle    0/lxd/3   10.0.32.213     6641/tcp,6642/tcp  'certificates' awaiting server certificate data
ovn-central/1                 waiting   idle    1/lxd/4   10.0.32.207     6641/tcp,6642/tcp  'certificates' awaiting server certificate data
ovn-central/3                 waiting   idle    14        10.0.32.37      6641/tcp,6642/tcp  'certificates' awaiting server certificate data
placement/7*                  blocked   idle    0/lxd/10  10.0.32.23      8778/tcp           Services not running that should be: apache2
  placement-mysql-router/14*  blocked   idle              10.0.32.23                         'db-router' missing
rabbitmq-server/2*            active    idle    14        10.0.32.37      5672/tcp           Unit is ready
vault/24*                     waiting   idle    0/lxd/35  10.0.32.219                        'shared-db' incomplete
  vault-mysql-router/31*      waiting   idle              10.0.32.219                        'db-router' incomplete, MySQL Router not yet bootstrapped

Machine   State    Address      Inst id               Series  AZ       Message
0         started  10.0.32.35   controller01          focal   default  Deployed
0/lxd/3   started  10.0.32.213  juju-21d1b4-0-lxd-3   focal   default  Container started
0/lxd/10  started  10.0.32.23   juju-21d1b4-0-lxd-10  focal   default  Container started
0/lxd/25  started  10.0.32.220  juju-21d1b4-0-lxd-25  focal   default  Container started
0/lxd/27  started  10.0.32.253  juju-21d1b4-0-lxd-27  focal   default  Container started
0/lxd/29  started  10.0.32.222  juju-21d1b4-0-lxd-29  focal   default  Container started
0/lxd/35  started  10.0.32.219  juju-21d1b4-0-lxd-35  focal   default  Container started
1         started  10.0.32.36   controller02          focal   default  Deployed
1/lxd/0   started  10.0.32.208  juju-21d1b4-1-lxd-0   focal   default  Container started
1/lxd/3   started  10.0.32.206  juju-21d1b4-1-lxd-3   focal   default  Container started
1/lxd/4   started  10.0.32.207  juju-21d1b4-1-lxd-4   focal   default  Container started
1/lxd/7   started  10.0.32.24   juju-21d1b4-1-lxd-7   focal   default  Container started
1/lxd/11  started  10.0.32.26   juju-21d1b4-1-lxd-11  focal   default  Container started
1/lxd/16  started  10.0.32.223  juju-21d1b4-1-lxd-16  focal   default  Container started
3         started  10.0.32.38   storage01             focal   default  Deployed
3/lxd/0   started  10.0.32.215  juju-21d1b4-3-lxd-0   focal   default  Container started
3/lxd/1   started  10.0.32.214  juju-21d1b4-3-lxd-1   focal   default  Container started
4         started  10.0.32.39   storage02             focal   default  Deployed
4/lxd/0   started  10.0.32.216  juju-21d1b4-4-lxd-0   focal   default  Container started
5         started  10.0.32.40   storage03             focal   default  Deployed
5/lxd/0   started  10.0.32.217  juju-21d1b4-5-lxd-0   focal   default  Container started
6         started  10.0.32.34   compute04             focal   default  Deployed
7         started  10.0.32.31   compute01             focal   default  Deployed
8         started  10.0.32.33   compute03             focal   default  Deployed
9         started  10.0.32.32   compute02             focal   default  Deployed
14        started  10.0.32.37   controller03          focal   default  Deployed
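
For the ‘failed’ agent states on mysql-innodb-cluster/7 and mysql-innodb-cluster/11 shown above, the agent history and unit logs can be pulled for more detail (a sketch using standard Juju 2.9 commands):

juju show-status-log mysql-innodb-cluster/7
juju debug-log --replay --include unit-mysql-innodb-cluster-7 | tail -n 50
juju debug-log --replay --include unit-mysql-innodb-cluster-11 | tail -n 50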

These are the juju command outputs:

Hi @hunglv8863

This looks like an issue best handled by the OpenStack team, so I have re-assigned the category to put it in their queue.