Resource: res_masakari_f8b6bde_vip not running

Hi.

I ran `juju remove-unit` to remove masakari from machine lxd:3 and `juju add-unit` to add masakari to machine lxd:0.

The output of `juju status masakari` is:

App                    Version  Status   Scale  Charm         Store       Rev  OS      Notes
hacluster                       blocked      3  hacluster     jujucharms   74  ubuntu
masakari               10.0.0   active       3  masakari      local         0  ubuntu
masakari-mysql-router  8.0.23   active       3  mysql-router  local         0  ubuntu

Unit                        Workload  Agent  Machine   Public address  Ports      Message
masakari/0*                 active    idle   1/lxd/2   10.0.2.123      15868/tcp  Unit is ready
  hacluster/0*              blocked   idle             10.0.2.123                 Resource: res_masakari_f8b6bde_vip not running
  masakari-mysql-router/0*  active    idle             10.0.2.123                 Unit is ready
masakari/1                  active    idle   2/lxd/2   10.0.2.134      15868/tcp  Unit is ready
  hacluster/2               blocked   idle             10.0.2.134                 Resource: res_masakari_f8b6bde_vip not running
  masakari-mysql-router/2   active    idle             10.0.2.134                 Unit is ready
masakari/3                  active    idle   0/lxd/17  10.0.2.225      15868/tcp  Unit is ready
  hacluster/3               blocked   idle             10.0.2.225                 Resource: res_masakari_f8b6bde_vip not running
  masakari-mysql-router/3   active    idle             10.0.2.225                 Unit is ready

Machine   State    DNS         Inst id               Series  AZ       Message
0         started  10.0.0.156  node2                 focal   default  Deployed
0/lxd/17  started  10.0.2.225  juju-78453b-0-lxd-17  focal   default  Container started
1         started  10.0.0.159  node4                 focal   default  Deployed
1/lxd/2   started  10.0.2.123  juju-78453b-1-lxd-2   focal   default  Container started
2         started  10.0.0.158  node3                 focal   default  Deployed
2/lxd/2   started  10.0.2.134  juju-78453b-2-lxd-2   focal   default  Container started

I have used these commands:

juju run --unit masakari/0 sudo crm resource refresh
juju run --unit masakari/1 sudo crm resource refresh
juju run --unit masakari/3 sudo crm resource refresh

juju run --unit hacluster/0 sudo crm resource refresh
juju run --unit hacluster/2 sudo crm resource refresh
juju run --unit hacluster/3 sudo crm resource refresh

juju run-action masakari/0 pause --wait
juju run-action masakari/1 pause --wait
juju run-action masakari/3 pause --wait

juju run-action masakari/0 resume --wait
juju run-action masakari/1 resume --wait
juju run-action masakari/3 resume --wait

juju run-action --wait vault/0 reissue-certificates

But none of them fixed the issue.

Thank you!

There is a bug tracking an issue with scale-back which appears to be slated for release this month. For now, however, you will need to manually remove the lxd:3 machine from crm with `crm node delete <hostname of dead node in 'crm status'>`, and then run the config-changed hook across the cluster with `juju run --application hacluster 'hooks/update-status'` to remove the corosync.conf entries for that removed unit.
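Put together, the cleanup above can be sketched as a small script. The unit name `hacluster/0` and the hostname `node1` are placeholders (take the real dead-node hostname from `crm status`), and `DRY_RUN=1` only prints the commands so they can be reviewed first:

```shell
#!/bin/sh
# Sketch of the manual cleanup; review in dry-run mode before running for real.
# DEAD_NODE is a placeholder -- use the hostname shown as offline in `crm status`.
DRY_RUN="${DRY_RUN:-1}"
DEAD_NODE="${DEAD_NODE:-node1}"

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"        # dry run: print the command instead of executing it
  else
    "$@"
  fi
}

# 1. Delete the stale node from the pacemaker/corosync membership.
run juju run --unit hacluster/0 sudo crm node delete "$DEAD_NODE"

# 2. Re-render corosync.conf on every hacluster unit.
run juju run --application hacluster 'hooks/update-status'
```

Set `DRY_RUN=0 DEAD_NODE=<real hostname>` to actually execute the two commands.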

While this explains a potential issue for you, I’d think if you have all 3 nodes online, then the quorum requirement of 3 should be satisfied and your resource should be running. You just won’t be able to tolerate a unit pause until the additional node is removed from corosync and pacemaker.

Running the following commands on the units may help in determining why the VIP is not running.

crm status -r -f 1
corosync-quorumtool

You may also check the requirements for the VIP resource in pacemaker via
crm configure show
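A hedged way to collect all three outputs in one pass is a small loop over `juju run`. `hacluster/0` is assumed from the status output above; if `juju` is not on the local PATH, the loop only prints the commands, so it is safe to review:

```shell
#!/bin/sh
# Gather HA diagnostics from one hacluster unit (hacluster/0 is an assumption).
# When juju is not installed locally, fall back to printing the commands.
JUJU="$(command -v juju || echo 'echo juju')"

for cmd in 'crm status -r -f 1' 'corosync-quorumtool' 'crm configure show'; do
  echo "=== $cmd ==="
  $JUJU run --unit hacluster/0 sudo $cmd
done
```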

Thank you @afreiberger (Drew Freiberger, Canonical Staff).

I used `crm status` and `crm configure show`; the output is pasted here.

Then I ran `juju run --unit masakari/1 sudo crm node delete juju-db6013-1-lxd-2` and checked `crm status`; the output is here. It still did not work.

Then I ran `juju run --unit masakari/1 sudo crm node delete node1` and `juju run --application hacluster 'hooks/update-status'`, then checked `crm status`; the output is pasted here. It worked.

So awesome!

Thank you a lot again.


That's very interesting. Did you also remove a compute node? That "node1" you deleted looks like it was one of the masakari pacemaker-remotes.

I’d be interested to know why the Masakari API VIP failed to configure if the issue was a removed nova-compute node.

Did you expect that “node1” was no longer part of your cloud?

Oh, I am sorry, I was not clear.

The first time, I deployed an OpenStack with 4 compute nodes. I ran `remove-unit` to remove masakari from lxd:3 and `add-unit masakari --to lxd:0`. Here are the OpenStack base bundle.yaml and masakari-overlay.yaml.

The second time, I wanted to reproduce the bug, so I deployed OpenStack again, but I modified masakari-overlay.yaml to deploy masakari to lxd:0, lxd:1, and lxd:2. Then I removed unit masakari/0 from lxd:1 and added a masakari unit back to lxd:1.

Maybe that is the reason why compute node1 was removed.