Following the documentation, I encountered an unexpected error during the database migration:
$ juju run-action --wait neutron-api-plugin-ovn/0 migrate-ovn-db i-really-mean-it=true
...
Stdout: "migrate-ovn-db: OUTPUT FROM SYNC ON STDOUT:\n2020-11-25 11:06:17.760
413761 INFO neutron.cmd.ovn.neutron_ovn_db_sync_util [-] Started Neutron OVN
db sync\e[00m\n
...
11:06:57.026 413761 WARNING neutron.db.ovn_revision_numbers_db [req-91be95af-c8c7-41cc-a372-5cd7a0241373
- - - - -] No revision row found for e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575 (type:
networks) when bumping the revision number. Creating one.\e[00m\n2020-11-25
11:06:57.227 413761 INFO neutron.db.ovn_revision_numbers_db [req-91be95af-c8c7-41cc-a372-5cd7a0241373
- - - - -] Successfully bumped revision number for resource e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575
(type: networks) to 10\e[00m\n2020-11-25 11:06:57.565 413761 CRITICAL neutron_ovn_db_sync_util
[req-91be95af-c8c7-41cc-a372-5cd7a0241373 - - - - -] Unhandled error: neutron_lib.exceptions.IpAddressGenerationFailure:
No more IP addresses available on network e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575.\n2020-11-25
11:06:57.565 413761 ERROR neutron_ovn_db_sync_util Traceback (most recent call
last):\n
...
I figure there is something strange with the configuration of this network (e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575), but it results in a partial migration, so I need to unwind something before I can fix it.
At this point neutron-api is paused and has manage-neutron-plugin-legacy-mode=false in its config. Can I safely resume neutron-api to diagnose the network?
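In case it helps, this is roughly what I was planning to run once neutron-api is resumed, to inspect IP usage on that network (the resume action and these openstack client queries are just my guess at a sensible way to look at it, not something taken from the migration docs):

$ juju run-action --wait neutron-api/0 resume
$ openstack ip availability show e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575
$ openstack subnet list --network e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575
$ openstack port list --network e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575 --long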
Many thanks in advance for any clue!
Thanks. Here is my report: it is actually probably not a charm bug. The data is migrated by neutron_ovn_db_sync_util, and that tool seems quite fragile. If there were instructions to set e.g. firewall-driver=openvswitch, I missed them. A small amount of data might have been mangled, but for the most part the migration is working.
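(For the record, I assume the setting I am referring to would be applied on the Open vSwitch charm before migrating, along these lines; I have not verified that this is the documented step:)

$ juju config neutron-openvswitch firewall-driver=openvswitch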
I do have a couple of hypervisors on which ovn-chassis is apparently missing neutron-ovn-metadata-agent; the package is simply not installed. All the others seem fine, e.g.:
nova-compute/14 active idle 48 192.168.200.220 Unit is ready
nova-compute-syslog/5 active idle 192.168.200.220 Unit is ready
ntp/190 active idle 192.168.200.220 123/udp chrony: Ready
ovn-chassis/0 blocked idle 192.168.200.220 Services not running that should be: neutron-ovn-metadata-agent
nova-compute/16 active idle 50 192.168.200.47 Unit is ready
nova-compute-syslog/6 active idle 192.168.200.47 Unit is ready
ntp/191* active idle 192.168.200.47 123/udp chrony: Ready
ovn-chassis/3 active idle 192.168.200.47 Unit is ready
Comparing the two units:
(osc) $ juju ssh ovn-chassis/0 -- dpkg -l neutron-ovn-metadata-agent
dpkg-query: no packages found matching neutron-ovn-metadata-agent
Connection to 192.168.200.220 closed.
(osc) $ juju ssh ovn-chassis/3 -- dpkg -l neutron-ovn-metadata-agent
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==========================-=================-============-=======================================================================
ii neutron-ovn-metadata-agent 2:16.2.0-0ubuntu1 all Neutron is a virtual network service for Openstack - OVN metadata agent
Connection to 192.168.200.47 closed.
Actually, I did get a number of metadata port errors during the migration, so I suspect this might be a hangover from a previously broken hypervisor?
Great to hear that you were able to move forward with the migration. The issue you mention here, with some of the ovn-chassis units not having the neutron-ovn-metadata-agent package installed, is interesting.
Whether to install it is decided at runtime, depending on whether ovn-chassis has a relation to nova-compute or not; this is to support using the charm with CMSes other than OpenStack.
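(As a quick sanity check, something like the following should show whether that relation is actually present for the affected units; this is a generic status query rather than anything charm-specific:)

$ juju status --relations ovn-chassis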
I assume you deployed the ovn-chassis charm with the new-units-paused configuration option set to support the migration, and there may be a race condition which results in the package not being installed when you resume the charm from its pause.
Would you be able to raise a bug targeting the charm-layer-ovn project on Launchpad and include the complete charm log from the affected unit(s)?
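If it is easier, something along these lines should capture the full unit log for attaching to the bug (assuming the default Juju log handling):

$ juju debug-log --replay --include ovn-chassis/0 > ovn-chassis-0.log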
No worries, thank you for reporting back. We will try to recreate the situation in our lab.
A possible way to salvage the situation could be to force a full hook execution. You can do that with this command:

$ juju run --unit ovn-chassis/0 hooks/config-changed
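After that hook run completes, checking the package and the unit status again should confirm whether the agent got installed and started, e.g.:

$ juju ssh ovn-chassis/0 -- dpkg -l neutron-ovn-metadata-agent
$ juju status ovn-chassis/0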