Community opinion on reliability and success rate of Ussuri deployments

Cheeky post, but I’m looking for opinions from the community regarding the current state of the Ussuri channel of Charmed OpenStack.

For weeks now I’ve literally been fighting these deployments and there’s always something random that blows up: etcd getting stuck, neutron locking up everything, OVN bridging issues, certificate issues.

The whole reason behind DevOps tooling is reliability and predictability… this has been anything but that.

From the number of threads open on Discourse, I think it’s safe to say I’m not alone…?


Hi @dvnt

I am a relative newcomer to the world of Juju and Charmed OpenStack. I picked up Juju and started teaching myself in mid-July, so the only deployments I’ve been working with are Focal/Ussuri-based. I too have been fighting with these deployments. At first I took it as part of the learning curve of understanding how Juju operates, but after spending some time on this Discourse, I’m starting to think otherwise.

One big takeaway for me is that there seem to be race conditions in the OpenStack charms, in particular if you scale up your bundle with HACluster. In my experience, I have to deploy the openstack-base bundle as-is (with its default unit counts), wait for those apps to become “active/ready”, and only then scale up the number of units to complete the HACluster setup.
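For what it’s worth, the staged approach that has worked for me looks roughly like this (the bundle file and the apps being scaled here are just examples from my own setup):

juju deploy ./openstack-base-bundle.yaml   # deploy with the stock unit counts first
watch --color juju status --color          # wait until everything settles to active/idle
juju add-unit keystone -n 2                # only then grow the apps that need HA
juju add-unit vault -n 2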

If I deploy a bundle that has all of the HA unit counts I want for the apps, I end up with a bunch of “ha-relation-changed” errors because the app units are

  1. all waiting in a queue for mysql-innodb-cluster to finish creating and starting its own cluster, and
  2. waiting for MySQL to create all of the mysql-routers being requested at once by all the other app units.

This messes up Vault and Keystone in particular for me, which then cascades down to other dependent apps.
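One way to avoid that race is to gate the scale-up on mysql-innodb-cluster actually reporting active before adding the remaining units. A minimal sketch, assuming jq is installed and the application-status path in current Juju status JSON:

until juju status mysql-innodb-cluster --format=json \
    | jq -e '.applications["mysql-innodb-cluster"]["application-status"].current == "active"' > /dev/null; do
  sleep 30                     # keep polling until the cluster reports active
done
juju add-unit keystone -n 2    # safe to add the remaining HA units now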

Now my latest fight is post-“successful” deployment of Charmed OpenStack. Every instance I create fails to spawn with what looks to me like Bug LP #1760047, and I have yet to find a workaround.

2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [req-94063796-8961-4d82-be1e-615f6b47716f c21c630a27f9485da9722509af8f916d c5aba0fb1d43450f810c9a30fae0f954 - a52f5e3d8f1d431dbd3e648fe7357684 a52f5e3d8f1d431dbd3e648fe7357684] [instance: 77e01168-a40b-4325-9884-dc4882e2a462] Instance failed to spawn: nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462] Traceback (most recent call last):
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6408, in _create_domain_and_network
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]     guest = self._create_domain(
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]   File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]     next(self.gen)
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]   File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 513, in wait_for_instance_event
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]     actual_event = event.wait()
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]   File "/usr/lib/python3/dist-packages/eventlet/event.py", line 125, in wait
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]     result = hub.switch()
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 298, in switch
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]     return self.greenlet.switch()
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462] eventlet.timeout.Timeout: 300 seconds
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462] During handling of the above exception, another exception occurred:
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462] Traceback (most recent call last):
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]   File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2614, in _build_resources
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]     yield resources
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]   File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2374, in _build_and_run_instance
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]     self.driver.spawn(context, instance, image_meta,
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 3575, in spawn
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]     self._create_domain_and_network(
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6431, in _create_domain_and_network
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]     raise exception.VirtualInterfaceCreateException()
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2020-09-10 17:01:43.853 150514 ERROR nova.compute.manager [instance: 77e01168-a40b-4325-9884-dc4882e2a462]
2020-09-10 17:01:43.868 150514 INFO nova.compute.manager [req-94063796-8961-4d82-be1e-615f6b47716f c21c630a27f9485da9722509af8f916d c5aba0fb1d43450f810c9a30fae0f954 - a52f5e3d8f1d431dbd3e648fe7357684 a52f5e3d8f1d431dbd3e648fe7357684] [instance: 77e01168-a40b-4325-9884-dc4882e2a462] Terminating instance
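The trace boils down to nova-compute timing out after 300 seconds while waiting for neutron’s network-vif-plugged event. The checks below are roughly what I’ve been using to narrow it down (unit names and log paths assume the stock Ubuntu packaging):

openstack network agent list           # are all the neutron/OVN agents alive and up?
juju status neutron-api nova-compute   # any units stuck in blocked or waiting?
juju ssh nova-compute/0 'grep -i network-vif-plugged /var/log/nova/nova-compute.log | tail'
juju ssh neutron-api/0 'grep -iE "error|vif" /var/log/neutron/neutron-server.log | tail'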

I experienced a similar issue before. In my case, restarting the neutron API services solved the problem.

As a newcomer, you can run the following commands:

juju run-action --wait neutron-api/0 pause    # stops the neutron-api services on the unit
juju run-action --wait neutron-api/0 resume   # starts them again, i.e. a clean restart
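Afterwards, it’s worth confirming the API actually came back before retrying the instance, for example:

juju status neutron-api        # units should settle back to active/idle
openstack network agent list   # agents should report alive again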