OpenStack Zed new deployment with blocked cloud-controller and nova-compute

Hello all, I am using the base YAML deployment bundle with the addition of neutron-gateway, since the default value for cloud-controller’s network-manager option is Neutron and Charmhub has a note under that option suggesting the use of the neutron-gateway charm.

After the deployment I have only initialized Vault, and cloud-controller along with all the nova-compute units is in blocked state.

cloud-controller log

2023-06-22 13:51:16 INFO unit.nova-cloud-controller/0.juju-log server.go:316 Updating status.
2023-06-22 13:51:16 INFO unit.nova-cloud-controller/0.juju-log server.go:316 Registered config file:
2023-06-22 13:51:17 WARNING unit.nova-cloud-controller/0.update-status logger.go:60 ERROR no relation id specified
2023-06-22 13:51:17 INFO unit.nova-cloud-controller/0.juju-log server.go:316 HAProxy context is incomplete, this unit has no peers.
2023-06-22 13:51:20 INFO unit.nova-cloud-controller/0.juju-log server.go:316 Generating template context from neutron api relation
2023-06-22 13:51:21 INFO unit.nova-cloud-controller/0.juju-log server.go:316 HAProxy context is incomplete, this unit has no peers.
2023-06-22 13:51:22 INFO unit.nova-cloud-controller/0.juju-log server.go:316 get_network_addresses: [('10.1.4.17', '10.1.4.17'), ('10.1.4.17', 'compute.p-net.cloud')]
2023-06-22 13:51:24 INFO unit.nova-cloud-controller/0.juju-log server.go:316 HAProxy context is incomplete, this unit has no peers.
2023-06-22 13:51:25 INFO unit.nova-cloud-controller/0.juju-log server.go:316 HAProxy context is incomplete, this unit has no peers.
2023-06-22 13:51:27 INFO unit.nova-cloud-controller/0.juju-log server.go:316 Generating template context from neutron api relation
2023-06-22 13:51:27 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via explicit, bespoke hook script)

nova-compute log

2023-06-22 14:00:19 INFO unit.nova-compute/0.juju-log server.go:316 Making dir /var/lib/charm/nova-compute root:root 555
2023-06-22 14:00:19 INFO unit.nova-compute/0.juju-log server.go:316 Making dir /etc/ceph root:root 555
2023-06-22 14:00:19 INFO unit.nova-compute/0.juju-log server.go:316 Registered config file: /etc/libvirt/qemu.conf
2023-06-22 14:00:19 INFO unit.nova-compute/0.juju-log server.go:316 Registered config file: /etc/default/qemu-kvm
2023-06-22 14:00:19 INFO unit.nova-compute/0.juju-log server.go:316 Registered config file: /etc/libvirt/libvirtd.conf
2023-06-22 14:00:19 INFO unit.nova-compute/0.juju-log server.go:316 Registered config file: /etc/default/libvirt-bin
2023-06-22 14:00:19 INFO unit.nova-compute/0.juju-log server.go:316 Registered config file: /etc/init/libvirt-bin.override
2023-06-22 14:00:19 INFO unit.nova-compute/0.juju-log server.go:316 Registered config file: /etc/nova/nova.conf
2023-06-22 14:00:19 INFO unit.nova-compute/0.juju-log server.go:316 Registered config file: /etc/nova/vendor_data.json
2023-06-22 14:00:19 INFO unit.nova-compute/0.juju-log server.go:316 Registered config file: /etc/apparmor.d/usr.bin.nova-compute
2023-06-22 14:00:19 INFO unit.nova-compute/0.juju-log server.go:316 Registered config file: /etc/nova/nova-compute.conf
2023-06-22 14:00:19 INFO unit.nova-compute/0.juju-log server.go:316 Registered config file: /etc/ceph/secret.xml
2023-06-22 14:00:19 INFO unit.nova-compute/0.juju-log server.go:316 Registered config file: /var/lib/charm/nova-compute/ceph.conf
2023-06-22 14:00:21 INFO unit.nova-compute/0.juju-log server.go:316 Updating status.
2023-06-22 14:00:21 WARNING unit.nova-compute/0.update-status logger.go:60 ERROR no relation id specified
2023-06-22 14:00:22 INFO unit.nova-compute/0.juju-log server.go:316 Generated config context for neutron network manager.
2023-06-22 14:00:23 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via explicit, bespoke hook script)

This is what gets repeated in the logs.

Could someone provide a clue?

@billy-olsen is this also something your team could help with?

Thanks again for the tag @hpidcock!

@leijona hmm, it’s a bit tricky to figure out what’s going on from the snippets that you’ve included. I see that the logs are showing WARNING messages regarding “no relation id specified”. It’s an unfortunate error message in that it appears more often than it should, but it is generally innocuous in the cases where it is emitted. You can likely safely ignore these warnings for the time being; they very likely have nothing to do with the problem at hand.

When the charms go into blocked state, they typically indicate what is blocking them from progressing. This is put in a message in the status for the particular unit/application, which you can see via the juju status command. If you can provide a copy of the juju status output and a copy of the bundle you are using, we may be able to identify the issue you are running into more easily. You can use paste.ubuntu.com if you need a place to pastebin it. You may want to take a look and redact any parts you consider sensitive for your deployment (e.g. IPs, etc).
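If it helps, something along these lines should capture both (file names are just examples):

juju status --format=yaml > juju-status.yaml
juju export-bundle > deployed-bundle.yaml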

Additionally, you mentioned the neutron-gateway charm, but that’s generally only used when you elect the ML2/OVS driver for the Neutron SDN. By default, the bundles we are using provide the ML2/OVN driver, which doesn’t require neutron-gateway. The pastebin of the bundle should help us sort some things out with you.
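A quick sanity check for which driver a bundle uses is to look at the applications it contains, for example (bundle file name is illustrative):

# ovn-central / ovn-chassis present      -> ML2/OVN, neutron-gateway not needed
# neutron-openvswitch / neutron-gateway  -> ML2/OVS
grep -E 'ovn-central|ovn-chassis|neutron-gateway|neutron-openvswitch' bundle.yaml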

cc @james-page @ajkavanagh

Hello @billy-olsen and thank you @hpidcock for forwarding my issue.

Here is the deployment file and here is the juju status output. I wasn’t clear in my first post, but the message I am getting is just

Services not running that should be: nova-scheduler

and the same for nova-compute.

Two reasons made me add the neutron-gateway. First, after creating the external network and a router (roughly as sketched below), I wasn’t able to ping the router’s IP address on the VLAN set on the physnet. Second, while looking for a solution as to why ML2 wasn’t working, I saw the note under the nova-cloud-controller charm’s network-manager config option that

When using the Neutron option you will most likely want to use the neutron-gateway charm to provide L3 routing and DHCP Services.

Hence I thought that was what was missing from the mix and why I wasn’t able to make networking work. The original problem is totally off-topic, I know, but how shall I proceed? Since this is a new deployment, shall I remove the neutron-gateway to take some complexity out of the picture?
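For context, the external network and router were created roughly along these lines (names, VLAN ID and subnet below are placeholders, not my exact values):

openstack network create --external --provider-network-type vlan \
    --provider-physical-network physnet1 --provider-segment 100 ext-net
openstack subnet create --network ext-net --subnet-range 192.168.40.0/24 \
    --gateway 192.168.40.1 --no-dhcp ext-subnet
openstack router create router1
openstack router set --external-gateway ext-net router1
# the router's gateway IP on ext-net is what I could not ping from the VLAN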

Thanks for your replies

@leijona

You definitely do not need the neutron-gateway as your bundle is using ovn as the SDN.

In the deployment scenario from your bundle, each of the compute nodes will act as a potential gateway in this environment, which means you’ll have some resiliency at the cost of each compute node (ovn-chassis node) potentially serving ingress traffic. In other words, North/South traffic happens at each compute node rather than at a central set of nodes (which can be a bottleneck/choke point/single point of failure).

As far as the blocked status indicating "Services not running that should be: " - this indicates that the named service is not running on that specific node. Sometimes this is down to a configuration option, and other times it is due to bugs/race conditions identified in the code. The race conditions often arise when services are started in the wrong order relative to the events coming in from the Juju ecosystem.

We’ll need some more information to figure that part out, but I’d start with simply redeploying without the neutron-gateway. Afterwards, if you encounter the race condition again, you can check whether the service is running (log into the node and run systemctl status <service-name>). If it isn’t, check the logs for the service (e.g. /var/log/nova/nova-scheduler.log) to see if there are unresolved errors. Sometimes this points to a configuration problem; the logs will help.
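As a concrete sketch of those checks (unit and service names here are just the ones from your status):

juju ssh nova-cloud-controller/0
sudo systemctl status nova-scheduler
sudo tail -n 100 /var/log/nova/nova-scheduler.log   # look for tracebacks or config errors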

If the logs suggest the service didn’t start because it was lacking configuration, you can try simply starting it manually (e.g. sudo systemctl start <service-name>). The charms should do this for you, and if the service starts just fine it will take the charm some time to reflect this in the status (it will catch it on an update-status hook execution, which typically runs once every 5 minutes).
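For example (Juju 2.9 syntax; on newer Juju the 'juju run --unit' form becomes 'juju exec --unit', and the hooks/ path assumes a classic hooks-based charm):

sudo systemctl start nova-scheduler
sudo systemctl status nova-scheduler
# optionally nudge the charm rather than waiting for the next update-status cycle
juju run --unit nova-cloud-controller/0 'hooks/update-status'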

If it starts up fine, can you collect the juju status at that time, along with the logs from the keystone and nova-cloud-controller services, and then raise a bug?
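Roughly, something like this would gather the useful bits (unit names and log paths are the usual defaults and may differ in your deployment):

juju status --format=yaml > juju-status.yaml
juju debug-log --replay --include nova-cloud-controller/0 > ncc-unit.log
juju ssh nova-cloud-controller/0 'sudo tar czf - /var/log/nova' > ncc-nova-logs.tgz
juju ssh keystone/0 'sudo tar czf - /var/log/keystone' > keystone-logs.tgz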

  • Billy

Hello @billy-olsen, unfortunately the same thing happened after removing neutron-gateway, so I opened bug report 2025048.

Thanks