Networking issues with the openstack-base bundle (focal/wallaby, OVN)

Hi,

I’ve been stuck with some strange networking issues for a while now. They appear in a couple of different forms.

I’m deploying a vanilla openstack-base bundle, only modifying the data-port to match the interface names on my (MAAS-provided) hosts, i.e. “data-port: &data-port br-ex:enp2s0f0”.

VM-to-VM communication across an internal network (e.g. 10.0.1.0/24) works, but the VMs cannot reach the router interface (RI, 10.0.1.1) on that network. From OpenStack’s perspective, the RI is up and active. The distributed interface (10.0.1.2) is down from OpenStack’s perspective, yet the VMs can reach it.

To make this even more confusing, occasionally, after a random amount of waiting (an hour or more), the VMs can reach the RI, and even the ‘public’-facing IP on the router, but never anything beyond it.

From what I can see, the VM has its tap interface, and this is connected to br-int (an OVS bridge). Interestingly, this bridge is down (ip link show br-int), and the same applies to br-ex. From what I’ve understood from the exchanges on IRC, the secondary NIC on a host should be used to transport both the internal VM-to-VM traffic and the external (provider network) traffic. When running tcpdump on that interface (enp2s0f0 in my case), I can’t see any traffic originating from the host, which I’d expect to see if the VM-to-VM traffic were tunneled there.

Now, looking at the ovn-central (leader), it seems that the VM-to-VM traffic is handled within the deployment network interface, not the secondary NIC.

root@juju-93e117-1-lxd-5:~# ovn-nbctl show
switch 5f491f60-3696-41ab-b45d-7431091f3ed6 (neutron-c9974a0b-f15f-4410-b7b0-2b86f83ee286) (aka internal_net)
    port 01145503-2cd9-46b2-a0aa-7d0dfbb37b3b
        addresses: ["fa:16:3e:33:1a:1d 10.0.1.27", "unknown"]
    port f08eadee-6474-4718-add7-a60b401d22ff
        addresses: ["fa:16:3e:f8:6d:a0 10.0.1.5", "unknown"]
    port b991a596-5feb-4976-b583-1e55cc2e5816
        type: localport
        addresses: ["fa:16:3e:56:98:2e 10.0.1.2"]
    port 8ec79a8f-02a3-453e-ab76-56c372d19e35
        type: router
        router-port: lrp-8ec79a8f-02a3-453e-ab76-56c372d19e35
switch 49f3e7ab-2a79-4228-a2ee-74ffc491c89e (neutron-faaf8551-8298-4968-87bf-72d7de497d10) (aka ext_net)
    port 48f87c68-cdee-4f40-a96c-603dcd6a9bd1
        type: router
        router-port: lrp-48f87c68-cdee-4f40-a96c-603dcd6a9bd1
    port provnet-4aa6f3d1-e7c7-45a6-82e4-5ef23503d16b
        type: localnet
        tag: 132
        addresses: ["unknown"]
    port b9e52074-d2ca-476b-8887-b08c41fdeb5a
        type: localport
        addresses: ["fa:16:3e:31:ac:c4"]
router a3fb692d-7d10-47a0-a0f9-e385e9bc3703 (neutron-13618f0a-d0d7-4f45-8149-65c222426f5f) (aka provider-router)
    port lrp-8ec79a8f-02a3-453e-ab76-56c372d19e35
        mac: "fa:16:3e:fc:af:a0"
        networks: ["10.0.1.1/24"]
    port lrp-48f87c68-cdee-4f40-a96c-603dcd6a9bd1
        mac: "fa:16:3e:ce:30:fa"
        networks: ["a.b.c.38/27"]
        gateway chassis: [idrac-BZMM613.dmaas idrac-4NLFJ13.dmaas idrac-5NLFJ13.dmaas]
    nat 9c23c756-74ec-42ec-831a-6749fa5d8ad6
        external ip: "a.b.c.40"
        logical ip: "10.0.1.27"
        type: "dnat_and_snat"
    nat bac2e831-c36c-4647-acb1-781f995cc264
        external ip: "a.b.c.38"
        logical ip: "10.0.1.0/24"
        type: "snat"

Now, I’m still new to OVN syntax, but I think the internal_net switch is missing one port, the router interface 10.0.1.1? On the other hand, that interface is clearly visible on the router (provider-router). Or I’m just reading it wrong, and port 8ec79a8f-02a3-453e-ab76-56c372d19e35 on the switch actually points to the corresponding router port (lrp-8ec79a8f-02a3-453e-ab76-56c372d19e35). But that still does not explain why the VM cannot reach the 10.0.1.1 address.
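
The second reading can be checked mechanically. Below is a minimal Python sketch, assuming the text printed by ovn-nbctl show has been captured into a string; here it is a trimmed excerpt of the output above. The function name router_attachments is made up for illustration and is not part of any OVN tooling.

```python
import re

# Trimmed excerpt of the "ovn-nbctl show" output from this post.
NB_SHOW = """\
switch 5f491f60-3696-41ab-b45d-7431091f3ed6 (neutron-c9974a0b-f15f-4410-b7b0-2b86f83ee286) (aka internal_net)
    port 8ec79a8f-02a3-453e-ab76-56c372d19e35
        type: router
        router-port: lrp-8ec79a8f-02a3-453e-ab76-56c372d19e35
router a3fb692d-7d10-47a0-a0f9-e385e9bc3703 (neutron-13618f0a-d0d7-4f45-8149-65c222426f5f) (aka provider-router)
    port lrp-8ec79a8f-02a3-453e-ab76-56c372d19e35
        mac: "fa:16:3e:fc:af:a0"
        networks: ["10.0.1.1/24"]
"""


def router_attachments(text):
    """Pair each switch port of type router with its peer router port.

    Hypothetical helper: it only understands the layout shown in this post
    and ignores indentation, so it also works on a flattened paste.
    """
    switch_links = {}  # router port name -> (switch alias, switch port)
    lrp_networks = {}  # router port name -> list of CIDRs
    section = alias = port = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        head = line.split()[0]
        if head in ("switch", "router"):
            section = head
            match = re.search(r"\(aka ([^)]+)\)", line)
            alias = match.group(1) if match else line.split()[1]
            port = None
        elif head == "port":
            port = line.split()[1]
        elif head == "router-port:" and section == "switch":
            switch_links[line.split(":", 1)[1].strip()] = (alias, port)
        elif head == "networks:" and section == "router":
            lrp_networks[port] = re.findall(r'"([^"]+)"', line)
    return {
        lrp: {"switch": sw, "switch_port": sp, "networks": lrp_networks.get(lrp, [])}
        for lrp, (sw, sp) in switch_links.items()
    }


for lrp, info in router_attachments(NB_SHOW).items():
    print(lrp, info)
```

If the router port comes back with 10.0.1.1/24 attached to internal_net, the switch is not missing a port: the entry of type router is the switch-side end of that interface. That would suggest the logical wiring is in place and the reachability problem sits somewhere below the logical topology.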

In the hope that somebody can explain or point out what’s going wrong, I’m also including the ovn-sbctl output from the ovn-central unit.

root@juju-93e117-1-lxd-5:~# ovn-sbctl show
Chassis idrac-4NLFJ13.dmaas
    hostname: idrac-4NLFJ13.dmaas
    Encap geneve
        ip: "10.35.0.165"
        options: {csum="true"}
    Port_Binding "01145503-2cd9-46b2-a0aa-7d0dfbb37b3b"
Chassis idrac-5NLFJ13.dmaas
    hostname: idrac-5NLFJ13.dmaas
    Encap geneve
        ip: "10.35.0.106"
        options: {csum="true"}
Chassis idrac-BZMM613.dmaas
    hostname: idrac-BZMM613.dmaas
    Encap geneve
        ip: "10.35.0.138"
        options: {csum="true"}
    Port_Binding cr-lrp-48f87c68-cdee-4f40-a96c-603dcd6a9bd1
    Port_Binding "b991a596-5feb-4976-b583-1e55cc2e5816"
    Port_Binding "f08eadee-6474-4718-add7-a60b401d22ff"
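
To see which addresses the Geneve tunnels terminate on, the output above can be summarised per chassis. This is a rough Python sketch, not an official tool: parse_chassis is a made-up helper, and TUNNEL_SUBNET is an assumption, since the post never states which subnet is the MAAS deployment network.

```python
import ipaddress

# The "ovn-sbctl show" output from this post.
SB_SHOW = """\
Chassis idrac-4NLFJ13.dmaas
    hostname: idrac-4NLFJ13.dmaas
    Encap geneve
        ip: "10.35.0.165"
        options: {csum="true"}
    Port_Binding "01145503-2cd9-46b2-a0aa-7d0dfbb37b3b"
Chassis idrac-5NLFJ13.dmaas
    hostname: idrac-5NLFJ13.dmaas
    Encap geneve
        ip: "10.35.0.106"
        options: {csum="true"}
Chassis idrac-BZMM613.dmaas
    hostname: idrac-BZMM613.dmaas
    Encap geneve
        ip: "10.35.0.138"
        options: {csum="true"}
    Port_Binding cr-lrp-48f87c68-cdee-4f40-a96c-603dcd6a9bd1
    Port_Binding "b991a596-5feb-4976-b583-1e55cc2e5816"
    Port_Binding "f08eadee-6474-4718-add7-a60b401d22ff"
"""

# Assumption: replace with the real deployment subnet of the hosts.
TUNNEL_SUBNET = ipaddress.ip_network("10.35.0.0/24")


def parse_chassis(text):
    """Return {chassis name: {"encap", "ip", "ports"}} from the text above."""
    chassis = {}
    current = None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "Chassis":
            name = parts[1].strip('"')
            current = chassis.setdefault(name, {"encap": None, "ip": None, "ports": []})
        elif current is None:
            continue  # ignore anything before the first Chassis line
        elif parts[0] == "Encap":
            current["encap"] = parts[1]
        elif parts[0] == "ip:":
            current["ip"] = parts[1].strip('"')
        elif parts[0] == "Port_Binding":
            current["ports"].append(parts[1].strip('"'))
    return chassis


for name, info in parse_chassis(SB_SHOW).items():
    inside = ipaddress.ip_address(info["ip"]) in TUNNEL_SUBNET
    print(name, info["encap"], info["ip"], "in assumed subnet:", inside, info["ports"])
```

If all three encap addresses belong to the deployment network, the tunnelled VM-to-VM traffic (Geneve uses UDP port 6081) would be expected on the interface holding that address rather than on enp2s0f0, which would match the empty tcpdump. The cr-lrp binding also shows which chassis currently hosts the router's gateway port.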

BR/Patrik

I’m no expert on the subject, but it seems to me that this is because it creates an isolated network, so we must use managed ports; because of the update exchanges, we set PortDeletionPolicy to retain.

Hi,

I’ve successfully deployed the Wallaby bundle, after testing 4 scenarios.

Scenario 1:
  • The first NIC has only ONE IP, on the ‘deployment’ subnet needed by MAAS, configured with ‘Auto Assign’.
  • The second NIC could/should be associated with a subnet, but is left ‘Unconfigured’ with respect to IP address.
  • The second NIC has one VLAN associated with it, with its IP set to ‘Auto Assign’.

This deployment fails when using either ONE or TWO network spaces. The two spaces have IP connectivity between each other. The problem seems to be that there is -something- associated with the second NIC.

This is fine, as it ‘breaks’ the “clean second NIC” requirement. Perhaps this should be emphasized in the charm requirements next to “two cabled NICs”. I only discovered the ‘clean’ part after asking questions on IRC.

Scenario 2:
  • The first NIC has only ONE IP, on the ‘deployment’ subnet needed by MAAS, configured with ‘Auto Assign’.
  • The second NIC could/should be associated with a subnet, but is left ‘Unconfigured’ with respect to IP address.
  • The first NIC has one VLAN associated with it, with its IP set to ‘Auto Assign’.

This deployment fails when using TWO network spaces. The two spaces have IP connectivity between each other.

This one is less clear: the second NIC is clean, and both spaces are on the first NIC.
The hypothesis is that the ‘clean’ requirement also applies to the first NIC.

Scenario 3:
  • The first NIC has only ONE IP, on the ‘deployment’ subnet needed by MAAS, configured with ‘Auto Assign’.
  • The second NIC could/should be associated with a subnet, but is left ‘Unconfigured’ with respect to IP address.
  • The first NIC has one VLAN associated with it, with its IP set to ‘Auto Assign’.

This deployment fails, even though only ONE network space is used.

This is scenario 2 with only ONE space used, but we’ve kept a VLAN on the first NIC, thus making it ‘unclean’.

Scenario 4:
  • The first NIC has only ONE IP, on the ‘deployment’ subnet needed by MAAS, configured with ‘Auto Assign’.
  • The second NIC could/should be associated with a subnet, but is left ‘Unconfigured’ with respect to IP address.
  • There should not be any VLAN associated with ANY of the NICs.

This deployment works fine if only ONE network space is used.
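
The four scenarios boil down to a short checklist. The sketch below encodes only what was observed here; it is not a statement of the charm's actual requirements. The Nic class, the names nic1/nic2 and the VLAN ID 100 are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Nic:
    name: str      # hypothetical label, not a real interface name
    ip_mode: str   # "auto" (MAAS 'Auto Assign') or "unconfigured"
    vlans: List[int] = field(default_factory=list)


def layout_problems(first: Nic, second: Nic, spaces: int) -> List[str]:
    """List every deviation from the one layout that deployed cleanly."""
    problems = []
    if first.ip_mode != "auto":
        problems.append(f"{first.name}: expected one 'Auto Assign' IP on the deployment subnet")
    if second.ip_mode != "unconfigured":
        problems.append(f"{second.name}: expected to be left 'Unconfigured'")
    for nic in (first, second):
        if nic.vlans:
            problems.append(f"{nic.name}: VLAN(s) {nic.vlans} attached, NIC is not 'clean'")
    if spaces != 1:
        problems.append(f"{spaces} network spaces in use, only ONE was seen to work")
    return problems


# The four tested scenarios; VLAN ID 100 is a placeholder.
# Scenario 1 failed with ONE and with TWO spaces; TWO is shown here.
scenarios = {
    1: (Nic("nic1", "auto"), Nic("nic2", "unconfigured", [100]), 2),
    2: (Nic("nic1", "auto", [100]), Nic("nic2", "unconfigured"), 2),
    3: (Nic("nic1", "auto", [100]), Nic("nic2", "unconfigured"), 1),
    4: (Nic("nic1", "auto"), Nic("nic2", "unconfigured"), 1),
}

for number, (first, second, spaces) in scenarios.items():
    issues = layout_problems(first, second, spaces)
    print(f"scenario {number}:", "OK" if not issues else issues)
```

Only the last scenario passes every check, which matches the one deployment that worked.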

BR/Patrik