Add a space to an existing OpenStack deployment

Hi, I have an existing OpenStack deployment with 3 controllers and 6 compute/OSD nodes, and I am looking to add a new network space (VLAN). I added the space in MAAS and ran juju reload-spaces. However, since all of the OpenStack services are deployed to LXD containers on the controllers, what is the right way to expose this new space to new LXDs? Should I manually change the netplan YAML on each controller to add the VLAN (and bridge) to the physical NICs? If I do that, I am afraid my manual settings will be disregarded by Juju; I see a lot of netplan backup YAML files on each controller, as if they are rewritten by Juju. Any advice on this would be great! RZ

In my experience, you should be able to add a new interface for that VLAN to your systems by modifying the netplan YAML and applying the changes to bring up the new VLAN interfaces (including an IP on the network, as spaces are an L3 concept in Juju). Then restart the jujud-machine-X agents on the metals so Juju discovers the new space available on each metal. Lastly, when you deploy a new LXD container using that space on that metal, Juju should configure the necessary bridge(s) for you; you just need a VLAN interface with an IP on the subnet matching the new space. It is this bridge creation that produces the Juju netplan backup files.
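For illustration, the VLAN stanza I have in mind would look roughly like this in netplan (the parent interface, VLAN ID and address below are placeholders; adapt them to your bond/NIC layout and to the subnet backing the new space):

    # hypothetical snippet, e.g. in /etc/netplan/50-cloud-init.yaml or a new file
    network:
      version: 2
      vlans:
        bond0.100:            # placeholder: <parent-interface>.<vlan-id>
          id: 100             # the new VLAN ID as defined in MAAS
          link: bond0         # the parent bond/NIC already managed by netplan
          addresses:
            - 10.20.30.2/24   # an address inside the subnet that backs the new space

Then apply it and restart the machine agent, for example:

    sudo netplan apply
    sudo systemctl restart jujud-machine-0   # machine number is an example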

This suggestion is to work around the fact that modifying networking after MAAS deployment is not supported by the products. The expected method of cloud operations with Juju is to migrate services off of one node at a time, remove the machine from Juju, redefine the networking to add the new VLAN in MAAS, and then re-deploy the metal via Juju to configure netplan/spaces for you. This is not always practical at scale in production, hence the workaround I’ve suggested above.
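In rough command form, that supported flow looks something like this (machine numbers are placeholders, and workloads must be migrated off the metal first):

    # after evacuating the hosted units, release the node back to MAAS
    juju remove-machine 9
    # adjust the node's VLANs/subnets in MAAS, then re-enlist a node with the new networking
    juju add-machine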


Hi Drew, thank you for your suggestions. I made the netplan modification on the first controller and restarted the jujud service. I then tried deploying a test machine with the ubuntu charm and added a bind to my new space. I am getting the following error: unable to setup network: multiple subnets matching "10.149.0.0/21", retrying in 10s. Any idea why I am getting this? Thanks.
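For reference, the deploy command was roughly the following (with the space name as I defined it in MAAS):

    juju deploy ubuntu --to lxd:0 --bind "storage-space"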

I think we’ll need further information to debug what may be happening.

  1. What command are you using to target the controller metal, to ensure your ubuntu charm unit is landing on a server that has the new space available? Are you using --to lxd:<controller machine number>?
  2. Is 10.149.0.0/21 a super-set of other subnets in your environment?
  3. What’s the output of juju spaces?
  4. What’s the output of ip -4 a on the controller to which you’re attempting to deploy the new container?

Hi Drew,

  1. Yes, I am targeting the machine I made the netplan modification on, with --to lxd:0.
  2. No, 10.149.0.0/21 is the new space I want to use; it does not overlap with any other subnet.
  3. Output of juju spaces:

    Name             Space ID  Subnets
    alpha            0
    ceph-data-space  1         10.150.1.0/24
    ceph-repl-space  2         10.150.2.0/24
    data-space       3         10.150.3.0/24
    default          4         10.100.101.0/24
                               172.16.0.0/24
    ext-net-space    5         192.168.0.0/24
    internal-space   6         10.150.4.0/24
    public-space     7         10.150.5.0/24
    storage-space    9         10.149.0.0/21

  4. Output of ip -4 a:

    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        inet 172.16.0.48/24 brd 172.16.0.255 scope global eno1
           valid_lft forever preferred_lft forever
    6: br-bond0-405: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        inet 10.150.5.201/24 brd 10.150.5.255 scope global br-bond0-405
           valid_lft forever preferred_lft forever
    7: br-bond0-404: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        inet 10.150.4.201/24 brd 10.150.4.255 scope global br-bond0-404
           valid_lft forever preferred_lft forever
    8: br-bond0-401: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        inet 10.150.1.201/24 brd 10.150.1.255 scope global br-bond0-401
           valid_lft forever preferred_lft forever
    10: bond0.402@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        inet 10.150.2.201/24 brd 10.150.2.255 scope global bond0.402
           valid_lft forever preferred_lft forever
    33: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        inet 10.147.89.1/24 scope global lxdbr0
           valid_lft forever preferred_lft forever
    463: br-bond-406: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        inet 10.149.0.10/21 brd 10.149.7.255 scope global br-bond-406
           valid_lft forever preferred_lft forever
    234: br-bond0-403: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        inet 10.150.3.201/24 brd 10.150.3.255 scope global br-bond0-403
           valid_lft forever preferred_lft forever

Thank you for your help. RZ

Everything looks okay from the Juju and machine perspective. Perhaps the MAAS provider has the subnet listed multiple times in the subnet tab on different fabrics?

I might suggest checking the controller logs to see if that message is coming back from MAAS, and then checking the MAAS logs and subnets to see why MAAS may be failing.
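If it helps, something along these lines is what I would check (the MAAS profile name "admin" is just an example):

    # search the controller/model logs for where the error is raised
    juju debug-log --replay | grep -i "multiple subnets"
    # ask MAAS what it knows about the subnet; look for duplicates across fabrics/VLANs
    maas admin subnets read | grep -B2 -A8 "10.149.0.0/21"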

Hello,

Sorry I could not get back to you earlier; I was away from the office. The logs did not show anything significant, just the same "multiple subnets matching" message.

I tried removing the whole space from MAAS and running juju reload-spaces. I then recreated the space in MAAS with a different CIDR and ran juju reload-spaces again, but nothing changed; I still get the same error.

I finally decided to check the Juju database, and to my surprise the same CIDR is listed in the "subnets" collection twice. So I guess that removing a space from MAAS and running juju reload-spaces does not remove the subnet. Please see below the output from mongo.

    juju:PRIMARY> db.spaces.find({ name: "storage-space" });
    { "_id" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331:13", "spaceid" : "13", "life" : 0, "name" : "storage-space", "is-public" : false, "providerid" : "14", "model-uuid" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331", "txn-revno" : NumberLong(2), "txn-queue" : [ ] }

    juju:PRIMARY> db.subnets.find({ cidr: "10.150.6.0/24" })
    { "_id" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331:8", "txn-revno" : NumberLong(2), "subnet-id" : "8", "model-uuid" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331", "life" : 0, "providerid" : "16", "cidr" : "10.150.6.0/24", "vlantag" : 406, "space-id" : "8", "txn-queue" : [ ] }
    { "_id" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331:15", "txn-revno" : NumberLong(2), "subnet-id" : "15", "model-uuid" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331", "life" : 0, "providerid" : "28", "cidr" : "10.150.6.0/24", "vlantag" : 406, "space-id" : "13", "txn-queue" : [ ] }

    juju:PRIMARY> db.subnets.find({ "space-id": "8" })
    { "_id" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331:8", "txn-revno" : NumberLong(2), "subnet-id" : "8", "model-uuid" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331", "life" : 0, "providerid" : "16", "cidr" : "10.150.6.0/24", "vlantag" : 406, "space-id" : "8", "txn-queue" : [ ] }
    { "_id" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331:9", "txn-revno" : NumberLong(2), "subnet-id" : "9", "model-uuid" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331", "life" : 0, "providerid" : "17", "cidr" : "10.149.0.0/21", "vlantag" : 406, "space-id" : "8", "txn-queue" : [ ] }

    juju:PRIMARY> db.subnets.find({ cidr: "10.149.0.0/21" })
    { "_id" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331:9", "txn-revno" : NumberLong(2), "subnet-id" : "9", "model-uuid" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331", "life" : 0, "providerid" : "17", "cidr" : "10.149.0.0/21", "vlantag" : 406, "space-id" : "8", "txn-queue" : [ ] }
    { "_id" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331:10", "txn-revno" : NumberLong(2), "subnet-id" : "10", "model-uuid" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331", "life" : 0, "providerid" : "18", "cidr" : "10.149.0.0/21", "vlantag" : 406, "space-id" : "9", "txn-queue" : [ ] }
    { "_id" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331:11", "txn-revno" : NumberLong(2), "subnet-id" : "11", "model-uuid" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331", "life" : 0, "providerid" : "20", "cidr" : "10.149.0.0/21", "vlantag" : 406, "space-id" : "9", "txn-queue" : [ ] }
    { "_id" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331:12", "txn-revno" : NumberLong(2), "subnet-id" : "12", "model-uuid" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331", "life" : 0, "providerid" : "21", "cidr" : "10.149.0.0/21", "vlantag" : 406, "space-id" : "10", "txn-queue" : [ ] }
    { "_id" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331:13", "txn-revno" : NumberLong(2), "subnet-id" : "13", "model-uuid" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331", "life" : 0, "providerid" : "25", "cidr" : "10.149.0.0/21", "vlantag" : 406, "space-id" : "11", "txn-queue" : [ ] }
    { "_id" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331:14", "txn-revno" : NumberLong(2), "subnet-id" : "14", "model-uuid" : "b36a43a3-b0f8-4b0e-816e-7a00039d7331", "life" : 0, "providerid" : "26", "cidr" : "10.149.0.0/21", "vlantag" : 406, "space-id" : "12", "txn-queue" : [ ] }

And this is the output of juju spaces:

    Name             Space ID  Subnets
    alpha            0
    ceph-data-space  1         10.150.1.0/24
    ceph-repl-space  2         10.150.2.0/24
    data-space       3         10.150.3.0/24
    default          4         10.100.101.0/24
                               172.16.0.0/24
    ext-net-space    5         192.168.0.0/24
    internal-space   6         10.150.4.0/24
    public-space     7         10.150.5.0/24
    storage-space    13        10.150.6.0/24

The only solution I see here is to manually remove the stale subnets from the database. Any advice? :slight_smile:
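Something like this is what I have in mind, run in the controller's mongo shell after taking a full backup first (completely untested; the filter values come from the output above, targeting the stale record that still points at the old space-id "8"):

    // remove the stale duplicate of 10.150.6.0/24 left over from the deleted space
    db.subnets.remove({ "cidr" : "10.150.6.0/24", "space-id" : "8" })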

Regards, Rony

FYI: Juju client version 2.9.18, Juju controller version 2.9.18, Juju model version 2.9.18, MAAS version 3.0. Thanks!