Two NICs on MAAS Nodes LXD issues [SOLVED!]

I started playing around with some of the bundles to get my bearings and figure out how Juju works with each piece of my network. I have seen a number of posts about similar issues, and I think I grasped the main problem those people had with multiple spaces, so I was able to avoid the undetermined-space error they ran into.

Background:
I have a 4-node server with 4 NICs per node. I’m only using one 1GbE NIC as the Public space and one of the 10GbE NICs for the Internal network, which is not exposed to the rest of the network and certainly not to the internet.
This all seems to work fine at this level. The Juju controller is running on an LXD VM within MAAS.
Issues:
Using bundles such as kubernetes-core, I run into an issue where an LXD container (0/lxd/0) is created on what I believe is the first node (0). Part of the issue I found was that the bundle asks for a space named “alpha”. I tried changing everything to internal, as recommended in Where are Bindings “alpha” coming from?, but the issue with the LXD container persisted, so I changed the space name to “alpha” just to streamline testing. The 0/lxd/0 container remained pending.
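For anyone wanting to do that rename from the MAAS CLI, something along these lines should work. This is only a sketch: $PROFILE is your logged-in MAAS CLI profile, and the space ID comes from the read call.

    maas $PROFILE spaces read | jq '.[] | {id:.id, name:.name}' --compact-output
    maas $PROFILE space update <space-id> name=alpha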

When I looked through LXD, I found that numerous containers had been created on the br0 bridge (internal); however, none of them are associated with br1 (public). This is somewhat interesting, but I wasn’t sure if it was definitively an issue. On one hand, an LXD container not being exposed is probably fine, as the ultimate goal is that I only need one container, possibly the host node, acting as the ingress into the network. But if I read correctly, it seems the container should have access to both. Documentation on how to actually implement such a thing is a bit sketchy from what I can find.

I think my next step is to remote into the node and see why it’s not starting, but if someone is able to point me in the right direction, that would be great.
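For reference, this is the sort of poking around I mean, first from the Juju side and then on the host itself (log paths may differ slightly by Juju version):

    juju show-machine 0/lxd/0
    juju ssh 0 -- sudo lxc list
    juju ssh 0 -- sudo tail -n 50 /var/log/juju/machine-0.log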

Update: interestingly, after leaving the machine alone instead of being my usual impatient self, I noticed the status page reports alpha-2 as a space:

no obvious space for container "0/lxd/0", host machine has spaces: "alpha-2", "public"

FYI since someone will want to know versions:

sudo snap list
Name        Version                      Rev    Tracking       Publisher     Notes
conjure-up  2.6.14-20200716.2107         1062   latest/stable  canonical✓    classic
core        16-2.57.6                    14399  latest/stable  canonical✓    core
core18      20221212                     2667   latest/stable  canonical✓    base
core20      20221212                     1778   latest/stable  canonical✓    base
core22      20221212                     469    latest/stable  canonical✓    base
helm        3.7.0                        353    latest/stable  snapcrafters  classic
juju        3.0.2                        21474  3.0/stable     canonical✓    -
kubectl     1.26.0                       2787   latest/stable  canonical✓    classic
lxd         5.9-76c110d                  24164  latest/stable  canonical✓    -
maas        3.3.0~rc1-13127-g.e6737625f  25207  3.3/beta       canonical✓    -
snapd       2.57.6                       17883  latest/stable  canonical✓    snapd

The LXD profile generated on the node:

    config: {}
    description: Default LXD profile
    devices:
      eth0:
        nictype: bridged
        parent: lxdbr0
        type: nic
      root:
        path: /
        pool: default
        type: disk
    name: default
    used_by: []
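For what it’s worth, if a container really did need a leg on the public bridge, LXD can attach a second NIC through the profile. This is only a sketch for diagnosis (br1 being what I call the public bridge on my hosts); Juju manages the container NICs itself, so I wouldn’t treat it as the fix:

    lxc profile device add default eth1 nic nictype=bridged parent=br1
    lxc profile show default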

50-cloud-init.yaml

$ cat /etc/netplan/50-cloud-init.yaml
# This file is generated from information provided by the datasource.  Changes
# to it will not persist across an instance reboot.  To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    ethernets:
        eno1:
            addresses:
            - 192.168.1.122/23
            gateway4: 192.168.0.1
            match:
                macaddress: ac:1f:6b:82:99:98
            mtu: 1500
            nameservers:
                addresses:
                - 192.168.0.242
                - 192.168.0.10
                search:
                - maas
                - Primary
            set-name: eno1
        eno2:
            match:
                macaddress: ac:1f:6b:82:99:99
            mtu: 1500
            optional: true
            set-name: eno2
        ens1f0:
            addresses:
            - 10.0.0.52/24
            match:
                macaddress: ac:1f:6b:8d:c5:44
            mtu: 1500
            nameservers:
                addresses:
                - 10.0.0.3
                search:
                - maas
                - Primary
            routes:
            -   table: 1
                to: 0.0.0.0/0
                via: 10.0.0.1
            routing-policy:
            -   from: 10.0.0.0/24
                priority: 100
                table: 1
            -   from: 10.0.0.0/24
                table: 254
                to: 10.0.0.0/24
            set-name: ens1f0
        ens1f1:
            match:
                macaddress: ac:1f:6b:8d:c5:45
            mtu: 1500
            optional: true
            set-name: ens1f1
    version: 2
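The ens1f0 stanza sets up a separate routing table (table 1) plus routing-policy rules, so to see what the kernel actually installed on the node:

    ip rule show
    ip route show table 1
    ip route show table main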

lxdbr0

config:
  ipv4.address: 10.167.71.1/24
  ipv4.nat: "true"
  ipv6.address: none
  ipv6.nat: "false"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/profiles/default
managed: true
status: Created
locations:
- none

On a side question:
Do I actually need LXD involved in making this cluster? Do I really need that one virtual node?
Long story short, I want to create a Kubernetes cluster that utilizes Ceph. The example that is currently documented is quite old, as it still calls out kubernetes-master among other things. The 3 nodes also create multiple LXD containers for etcd and ceph-mon. Could I instead cut out the middleman, as it were, and just run everything directly on the hardware? Am I actually gaining anything by utilizing LXD?
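In other words, instead of the bundle’s lxd: placements, something like the following would put those components straight onto the machines. The charm names and placements here are just illustrative, not a tested recipe:

    juju deploy -n 3 etcd --to 0,1,2
    juju deploy -n 3 ceph-mon --to 0,1,2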


After stumbling around trying to push forward, I came across this link: Using Multiple Host Networks | Ubuntu. The nugget here was getting me to look at what Juju sees for spaces.

~ juju spaces
Name       Space ID  Subnets
alpha      0
public     1         192.168.0.0/23
alpha-2    2         10.0.0.0/24
undefined  3         10.0.1.0/24
                     10.1.77.128/32
                     172.17.0.0/16

This makes it a bit strange, because what Juju calls alpha-2 is what I had named alpha; it is actually referring to fabric-0 as alpha. The alpha-2 space, as Juju calls it, is my internal network, which started off being called fabric-3 in MAAS when I first started deploying machines.

So now I am trying to push everything over to fabric-0 for naming simplicity; I can look at renaming things in the future. Hopefully this resolves the issue for anyone else. The downside is that I had to delete the Juju controller to do this in MAAS, but since there isn’t too much invested here, it’s a minor inconvenience.
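For future reference, Juju also has a rename-space command that might have avoided rebuilding the controller, though I haven’t verified how it behaves with MAAS-backed spaces:

    juju rename-space alpha-2 internal
    juju spaces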


OK, so after redeploying the controller, I’m stumped.

~ juju spaces
Name       Space ID  Subnets
alpha      0
public     1         192.168.0.0/23
alpha-2    2         10.0.0.0/24
undefined  3         10.0.1.0/24
                     10.1.77.128/32
                     172.17.0.0/16

I even ran juju reload-spaces and nothing changed, unlike during my earlier attempts to manipulate the spaces while the controller was operational. Here is what MAAS has set, for reference:

~ maas $PROFILE fabrics read | jq '.[] | {id:.id, fabric_name:.name, vlans:.vlans[] | {id:.id, name:.name, vid:.vid}}' --compact-output
{"id":4,"fabric_name":"fabric-4","vlans":{"id":5005,"name":"untagged","vid":0}}
{"id":5,"fabric_name":"fabric-5","vlans":{"id":5006,"name":"untagged","vid":0}}
{"id":1,"fabric_name":"fabric-1","vlans":{"id":5002,"name":"untagged","vid":0}}
{"id":2,"fabric_name":"Public","vlans":{"id":5003,"name":"untagged","vid":0}}
{"id":0,"fabric_name":"alpha","vlans":{"id":5001,"name":"untagged","vid":0}}

I tried renaming it in MAAS and then using reload-spaces again, and apparently Juju doesn’t care what I do in MAAS; apparently I have to delete the controller again, which I consider a bug. Also, in case anyone is wondering, with bundle charms like kubernetes-core, --bind is flat out refused:

~ juju deploy kubernetes-core --bind alpha-2
ERROR options provided but not supported when deploying a bundle: --bind
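For what it’s worth, the supported route for bundles seems to be an overlay file rather than --bind. A rough sketch, where the application names are my guesses at what kubernetes-core contains and may not match your revision:

    # overlay.yaml
    applications:
      kubernetes-control-plane:
        bindings:
          "": alpha-2
      kubernetes-worker:
        bindings:
          "": alpha-2

    juju deploy kubernetes-core --overlay ./overlay.yaml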

TL;DR: This is me ranting in frustration…

The answer is: forget the prebuilt bundles unless you're following a user guide and fit its criteria exactly, which likely means you are either running exclusively on LXD or you have MAAS with several machines and only a single network space defined.

If you have anything outside of that narrow definition, just build it out from scratch and define the default space in the model ahead of time, unless someone provides you with a bundle file that you can edit to get things going.
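Defining the default space ahead of time can be done when the model is created, with internal being whatever your MAAS space is called on the Juju side:

    juju add-model openstack --config default-space=internal
    juju model-config default-space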

I’m running the OpenStack setup from scratch, and the LXD machines are deploying to the hardware just fine. The documentation is woefully out of date, so prepare to substitute commands and such to fit whatever version of Juju you are currently using.


There is still an underlying problem, which I’m not sure if I caused or not: the resulting LXD containers cannot reach the internet. Looking at the resulting /etc/netplan/50-… file, I see it points to whatever gateway I set in MAAS. The tricky part is there really isn’t one; I had to set that because MAAS would not let me leave it blank. I suppose I could make that address a NAT, but again, based on the configuration of the node, it seems redundant. Looking at OpenStack’s documentation, this seems to be indicated in one of the infographics they provide.

or is it…

egress-subnets:
  type: string
  description: Source address(es) for traffic originating from this model

Big fat nope. I tried this and it doesn’t give a crap about the setting. It also doesn’t even show up when you try to get the model info.

➜  ~ juju model-config egress-subnets=192.168.0.0/23
➜  ~ juju model-config default-space=internal
➜  ~ juju show-model openstack
openstack:
  name: admin/openstack
  short-name: openstack
  model-uuid: 878820a9-6889-4730-8eed-98f315bc25e1
  model-type: iaas
  controller-uuid: bb390f92-8725-41b6-82b5-583139e11fd9
  controller-name: mymaas-default
  is-controller: false
  owner: admin
  cloud: mymaas
  region: default
  type: maas
  life: alive
  status:
    current: available
    since: 1 hour ago
  users:
    admin:
      display-name: admin
      access: admin
      last-connection: just now
  machines:
    "0":
      cores: 32
    0/lxd/0:
      cores: 0
    "1":
      cores: 32
    1/lxd/0:
      cores: 0
    "2":
      cores: 32
    2/lxd/0:
      cores: 0
    "3":
      cores: 32
  sla: unsupported
  agent-version: 3.0.2
  credential:
    name: dracozny
    owner: admin
    cloud: mymaas
    validity-check: valid
  supported-features:
  - name: juju
    description: the version of Juju used by the model
    version: 3.0.2
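To be fair, show-model never lists model-config keys; whether the value actually stuck can be checked directly with:

    juju model-config egress-subnets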

Just in case anyone in a similar situation thinks this would be clever…

➜  ~ juju bootstrap mymaas --config juju-mgmt-space=public
➜  ~ juju add-model openstack
➜  ~ juju model-config default-space=internal

Don’t bother. The containers completely lose touch with the controller after they boot.


I think I have a working solution finally:

➜  ~ juju set-model-constraints spaces=internal,public
➜  ~ juju model-config default-space=internal
➜  ~ juju deploy -n 3  <whatever...app...etc.>
➜  ~ juju deploy -n 3 --to lxd:0,lxd:1,lxd:2 --series jammy --channel 8.0/stable ch:mysql-innodb-cluster --constraints spaces=internal,public

I added the constraint to the LXD deployment as a precaution as well. Interestingly enough, if I output the bundle to YAML, I can see:

  mysql-innodb-cluster:
    charm: mysql-innodb-cluster
    channel: 8.0/stable
    revision: 30
    resources:
      mysql-shell: 0
    num_units: 3
    to:
    - lxd:0
    - lxd:1
    - lxd:2
    constraints: arch=amd64 spaces=internal,public

The constraints are also shown where the machine declarations are made. And best of all, I don’t see a complaint about not knowing which space to use, nor do I see any hook errors because it can’t download software. I ssh’d into 0/lxd/0 just to double-check things, and I could ping my router and Google’s DNS server.
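For reference, checks along these lines can all be run from the client (a sketch of what I did, more or less):

    juju ssh 0/lxd/0 -- ip route
    juju ssh 0/lxd/0 -- resolvectl status
    juju ssh 0/lxd/0 -- ping -c 3 8.8.8.8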

Update: every time you deploy an app to a new LXD container, you do in fact have to add the spaces constraint. I inadvertently missed a line, and that machine could never complete its installation. If you miss it, just remove the app and the machine with --force and then redeploy the app with the constraint tags.
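Concretely, the recovery looks something like this, with the app and machine names as placeholders:

    juju remove-application <app> --force
    juju remove-machine <n>/lxd/<m> --force
    juju deploy --to lxd:<n> <app> --constraints spaces=internal,public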


Extra bonus: using the solution above, if you want to deploy a bundle and not reinvent the wheel:

~ juju download <Charm-Bundle>
~ unzip -q <Charm-Bundle>.bundle -d <Charm-Bundle>/
~ cd <Charm-Bundle>/

At this point, use your favorite editor to edit the bundle.yaml file, and make sure you add the constraint line from the solution under each application placed on LXD:

    to:
    - lxd:x
    constraints: spaces=space1,space2,etc

Once done, you can deploy with juju deploy ./bundle.yaml and all should go according to plan.


Great post. Thanks for walking through and posting the correct solution - using multiple space constraints.


Thanks for the help