sunbeam bootstrap fails (nova error)

I tried to bootstrap:

2023.1/stable

2023.1/edge

2023.2/edge

in various configurations, all fail to bootstrap with:

nova/0* error idle 10.1.12.49 hook failed: "cell-database-relation-changed"

where the name of the failing hook varies from run to run.
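For reference, this is how I've been inspecting the failing hook, via the Juju controller that sunbeam sets up (assuming the control plane lives in a model called openstack, which I believe is the default):

juju status -m openstack                               # find the unit in error state
juju debug-log -m openstack --replay --include nova/0  # replay the hook traceback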

Looks like I’m not the only one with that problem. A more detailed problem description can be found here:

Has anyone been able to do a sunbeam bootstrap during the last two weeks?

I’d be surprised if it’s totally broken. I tried to bootstrap on a freshly installed minimal Ubuntu 22.04 server on a physical machine. Maybe I just have some unresolved dependencies.

Hi,

I have successfully installed 2023.1 and 2023.2/edge using the multi-node setup instructions on a physical machine with 4 cores, 2x 1G NICs, 32 GB RAM, and 2x SSDs (250 GB for system/snaps and 500 GB for MicroCeph).

OpenStack comes up and is usable, but it is very unstable: mysql-0 and multiple mysql-routers restart every now and then. This seems to be caused by Pebble failing to reach the mysql-0 primary address for some reason, which triggers a restart (or possibly a full redeploy of the pod, I’m not sure).
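To watch the restarts, I look at the pods through MicroK8s directly; a quick sketch, assuming sunbeam's default openstack namespace:

sudo microk8s.kubectl -n openstack get pods               # the RESTARTS column keeps climbing
sudo microk8s.kubectl -n openstack describe pod mysql-0   # events should show why the pod restarted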

Hi Magnus,

thanks for your reply. How did you configure your network?

I managed to bootstrap once yesterday, with the same configuration that had failed about 50 times before. Unfortunately, that deployment did not survive a reboot; it was unusable afterwards. The next bootstrap failed the same way all the others did. I have 20 CPU cores on my machine, so maybe there is a race in the bootstrap.

Hi,

I have two 1 Gbps NICs. The first NIC uses DHCP with a reserved IP. The second NIC is left unconfigured, as the bootstrap process changes its configuration during deployment. This NIC needs to be on the same network you use to hand out floating IPs.

I was confused about what IP and subnet to give in response to the first two questions during bootstrap. I ended up using the same /24 that I get from DHCP, and for MetalLB I use a range from that same subnet that is excluded from the DHCP server.
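From memory, the netplan side of that looks roughly like this (interface names are placeholders for whatever your hardware shows):

network:
  version: 2
  ethernets:
    eno1:
      dhcp4: true    # first NIC: DHCP, with the lease reserved on the DHCP server
    eno2:
      dhcp4: false   # second NIC: no config; the bootstrap takes it over for external traffic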

I’m not in front of my computer at the moment, but I can send the complete config later for your reference.

Hi,

It would be nice if you could send your complete config. I also wonder why two physical NICs are needed. Wouldn’t two virtual NICs connected on L2 to one physical NIC do the same?
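What I have in mind is something like a macvlan interface in bridge mode on top of the physical NIC (untested with sunbeam; eno1 and virt0 are placeholders, just to illustrate the idea):

sudo ip link add link eno1 name virt0 type macvlan mode bridge   # second L2 presence on the same wire
sudo ip link set virt0 up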

Another thing: does IPv6 work for external networking? I have several public IPv6 /64 subnets but just a few public IPv4 addresses, so I’d like to use IPv6 whenever possible.

Btw. my initial problem seems to be a bug. See the link I’ve added above.

I’ve been consistently struggling with timeouts during sunbeam bootstrap as well. I’ve uncovered a number of findings, though:

  1. There are substantial differences in sunbeam cluster bootstrap between 2023.2/edge and 2023.1/edge.

Example 2023.2/edge:

ubuntu@opnstk-server-vm:~$ sunbeam cluster bootstrap --help
Usage: sunbeam cluster bootstrap [OPTIONS]

  Bootstrap the local node.

  Initialize the sunbeam cluster.

Options:
  -a, --accept-defaults           Accept all defaults.
  -m, --manifest FILE             Manifest file.
  --role [control|compute|storage]
                                  Specify additional roles, compute or
                                  storage, for the bootstrap node. Defaults to
                                  the compute role.
  --topology [auto|single|multi|large]
                                  Allows definition of the intended cluster
                                  configuration: 'auto' for automatic
                                  determination, 'single' for a single-node
                                  cluster, 'multi' for a multi-node cluster,
                                  'large' for a large scale cluster
  --database [auto|single|multi]  Allows definition of the intended cluster
                                  configuration: 'auto' for automatic
                                  determination, 'single' for a single
                                  database, 'multi' for a database per
                                  service,
  -h, --help                      Show this message and exit.

Example 2023.1/edge:

ubuntu@opnstk-server-vm:~$ sunbeam cluster bootstrap --help
Usage: sunbeam cluster bootstrap [OPTIONS]

  Bootstrap the local node.

  Initialize the sunbeam cluster.

Options:
  -a, --accept-defaults           Accept all defaults.
  -p, --preseed FILE              Preseed file.
  --role [control|compute|storage]
                                  Specify additional roles, compute or
                                  storage, for the bootstrap node. Defaults to
                                  the compute role.
  --topology [auto|single|multi|large]
                                  Allows definition of the intended cluster
                                  configuration: 'auto' for automatic
                                  determination, 'single' for a single-node
                                  cluster, 'multi' for a multi-node cluster,
                                  'large' for a large scale cluster
  --database [auto|single|multi]  Allows definition of the intended cluster
                                  configuration: 'auto' for automatic
                                  determination, 'single' for a single
                                  database, 'multi' for a database per
                                  service,
  -h, --help                      Show this message and exit.

I’ve been trying to set up a preseed or manifest file; apparently the name of that flag changed between the versions (--preseed vs. --manifest).

The file takes the following shape:

bootstrap:
  management_cidr: A.B.C.D/21
addons:
  metallb: 10.20.21.10-10.20.21.20
user:
  run_demo_setup: True
  username: demo
  password: testtesttest
  cidr: 192.168.122.0/24
  nameservers: 8.8.8.8
  security_group_rules: True
  remote_access_location: local
external_network:
  cidr: A.B.C.D/21
  gateway: A.B.C.D
  start: A.B.C.D
  end: A.B.C.D
  network_type: flat
  segmentation_id: 0
  nic: enp2s0
  physical_network: physnet1
microceph_config:
  opnstk-server-vm:
    osd_devices: /dev/vdc /dev/vdd /dev/vde
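One thing that saves a round trip: checking the file is valid YAML before handing it to sunbeam (a generic syntax check, assuming python3-yaml is installed; nothing sunbeam-specific):

python3 -c 'import yaml, sys; yaml.safe_load(open(sys.argv[1]))' generated-preseed.yaml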
  2. With a preseed / manifest file, the invocation also takes a different shape when bootstrapping. Especially if these tasks to stand up MicroStack Sunbeam are to be automated, I found out that it needs to look something like the following:

A. On 2023.2/edge:

- name: bootstrap openstack
  become: true          # note: become_user only takes effect when become is enabled
  become_user: ubuntu
  ansible.builtin.shell: |
    echo "n" | /snap/bin/sunbeam cluster bootstrap --manifest /home/ubuntu/generated-preseed.yaml
  # ansible.builtin.shell: |
  #   /snap/bin/sunbeam cluster bootstrap --accept-defaults

- name: configure openstack
  become: true
  become_user: ubuntu
  ansible.builtin.shell: |
    /snap/bin/sunbeam configure --manifest /home/ubuntu/generated-preseed.yaml --openrc demo-openrc

(Note the echo "n" | prefix; otherwise, on 2023.2/edge there seems to be no way to tell it not to prompt about the proxy setting.) Without the echo "n" | we see:

ubuntu@opnstk-server-vm:~$ /snap/bin/sunbeam cluster bootstrap --manifest /home/ubuntu/generated-preseed.yaml
Configure proxy for access to external network resources? [y/n] (n): ^C
Aborted!

(I hit Ctrl+C there to bail out early.)

B. With 2023.1/edge it’s just:

- name: bootstrap openstack
  become: true
  become_user: ubuntu
  ansible.builtin.shell: |
    /snap/bin/sunbeam cluster bootstrap --preseed /home/ubuntu/generated-preseed.yaml

- name: configure openstack
  become: true
  become_user: ubuntu
  ansible.builtin.shell: |
    /snap/bin/sunbeam configure --preseed /home/ubuntu/generated-preseed.yaml --openrc demo-openrc

But I’ve never actually been able to get the deploy to finish successfully. I’m wondering if there is anything else missing from the sunbeam prepare-node-script, i.e. whether 2023.1/edge or 2023.2/edge is supposed to work with “current” Jammy builds at all…
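For reference, the node preparation I’m doing is the one from the docs, i.e.:

sunbeam prepare-node-script | bash -x && newgrp snap_daemon

so if there’s anything needed beyond that, I’d love to know.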

Historically, it only ever hangs for me at:

Deploying OpenStack Control Plane to Kubernetes (this may take a while) ... waiting for services to come online (22/30)

And then, after an hour and some change, it just falls flat with error messages similar to the ones you’re seeing. I’ve not been able to get past it, no matter which networking variants I supply. It does seem to “provision more” on 2023.1/edge, reaching (24/30), vs. (22/30) on 2023.2/edge.
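To see which services it’s still waiting on while it hangs, I filter juju status (again assuming the openstack model name):

juju status -m openstack | grep -v active   # list everything that hasn't settled yet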