LXD containers always stuck in 'pending' when deploying a rabbitmq-server cluster

I am attempting to use juju to install a rabbitmq cluster on LXD.

$ juju --version
2.9.43-ubuntu-amd64
$ lxd --version
5.0.2

I am using revision 150 of the charm, which I have cloned locally:

$ git clone https://opendev.org/openstack/charm-rabbitmq-server

I then deploy a RabbitMQ cluster following the instructions at: https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-ha.html#rabbitmq

$ juju deploy -n 3 --to lxd,lxd,lxd --config min-cluster-size=3 ./charm-rabbitmq-server

This invariably leads to a state where all the containers report "Container started" but stay in the pending state, and the deployment never finishes:

$ juju status
Model    Controller  Cloud/Region         Version  SLA          Timestamp
default  overlord    localhost/localhost  2.9.43   unsupported  21:02:19+10:00

App              Version  Status   Scale  Charm            Channel  Rev  Exposed  Message
rabbitmq-server           waiting    0/3  rabbitmq-server           150  no       waiting for machine

Unit               Workload  Agent       Machine  Public address  Ports  Message
rabbitmq-server/0  waiting   allocating  0/lxd/0                         waiting for machine
rabbitmq-server/1  waiting   allocating  1/lxd/0                         waiting for machine
rabbitmq-server/2  waiting   allocating  2/lxd/0                         waiting for machine

Machine  State    Address        Inst id              Series  AZ  Message
0        started  10.140.147.93  juju-94265b-0        focal       Running
0/lxd/0  pending                 juju-94265b-0-lxd-0  focal       Container started
1        started  10.140.147.81  juju-94265b-1        focal       Running
1/lxd/0  pending                 juju-94265b-1-lxd-0  focal       Container started
2        started  10.140.147.71  juju-94265b-2        focal       Running
2/lxd/0  pending                 juju-94265b-2-lxd-0  focal       Container started

I have connected to the machines (e.g. $ juju ssh 0) and don't see any errors in the cloud-init logs.

What could be the problem here?

You can't do nested LXD deployments (LXD containers inside the LXD machines of the localhost cloud) on focal or later; this is a limitation of AppArmor. If you deploy without --to lxd, it should work.

Alternatively, if you move to Juju 3.1, your machines can run with the virtual-machine virt-type, which allows you to use --to lxd for your deployments:

juju add-machine --constraints="virt-type=virtual-machine" -n 3

See the documentation and the following PR for more information.
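For the Juju 3.1 route, the VM machines and the containerised deployment can be combined roughly as below. This is a sketch that has not been run against a live controller: it assumes the new VMs come up as machines 0, 1 and 2 (check juju status for the real IDs), and lxd_placement is a hypothetical helper, not part of Juju. The lxd:<machine> placement directive asks Juju to create a new LXD container on that machine. The script only prints the commands; remove the echo to actually run them.

```shell
#!/usr/bin/env bash
# Sketch for Juju 3.1+ (assumption: the VMs created by add-machine
# end up as machines 0, 1 and 2; adjust the IDs to match `juju status`).

# Hypothetical helper: turn machine IDs into a --to placement list with
# one new LXD container per machine, e.g. "lxd:0,lxd:1,lxd:2".
lxd_placement() {
  local IFS=,          # "${targets[*]}" joins elements with a comma
  local targets=()
  local m
  for m in "$@"; do
    targets+=("lxd:${m}")
  done
  echo "${targets[*]}"
}

# Dry run: print the commands instead of executing them.
echo juju add-machine --constraints="virt-type=virtual-machine" -n 3
echo juju deploy -n 3 --to "$(lxd_placement 0 1 2)" \
  --config min-cluster-size=3 ./charm-rabbitmq-server
```

With one placement entry per unit, each of the three RabbitMQ units gets its own container on a different VM, so the cluster is spread across hosts.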


Thanks, I can confirm that deploying without --to lxd solved the problem:

$ juju deploy -n 3 --config min-cluster-size=3 ./charm-rabbitmq-server

Some time later:

$ juju status
Model    Controller  Cloud/Region         Version  SLA          Timestamp
default  overlord    localhost/localhost  2.9.43   unsupported  18:34:31+10:00

App              Version  Status  Scale  Charm            Channel  Rev  Exposed  Message
rabbitmq-server  3.8.2    active      3  rabbitmq-server           150  no       Unit is ready and clustered

Unit                Workload  Agent  Machine  Public address  Ports               Message
rabbitmq-server/0*  active    idle   0        10.140.147.14   5672/tcp,15672/tcp  Unit is ready and clustered
rabbitmq-server/1   active    idle   1        10.140.147.21   5672/tcp,15672/tcp  Unit is ready and clustered
rabbitmq-server/2   active    idle   2        10.140.147.52   5672/tcp,15672/tcp  Unit is ready and clustered

Machine  State    Address        Inst id        Series  AZ  Message
0        started  10.140.147.14  juju-006572-0  focal       Running
1        started  10.140.147.21  juju-006572-1  focal       Running
2        started  10.140.147.52  juju-006572-2  focal       Running