LXD cluster with juju: Container created on one node and storage on another

Hello,

I’m running a small 2-node LXD cluster for testing purposes. I’m trying to deploy two units of a charm that has two storage endpoints defined in metadata.yaml, and I’m expecting the units to be distributed evenly across the two cluster nodes. That is:

machine1: one container + 2 storage volumes

machine2: one container + 2 storage volumes

The containers seem to be created correctly, but I get this error for both of the filesystem volumes that should be added to machine2:

    "4":
      provider-id: docker:juju-4731c1-filesystem-4
      storage: docker/4
      attachments:
        machines:
          "2":
            mount-point: ""
            read-only: false
            life: alive
        units:
          jenkins-agent/2:
            machine: "2"
            life: alive
      pool: ssd-dir
      size: 51200
      life: alive
      status:
        current: attaching
        message: 'attaching filesystem 4 to machine 2: Failed add validation for device
          "filesystem-4": Failed loading custom volume: Storage volume not found'
        since: 22 Dec 2022 02:19:35+01:00

And when I check lxc storage show docker I find that the target for this volume is machine1:

config: {}
description: ""
name: docker
driver: dir
used_by:
- /1.0/storage-pools/docker/volumes/custom/juju-4731c1-filesystem-2?target=machine1 # this one works fine since it's actually created on machine1
- /1.0/storage-pools/docker/volumes/custom/juju-4731c1-filesystem-4?target=machine1 # this should be machine2
status: Created
locations:
- machine1
- machine2

What is going on here? Is there something wrong with my setup that is causing this issue or am I missing something in juju?

Any input here is much appreciated!

What does your juju debug-log say?

If I wipe the application and redeploy with two units, I get the log below. In this particular case, machine-2 gets deployed on the cluster node where the storage volumes are created, and thus the deployment is successful. machine-3 is deployed on the other node and fails. I can’t see any obvious errors in the log; it just stops with these lines, and then there is no more output from machine-3:

no kvm containers possible
machine-3: 10:39:44 INFO juju.api connection established to "wss://10.20.20.228:17070/model/b20d17a5-57a6-402b-83a3-22e670580cf3/api"
machine-3: 10:39:44 INFO juju.worker.machiner "machine-3" started
unit-jenkins-agent-3: 10:39:44 INFO juju Starting unit workers for "jenkins-agent/3"
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.apicaller [b20d17] "unit-jenkins-agent-3" successfully connected to "10.20.20.228:17070"
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.apicaller [b20d17] password changed for "unit-jenkins-agent-3"
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.apicaller [b20d17] "unit-jenkins-agent-3" successfully connected to "10.20.20.228:17070"
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.migrationminion migration phase is now: NONE
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.logger logger worker started
unit-jenkins-agent-3: 10:39:44 ERROR juju.worker.meterstatus error running "meter-status-changed": charm missing from disk
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.upgrader no waiter, upgrader is done
machine-3: 10:39:44 INFO juju.worker.leadership jenkins-agent/3 promoted to leadership of jenkins-agent
machine-3: 10:39:44 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-jenkins-agent-3
machine-3: 10:39:44 INFO juju.agent.tools was a symlink, now looking at /var/lib/juju/tools/2.9.37-ubuntu-amd64
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.uniter unit "jenkins-agent/3" started
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.uniter resuming charm install
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.uniter.charm downloading local:focal/jenkins-agent-1 from API server
machine-3: 10:39:44 INFO juju.downloader downloading from local:focal/jenkins-agent-1
machine-3: 10:39:44 INFO juju.downloader download complete ("local:focal/jenkins-agent-1")
machine-3: 10:39:44 INFO juju.downloader download verified ("local:focal/jenkins-agent-1")
unit-jenkins-agent-3: 10:39:45 INFO juju.worker.uniter hooks are retried true
machine-3: 10:39:47 INFO juju.container.lxd Availability zone will be empty for this container manager
machine-3: 10:39:47 INFO juju.worker.kvmprovisioner machine-3 does not support kvm container

Comparing this to machine-2, it looks like machine-3 just stops when it gets to the storage-attached hooks:

no kvm containers possible
machine-2: 10:39:48 INFO juju.api connection established to "wss://10.20.20.228:17070/model/b20d17a5-57a6-402b-83a3-22e670580cf3/api"
machine-2: 10:39:48 INFO juju.worker.machiner "machine-2" started
unit-jenkins-agent-2: 10:39:48 INFO juju Starting unit workers for "jenkins-agent/2"
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.apicaller [b20d17] "unit-jenkins-agent-2" successfully connected to "10.20.20.228:17070"
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.apicaller [b20d17] password changed for "unit-jenkins-agent-2"
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.apicaller [b20d17] "unit-jenkins-agent-2" successfully connected to "10.20.20.228:17070"
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.migrationminion migration phase is now: NONE
machine-2: 10:39:48 INFO juju.worker.leadership jenkins-agent leadership for jenkins-agent/2 denied
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.logger logger worker started
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.upgrader no waiter, upgrader is done
unit-jenkins-agent-2: 10:39:48 ERROR juju.worker.meterstatus error running "meter-status-changed": charm missing from disk
machine-2: 10:39:48 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-jenkins-agent-2
machine-2: 10:39:48 INFO juju.agent.tools was a symlink, now looking at /var/lib/juju/tools/2.9.37-ubuntu-amd64
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.uniter unit "jenkins-agent/2" started
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.uniter resuming charm install
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.uniter.charm downloading local:focal/jenkins-agent-1 from API server
machine-2: 10:39:48 INFO juju.downloader downloading from local:focal/jenkins-agent-1
machine-2: 10:39:48 INFO juju.downloader download complete ("local:focal/jenkins-agent-1")
machine-2: 10:39:48 INFO juju.downloader download verified ("local:focal/jenkins-agent-1")
machine-2: 10:39:51 INFO juju.container.lxd Availability zone will be empty for this container manager
machine-2: 10:39:51 INFO juju.worker.kvmprovisioner machine-2 does not support kvm container
unit-jenkins-agent-2: 10:39:53 INFO juju.worker.uniter hooks are retried true
unit-jenkins-agent-2: 10:39:54 INFO juju.worker.uniter.operation ran "jenkins-storage-attached" hook (via hook dispatching script: dispatch)
unit-jenkins-agent-2: 10:39:54 INFO juju.worker.uniter.operation ran "docker-storage-attached" hook (via hook dispatching script: dispatch)
unit-jenkins-agent-2: 10:39:55 INFO juju.worker.uniter.storage initial storage attachments ready
unit-jenkins-agent-2: 10:39:55 INFO unit.jenkins-agent/2.juju-log Running legacy hooks/install.
unit-jenkins-agent-2: 10:39:55 INFO unit.jenkins-agent/2.juju-log 

What does “juju storage” say in your model?

This:

Unit              Storage ID  Type        Pool        Size    Status     Message
jenkins-agent/26  docker/52   filesystem  ssd-docker  5.0GiB  attaching  attaching filesystem 52 to machine 29: Failed add validation for device "filesystem-52": Failed loading custom volume: Storage volume not found
jenkins-agent/26  jenkins/53  filesystem  ssd-zfs     5.0GiB  attaching  attaching filesystem 53 to machine 29: Failed add validation for device "filesystem-52": Failed loading custom volume: Storage volume not found

This seems to be in accordance with what lxc reports, i.e. the storage is created on one machine and the container on another, so juju/lxc can’t find it.
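
For what it's worth, that comparison can be sketched in Python. This is only a rough sketch under stated assumptions: the function names (volume_targets, misplaced) and the EXPECTED mapping are made up for illustration, and the used_by entries are copied from the lxc storage show docker output in the first post. It reads the ?target= query on each volume URL and reports volumes whose target differs from the cluster member that was expected to hold them.

```python
from urllib.parse import urlsplit, parse_qs

# used_by entries copied from the "lxc storage show docker" output earlier in the thread.
USED_BY = [
    "/1.0/storage-pools/docker/volumes/custom/juju-4731c1-filesystem-2?target=machine1",
    "/1.0/storage-pools/docker/volumes/custom/juju-4731c1-filesystem-4?target=machine1",
]

# Hypothetical mapping: where each volume was expected to live, per the
# comments on those two lines in the first post.
EXPECTED = {
    "juju-4731c1-filesystem-2": "machine1",
    "juju-4731c1-filesystem-4": "machine2",
}


def volume_targets(used_by):
    """Map each storage volume name to the cluster member in its ?target= query."""
    targets = {}
    for entry in used_by:
        parts = urlsplit(entry)
        # Path shape: /1.0/storage-pools/<pool>/volumes/<type>/<volume>
        segments = parts.path.strip("/").split("/")
        if len(segments) < 6 or segments[3] != "volumes":
            continue  # skip entries that are not storage volumes
        volume = segments[5]
        # Entries without a target query are recorded as None.
        targets[volume] = parse_qs(parts.query).get("target", [None])[0]
    return targets


def misplaced(used_by, expected):
    """Return {volume: (actual_target, expected_target)} for every mismatch."""
    actual = volume_targets(used_by)
    return {
        volume: (actual.get(volume), want)
        for volume, want in expected.items()
        if actual.get(volume) != want
    }


if __name__ == "__main__":
    print(misplaced(USED_BY, EXPECTED))
    # prints {'juju-4731c1-filesystem-4': ('machine1', 'machine2')}
```

With the data from the first post, only filesystem-4 is flagged, which matches the volume Juju reports as stuck in "attaching".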

Is the storage defined on both LXD hosts? I haven’t encountered a 2-node LXD cluster so far, so I wouldn’t know how you synchronized the storage pool names and how you made Juju aware of them.

Perhaps the storage-pool isn’t available on all hosts in the lxd cluster?