Hello,
I’m running a small 2-node LXD cluster for testing purposes. I’m trying to deploy two units of a charm that has two storage endpoints defined in metadata.yaml, and I’m expecting the units to be distributed evenly across the two cluster nodes. That is:
machine1: one container + 2 storage volumes
machine2: one container + 2 storage volumes
The containers seem to be created correctly, but I get this error for both filesystem volumes that should be added to machine2:
"4":
  provider-id: docker:juju-4731c1-filesystem-4
  storage: docker/4
  attachments:
    machines:
      "2":
        mount-point: ""
        read-only: false
        life: alive
    units:
      jenkins-agent/2:
        machine: "2"
        life: alive
  pool: ssd-dir
  size: 51200
  life: alive
  status:
    current: attaching
    message: 'attaching filesystem 4 to machine 2: Failed add validation for device
      "filesystem-4": Failed loading custom volume: Storage volume not found'
    since: 22 Dec 2022 02:19:35+01:00
And when I check lxc storage show docker, I find that the target for this volume is machine1:
config: {}
description: ""
name: docker
driver: dir
used_by:
- /1.0/storage-pools/docker/volumes/custom/juju-4731c1-filesystem-2?target=machine1 # this one works fine since it's actually created on machine1
- /1.0/storage-pools/docker/volumes/custom/juju-4731c1-filesystem-4?target=machine1 # this should be machine2
status: Created
locations:
- machine1
- machine2
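For anyone comparing output like this across several volumes, the `?target=` part of each `used_by` entry names the cluster member that holds the volume. A minimal Python sketch that extracts it is below. The helper name `volume_targets` is made up for illustration, and the input list is copied from the output above rather than fetched from LXD.

```python
from urllib.parse import urlparse, parse_qs


def volume_targets(used_by):
    """Map each custom volume name to the member named in its ?target= query.

    Hypothetical helper: it only parses strings shaped like the used_by
    entries shown above and does not talk to LXD.
    """
    targets = {}
    for entry in used_by:
        # Drop any trailing "# ..." annotation added by hand.
        url = entry.split(" #", 1)[0].strip()
        parsed = urlparse(url)
        parts = parsed.path.strip("/").split("/")
        # Expected shape: 1.0/storage-pools/<pool>/volumes/custom/<name>
        if len(parts) == 6 and parts[3] == "volumes" and parts[4] == "custom":
            target = parse_qs(parsed.query).get("target", [None])[0]
            targets[parts[5]] = target
    return targets


# Entries copied from the lxc storage show docker output above.
used_by = [
    "/1.0/storage-pools/docker/volumes/custom/juju-4731c1-filesystem-2?target=machine1",
    "/1.0/storage-pools/docker/volumes/custom/juju-4731c1-filesystem-4?target=machine1",
]

for name, member in volume_targets(used_by).items():
    print(name, "->", member)
```

Both volumes resolve to machine1 here, which matches the symptom: filesystem-4 is expected on machine2 but was created on machine1.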
What is going on here? Is there something wrong with my setup that is causing this issue or am I missing something in juju?
Any input here is much appreciated!
What does your juju debug-log say?
If I wipe the application and redeploy with two units, I get the log below. In this particular case machine-2 gets deployed on the cluster node where the storage volumes are created, and thus the deployment is successful. machine-3 is deployed on the other node and fails. I can’t see any obvious errors in the log; it just stops with these lines, and then there is no more output from machine-3:
no kvm containers possible
machine-3: 10:39:44 INFO juju.api connection established to "wss://10.20.20.228:17070/model/b20d17a5-57a6-402b-83a3-22e670580cf3/api"
machine-3: 10:39:44 INFO juju.worker.machiner "machine-3" started
unit-jenkins-agent-3: 10:39:44 INFO juju Starting unit workers for "jenkins-agent/3"
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.apicaller [b20d17] "unit-jenkins-agent-3" successfully connected to "10.20.20.228:17070"
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.apicaller [b20d17] password changed for "unit-jenkins-agent-3"
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.apicaller [b20d17] "unit-jenkins-agent-3" successfully connected to "10.20.20.228:17070"
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.migrationminion migration phase is now: NONE
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.logger logger worker started
unit-jenkins-agent-3: 10:39:44 ERROR juju.worker.meterstatus error running "meter-status-changed": charm missing from disk
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.upgrader no waiter, upgrader is done
machine-3: 10:39:44 INFO juju.worker.leadership jenkins-agent/3 promoted to leadership of jenkins-agent
machine-3: 10:39:44 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-jenkins-agent-3
machine-3: 10:39:44 INFO juju.agent.tools was a symlink, now looking at /var/lib/juju/tools/2.9.37-ubuntu-amd64
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.uniter unit "jenkins-agent/3" started
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.uniter resuming charm install
unit-jenkins-agent-3: 10:39:44 INFO juju.worker.uniter.charm downloading local:focal/jenkins-agent-1 from API server
machine-3: 10:39:44 INFO juju.downloader downloading from local:focal/jenkins-agent-1
machine-3: 10:39:44 INFO juju.downloader download complete ("local:focal/jenkins-agent-1")
machine-3: 10:39:44 INFO juju.downloader download verified ("local:focal/jenkins-agent-1")
unit-jenkins-agent-3: 10:39:45 INFO juju.worker.uniter hooks are retried true
machine-3: 10:39:47 INFO juju.container.lxd Availability zone will be empty for this container manager
machine-3: 10:39:47 INFO juju.worker.kvmprovisioner machine-3 does not support kvm container
Comparing this to machine-2, it looks like machine-3 just stops when it gets to the storage-attached hook:
no kvm containers possible
machine-2: 10:39:48 INFO juju.api connection established to "wss://10.20.20.228:17070/model/b20d17a5-57a6-402b-83a3-22e670580cf3/api"
machine-2: 10:39:48 INFO juju.worker.machiner "machine-2" started
unit-jenkins-agent-2: 10:39:48 INFO juju Starting unit workers for "jenkins-agent/2"
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.apicaller [b20d17] "unit-jenkins-agent-2" successfully connected to "10.20.20.228:17070"
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.apicaller [b20d17] password changed for "unit-jenkins-agent-2"
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.apicaller [b20d17] "unit-jenkins-agent-2" successfully connected to "10.20.20.228:17070"
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.migrationminion migration phase is now: NONE
machine-2: 10:39:48 INFO juju.worker.leadership jenkins-agent leadership for jenkins-agent/2 denied
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.logger logger worker started
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.upgrader no waiter, upgrader is done
unit-jenkins-agent-2: 10:39:48 ERROR juju.worker.meterstatus error running "meter-status-changed": charm missing from disk
machine-2: 10:39:48 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-jenkins-agent-2
machine-2: 10:39:48 INFO juju.agent.tools was a symlink, now looking at /var/lib/juju/tools/2.9.37-ubuntu-amd64
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.uniter unit "jenkins-agent/2" started
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.uniter resuming charm install
unit-jenkins-agent-2: 10:39:48 INFO juju.worker.uniter.charm downloading local:focal/jenkins-agent-1 from API server
machine-2: 10:39:48 INFO juju.downloader downloading from local:focal/jenkins-agent-1
machine-2: 10:39:48 INFO juju.downloader download complete ("local:focal/jenkins-agent-1")
machine-2: 10:39:48 INFO juju.downloader download verified ("local:focal/jenkins-agent-1")
machine-2: 10:39:51 INFO juju.container.lxd Availability zone will be empty for this container manager
machine-2: 10:39:51 INFO juju.worker.kvmprovisioner machine-2 does not support kvm container
unit-jenkins-agent-2: 10:39:53 INFO juju.worker.uniter hooks are retried true
unit-jenkins-agent-2: 10:39:54 INFO juju.worker.uniter.operation ran "jenkins-storage-attached" hook (via hook dispatching script: dispatch)
unit-jenkins-agent-2: 10:39:54 INFO juju.worker.uniter.operation ran "docker-storage-attached" hook (via hook dispatching script: dispatch)
unit-jenkins-agent-2: 10:39:55 INFO juju.worker.uniter.storage initial storage attachments ready
unit-jenkins-agent-2: 10:39:55 INFO unit.jenkins-agent/2.juju-log Running legacy hooks/install.
unit-jenkins-agent-2: 10:39:55 INFO unit.jenkins-agent/2.juju-log
What does “juju storage” say in your model?
This:
Unit Storage ID Type Pool Size Status Message
jenkins-agent/26 docker/52 filesystem ssd-docker 5.0GiB attaching attaching filesystem 52 to machine 29: Failed add validation for device "filesystem-52": Failed loading custom volume: Storage volume not found
jenkins-agent/26 jenkins/53 filesystem ssd-zfs 5.0GiB attaching attaching filesystem 53 to machine 29: Failed add validation for device "filesystem-52": Failed loading custom volume: Storage volume not found
This seems to agree with what lxc reports, i.e. the storage is created on one machine and the container on another, so juju/lxc can’t find it.
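The mismatch described here (volume on one cluster member, container on another) can be checked mechanically once you have both placements. The Python sketch below is only an illustration: the function name `misplaced_volumes` and the container names are hypothetical, and the placement data is typed in by hand rather than read from LXD or Juju.

```python
def misplaced_volumes(volume_member, container_member, attachments):
    """Return (volume, volume_member, container, container_member) tuples
    for every attachment whose volume and container sit on different
    cluster members.
    """
    problems = []
    for volume, container in attachments.items():
        vol_loc = volume_member.get(volume)
        ctr_loc = container_member.get(container)
        if vol_loc != ctr_loc:
            problems.append((volume, vol_loc, container, ctr_loc))
    return problems


# Volume placement as reported in the first post.
volume_member = {
    "juju-4731c1-filesystem-2": "machine1",
    "juju-4731c1-filesystem-4": "machine1",
}
# ASSUMPTION: container names are invented for this example; substitute the
# real names and locations from your own cluster.
container_member = {
    "juju-4731c1-1": "machine1",
    "juju-4731c1-2": "machine2",
}
# Which container each volume is meant to be attached to (also assumed).
attachments = {
    "juju-4731c1-filesystem-2": "juju-4731c1-1",
    "juju-4731c1-filesystem-4": "juju-4731c1-2",
}

for volume, vol_loc, container, ctr_loc in misplaced_volumes(
    volume_member, container_member, attachments
):
    print(f"{volume} is on {vol_loc}, but {container} is on {ctr_loc}")
```

With the placement above, only filesystem-4 is flagged, which is the volume that fails to attach.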
Is the storage defined on both LXD hosts? I’ve never encountered a 2-node LXD cluster so far, so I wouldn’t know how you synchronized the storage pool names and how you made Juju aware of them.
Perhaps the storage-pool isn’t available on all hosts in the lxd cluster?
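One way to answer that question from the output already posted is to compare the pool's `locations` list with the cluster members. A small Python sketch follows. The function name `members_without_pool` is hypothetical, and the member list is assumed to be just machine1 and machine2, since the thread describes a 2-node cluster.

```python
def members_without_pool(pool_locations, cluster_members):
    """Return the cluster members that do not appear in the pool's locations."""
    return sorted(set(cluster_members) - set(pool_locations))


# From the "locations:" section of the lxc storage show docker output above.
pool_locations = ["machine1", "machine2"]
# ASSUMPTION: the cluster consists of exactly these two members.
cluster_members = ["machine1", "machine2"]

missing = members_without_pool(pool_locations, cluster_members)
if missing:
    print("pool is missing on:", ", ".join(missing))
else:
    print("pool is defined on every listed member")
```

For the docker pool shown earlier, both members are listed, so by this check the pool itself is present on both nodes. The check says nothing about the individual volumes, nor about the other pool in the thread (ssd-zfs), for which no output was posted.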