Deploying ceph-osd on base ubuntu@23.04 fails with a useless error

rhxto · 23 January 2024 14:20

Hello I’m trying to deploy ceph-osd on ubuntu@23.04.

juju info ceph-osd clearly states that it’s possible (see attachment)

I am using MAAS and ubuntu@23.04 is downloaded and available.

When I try to deploy it, no matter which channel I choose, it fails with:

14:18:13 DEBUG juju.cmd.juju.application.store charmadapter.go:92 cannot interpret as charmstore bundle: lunar (series) != "bundle"
14:18:13 ERROR cmd charm.go:392 base: ubuntu@23.04/stable
14:18:13 DEBUG juju.api monitor.go:35 RPC connection died
14:18:13 DEBUG juju.api monitor.go:35 RPC connection died
ERROR failed to deploy charm "ceph-osd"
14:18:13 DEBUG cmd supercommand.go:549 error stack:
github.com/juju/juju/cmd/juju/application/deployer.(*repositoryCharm).PrepareAndDeploy:395: failed to deploy charm "ceph-osd"

How can I debug this?

chrome0 · 23 January 2024 15:34

Hi Federico, could you please provide details on which steps you take to deploy (command line, plus the bundle if there’s one in use)? TIA

Edit: also model defaults would be useful if you can share them

rhxto · 23 January 2024 21:35

Thanks for your reply.

The command I used is juju deploy --channel "quincy/stable" --base "ubuntu@23.04" ceph-osd

No matter which channel I choose among the ones that support ubuntu 23, the output is the same.

Juju is correctly configured with MAAS: when I don’t specify a base, ceph deploys correctly to the first available metal node with the default distro, ubuntu@22.04 LTS.

I noticed that setting the MAAS options to deploy ubuntu 23 as default, and not specifying the base in the deploy command, gives a more useful error:

ERROR cannot deploy bundle: cannot add unit for application "ceph-osd": acquiring machine to host unit "ceph-osd/0": cannot assign unit "ceph-osd/0" to machine 0: base does not match: unit has "ubuntu@22.04", machine has "ubuntu@23.10"

I was using a bundle initially, but am now trying to deploy just ceph via command line. Using the bundle (configured the same way) gives the same output.

Juju model defaults are here: https://pastebin.com/qtMXbW14

chrome0 · 24 January 2024 08:47

One issue you’re running into is that the default package source on quincy is focal/yoga, which conflicts with base 23.04. You will need to set source explicitly for this combination; setting it to distro gives you the default 23.04 packages. This should result in something like the below:

$ juju deploy --channel "quincy/stable" --config source=distro --base "ubuntu@23.04" ceph-osd 
# wait...
$ juju status ceph-osd
Model  Controller           Cloud/Region             Version  SLA          Timestamp
tst    sabaini-serverstack  serverstack/serverstack  3.3.0    unsupported  08:43:59Z

App       Version  Status   Scale  Charm     Channel        Rev  Exposed  Message
ceph-osd  17.2.6   blocked      1  ceph-osd  quincy/stable  576  no       Missing relation: monitor

Unit         Workload  Agent  Machine  Public address  Ports  Message
ceph-osd/1*  blocked   idle   1        172.20.0.16            Missing relation: monitor

Machine  State    Address      Inst id                               Base          AZ    Message
1        started  172.20.0.16  3389898b-b0cc-4336-ad3b-07c3a7c38e6d  ubuntu@23.04  nova  ACTIVE

However, I do wonder if there’s another issue with MAAS at play here, because the message you posted (cannot assign unit "ceph-osd/0" to machine 0: base does not match: unit has "ubuntu@22.04", machine has "ubuntu@23.10") seems to hint at issues deploying ubuntu@23.04 in the first place, and this happens much sooner, before any source is configured.

Can you deploy other charms to ubuntu@23.04 with your MAAS?

rhxto · 24 January 2024 09:58

I tried setting the source config, but am getting the same result.

# juju deploy --channel "quincy/stable" --config source=distro --base "ubuntu@23.04" ceph-osd 
ERROR base: ubuntu@23.04/stable
ERROR failed to deploy charm "ceph-osd"

Setting the debug parameter doesn’t help

About the maas error: I was deploying latest/edge, which should support 23.04 and 23.10, but juju was complaining the charm had only 22.04.

Could this be an issue with my controller? Maybe some cached info about the charm?

EDIT: tried ceph-mon as well as rabbitmq-server with channels latest/edge. None of them works.

chrome0 · 24 January 2024 13:49

What does juju --debug add-machine --base "ubuntu@23.04" give you?

rhxto · 25 January 2024 07:40

Works correctly

07:39:11 INFO  juju.cmd.juju.machine add.go:343 model provisioning
07:39:11 INFO  cmd add.go:398 created machine 1
07:39:11 DEBUG juju.api monitor.go:35 RPC connection died
07:39:11 DEBUG juju.api monitor.go:35 RPC connection died
07:39:11 INFO  cmd supercommand.go:556 command finished
...
# juju status
Model      Controller      Cloud/Region         Version  SLA          Timestamp
openstack  captainamerica  maas-<mymaas>/gen10  3.3.0    unsupported  07:39:16Z

Machine  State    Address  Inst id  Base          AZ  Message
1        pending  <my-address>:142:0:1:0  giacomo  ubuntu@23.04  gen10  Deploying: Powering on

Deploying ceph-osd to the deployed and working machine gives the same error:

# juju deploy ceph-osd --to 1 --channel quincy/stable --config source=distro --debug
07:44:59 DEBUG juju.cmd.juju.application.store charmadapter.go:92 cannot interpret as charmstore bundle: jammy (series) != "bundle"
07:45:00 ERROR cmd charm.go:392 base: ubuntu@23.04/stable
07:45:00 DEBUG juju.api monitor.go:35 RPC connection died
07:45:00 DEBUG juju.api monitor.go:35 RPC connection died
ERROR failed to deploy charm "ceph-osd"
07:45:00 DEBUG cmd supercommand.go:549 error stack: 
github.com/juju/juju/cmd/juju/application/deployer.(*repositoryCharm).PrepareAndDeploy:395: failed to deploy charm "ceph-osd"

chrome0 · 25 January 2024 08:29

Unfortunately the log messages don’t provide too much info here… Could you check if the juju logs from the controller contain more information? Logs should be on the controller machine somewhere in /var/log/juju/machine-*.log
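
As a rough sketch of how to skim those logs, the helper below prints only the latest ERROR-level lines. The function name recent_errors is made up for illustration, and it assumes the log lines carry the level as a space-delimited word, as in the debug output earlier in this thread.

```shell
# Print the most recent ERROR-level lines from one or more Juju log files.
# Usage: recent_errors COUNT FILE...
# (recent_errors is a hypothetical helper name, not part of Juju.)
recent_errors() {
    count="$1"
    shift
    # -h suppresses file-name prefixes so the output stays in plain log format
    grep -h ' ERROR ' "$@" | tail -n "$count"
}

# On the controller machine, for example:
# recent_errors 20 /var/log/juju/machine-*.log
```

Limiting the output with tail keeps the result readable when the same error repeats many times.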

rhxto · 28 January 2024 08:38

Nothing really interesting on my 23.04 machine itself, while the controller logs have this:

2024-01-25 12:10:18 ERROR juju.worker.dependency engine.go:695 "instance-poller" manifold worker returned unexpected error: unexpected: ServerError: 502 Bad Gateway (<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.18.0 (Ubuntu)</center>
</body>
</html>
)
2024-01-25 12:10:30 ERROR juju.worker.dependency engine.go:695 "instance-poller" manifold worker returned unexpected error: unexpected: ServerError: 502 Bad Gateway (<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.18.0 (Ubuntu)</center>
</body>
</html>

chrome0 · 29 January 2024 08:36

Not a lot of context but this looks like Juju might have issues contacting the MAAS API, is that possible? Could be networking, a proxy setting getting in the way or something like that?

rhxto · 31 January 2024 09:42

Hmm. The 502s don’t seem to appear when I deploy ceph, so they might have been a temporary network error (we’ve had a couple of power outages last week).

When I deploy ceph-osd to an already deployed ubuntu 23 machine (this one), tailing machine-0.log on the controller shows this:

2024-01-31 09:36:23 INFO juju.apiserver.application.deployfromrepository baseselector.go:121 with the user specified base "ubuntu@20.04/stable"
2024-01-31 09:36:23 INFO juju.apiserver.application.deployfromrepository baseselector.go:121 with the user specified base "ubuntu@22.04/stable"

This hints that it’s not considering ubuntu@23 as a suitable base.
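
To make that check repeatable, here is a small sketch that pulls the distinct bases out of those log lines. It assumes the "user specified base" lines list the bases the controller evaluated, which is only my reading of the log; bases_considered is a made-up helper name.

```shell
# Print the unique bases named in "user specified base" log lines.
# Usage: bases_considered LOGFILE
# (bases_considered is a hypothetical helper name, not part of Juju.)
bases_considered() {
    grep 'user specified base' "$1" \
        | sed -n 's/.*user specified base "\([^"]*\)".*/\1/p' \
        | sort -u
}

# Example on the controller:
# bases_considered /var/log/juju/machine-0.log
# Check whether 23.04 shows up at all:
# bases_considered /var/log/juju/machine-0.log | grep -q '23.04' || echo "23.04 not listed"
```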

EDIT: the command was the same as before: juju deploy ceph-osd --to 1 --channel latest/edge --config source=distro --base "ubuntu@23.04" --debug

unit-controller-0.log looks ok:

2024-01-31 09:30:16 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch)
2024-01-31 09:35:20 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch)

I haven’t seen any more logs in the logsink, so I think that’s all of them. Moreover, I tested connectivity between the controller and MAAS, and everything looks good:

# juju switch controller
# juju ssh 0
$ curl http://maas6.mynetwork.tld:5248/MAAS/r/
<!doctype html>
...
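
A variant of the same connectivity test that reports only the HTTP status, which makes a 502 like the one in the controller log easier to spot than scanning the returned HTML. This is a sketch; classify_status and check_endpoint are made-up helper names, and the 10-second timeout is an arbitrary choice.

```shell
# Map an HTTP status code to a short verdict.
# (classify_status and check_endpoint are hypothetical helper names.)
classify_status() {
    case "$1" in
        2??|3??) echo "reachable" ;;
        502)     echo "bad gateway" ;;
        000)     echo "no connection" ;;   # curl reports 000 when no response arrived
        *)       echo "unexpected: $1" ;;
    esac
}

# Fetch only the status code of a URL and classify it.
check_endpoint() {
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")
    classify_status "$code"
}

# Example, with the URL used above:
# check_endpoint http://maas6.mynetwork.tld:5248/MAAS/r/
```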

rhxto · 26 February 2024 17:48

The source of this was an initrd issue, which is why I wanted to use 23.10. In the end, I just copied /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-23.10/mantic/stable/boot-initrd to 22.04/jammy/stable/boot-initrd and it worked wonderfully. Both juju and maas are happy because they think 22.04 is installed, and I get the working initrd.
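
For anyone repeating the copy, a minimal sketch that keeps a timestamped backup of the file being overwritten, so the original initrd can be restored. replace_with_backup is a made-up helper name, and both paths are passed in by the caller; the abbreviated destination path above is deliberately not expanded here.

```shell
# Copy SRC over DST, keeping a timestamped backup of DST if it already exists.
# Usage: replace_with_backup SRC DST
# (replace_with_backup is a hypothetical helper name.)
replace_with_backup() {
    src="$1"
    dst="$2"
    [ -f "$src" ] || { echo "missing source: $src" >&2; return 1; }
    if [ -f "$dst" ]; then
        # -p preserves mode and timestamps on the backup copy
        cp -p "$dst" "$dst.bak.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    cp -p "$src" "$dst"
}

# Example (fill in the full source and destination paths yourself):
# replace_with_backup /path/to/mantic/boot-initrd /path/to/jammy/boot-initrd
```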

I was evaluating building custom images as well, but that would require some heavy lifting I can’t do at the moment. Thanks, Peter, for your help.