Missing files, postgresql charm - database issues

erik-lonroth · 16 March 2023 09:44

I’ve run into a problem with the PostgreSQL charm. After a reboot, PostgreSQL won’t start and seems to be missing critical files.

This is what I see:

systemctl status postgresql
...
Error: /var/lib/postgresql/12/main is not accessible or does not exist

In the directory:

ls -l /var/lib/postgresql/12/main
lrwxrwxrwx 1 root root 19 Sep 23 13:20 /var/lib/postgresql/12/main -> /srv/pgdata/12/main

Looking at the Juju storage for the PostgreSQL charm:

juju status --storage
...
Storage Unit  Storage ID  Type        Pool    Mountpoint           Size     Status     Message
nextcloud/0   datadir/0   filesystem  data    /var/nextcloud/data  1000GiB  attaching  attaching filesystem 0 to machine 0: container "juju-b8aeb2-0" already has a device "filesystem-0"
postgresql/0  pgdata/1    filesystem  rootfs  /srv/pgdata          2.9TiB   attached

Somehow, the database files have gone away.

There seems to be another directory with “main” in its name for PostgreSQL (main-1663939245). Is this some kind of backup?

ls -la /var/lib/postgresql/12/
total 4
drwxr-xr-x  3 postgres postgres  4 Sep 23 13:20 .
drwxr-xr-x  7 postgres postgres  8 Mar 16 09:29 ..
lrwxrwxrwx  1 root     root     19 Sep 23 13:20 main -> /srv/pgdata/12/main
drwx------ 19 postgres postgres 21 Sep 23 13:20 main-1663939245

Any help recovering the database would be appreciated.

jnsgruk · 16 March 2023 09:49

@erik-lonroth which revision / channel are you using for the PostgreSQL charm?

erik-lonroth · 16 March 2023 09:50

juju status

postgresql 12.14 blocked 1 postgresql latest/stable 270 no PostgreSQL failed to start

0x12b · 16 March 2023 09:57

@neppel, do you have any suggestions for Erik?

erik-lonroth · 16 March 2023 09:58

To be even more clear, the whole directory /srv/pgdata is empty.

erik-lonroth · 16 March 2023 10:08

[UPDATE]

We use LXD as the underlying cloud, with ZFS backing the storage pools.

After restarting the container multiple times, we suddenly got the files back!

mount
...
default/containers/juju-b8aeb2-2 on /srv/pgdata type zfs (rw,relatime,xattr,posixacl)

This is something we encounter A LOT, and we don’t know exactly why.

This is scary. Usually we can resolve these situations by restarting the container multiple times, but sometimes not… @joakimnyman would also be able to chip in on this discussion.

Q: Is this a Juju issue or an LXD issue? @stgraber

We have managed to recover the database (by rebooting until the mount worked), but yes, this situation is scary. This happens to us a lot with other systems that use Juju storage.
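
For anyone hitting the same symptom, here is a minimal sketch of a check that tells an empty-but-unmounted /srv/pgdata apart from one where the ZFS dataset is actually mounted. It is not part of the charm or of Juju. The function names `find_mount` and `storage_ready` are made up for illustration, and the expected filesystem type `zfs` is an assumption taken from the `mount` output above. It parses text in the `/proc/mounts` format (source, mount point, type, options), not the `mount` command's "on ... type ..." format.

```python
import os


def find_mount(mountpoint, mounts_text):
    """Return (source, fstype) for `mountpoint`, or None if nothing is mounted there.

    `mounts_text` is text in /proc/mounts format: one mount per line,
    whitespace-separated fields "source mountpoint fstype options dump pass".
    Note: mount points containing spaces are escaped in /proc/mounts and are
    not handled by this sketch.
    """
    target = os.path.normpath(mountpoint)
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, where, fstype = fields[0], fields[1], fields[2]
        if os.path.normpath(where) == target:
            # Keep the last match: a later mount shadows an earlier one.
            found = (source, fstype)
    return found


def storage_ready(mountpoint, mounts_text, expected_fstype="zfs"):
    """True only if `mountpoint` is mounted with the expected filesystem type."""
    entry = find_mount(mountpoint, mounts_text)
    return entry is not None and entry[1] == expected_fstype
```

On the affected unit this could be fed the contents of `/proc/mounts`, for example `storage_ready("/srv/pgdata", open("/proc/mounts").read())`. If it returns False, the empty directory most likely means the storage was never mounted, not that the data is gone.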

neppel · 16 March 2023 11:52

Hi @erik-lonroth! I think the issue may be related to Bug #1999758 “Juju doesn't mount storage after lxd container res...” : Bugs : Canonical Juju. We found a similar situation with the new PostgreSQL charm (the one from the edge channel) in the past and created a workaround for it. I haven’t tested if the issue is fixed on Juju 2.9.42 yet. Is that the version you’re using?

jnsgruk · 16 March 2023 15:26

Yeah, it certainly feels like that should now be fixed. If not, we probably need to figure out whether it’s the same issue and let the Juju team know.

erik-lonroth · 16 March 2023 23:08

Thanks for the updates. We are on juju-controller 2.9.38 at the moment, but we will upgrade to 2.9.42 for sure, since it’s mentioned in Bug #1999758 “Juju doesn't mount storage after lxd container res...” : Bugs : Canonical Juju

We can try to reproduce, but we’ll upgrade nevertheless before we start the next maintenance. @joakimnyman

The attention is much appreciated, as this was getting hairy.