Backup/restore to/from filesystem using postgresql charm

I’m trying to figure out how to do a backup/restore of a postgresql database, very similar to this excellent post by “stub”. PostgreSQL Point-in-Time Recovery with Juju – Stub

However, I don’t want to ship it to a public cloud provider or local swift/openstack object store (which the blog covers), but rather to a remote server, perhaps over ssh/rsync. I’m ok with saving the backup to local filesystem aswell as recovering from it.

Is there anyone who knows where “stub” is reached that might have insights?

I would also love to a see this kind of documentation in the charm itself since establish a backup/restore is such a key operational activity for a database.

@tmihoc

I don’t know myself but I’ll around. Will let you know if I find out anything!

1 Like

It seems from the config, this might be more or less “default” but I’m still not sure how a backup-restore process would look like…

I tried setting

juju config postgresql wal_e_storage_uri="file://localhost/backups/pg"

That triggered a snap install of wal-e, which was a step forward.

Then I tried to run a few actions.

image

The last command is just freezing…

1 Like

After some discussion in Mattermost, I think that we’ve fixed this. When specifying a backup on the local filesystem, the charm will not make a local dir or setup permissions. That’s okay, as this may be out of scope for the postgres charm (discussion welcome!) For now, the following will get things to work:

# Setup the backup dir
juju ssh postgresql/0
sudo mkdir /var/backups/pg
sudo chown postgres /var/backups/pg
exit

# Set the config
juju config postgresql wal_e_storage_uri="file://localhost/var/backups/pg"
juju run-action postgresql/1 wal-e-backup --wait
2 Likes

@pengale - this is amazing, I’m testing that right now.

Now, the next challenge is the recover part…

2 Likes

I’d written a charm which must relate to a postgresql DB… and does a bit of the things highlighted here like making directories or setting up some basic permissions within the DB… I am eager to see how I can automate the restoration of a backup too

1 Like

@emcp For sure. The reason I’m doing this now is that I’m in the process of writing backup procedures for the nextcloud charm that uses this postgres charm.

I’ve not yet released the charm, but having a proper backup/restore process in place feels like something important…

Any help greatly appreciated =) Drinks included.

1 Like

I got stuck a little bit before where you are now automating via WALE … mostly because I was not ready to start digging into Point In Time Restorations and needed to just take a simpler approach with pg_restore of a .sql file…

all the helper charm does right now is … acquire db-admin of the postgresql charm unit… and then from other charm pieces receive requests in the form of .sql snippets granting access downstream of itself

if it’s of interest can def. send it your way… but the db-admin makes me nervous, and I’ve only ever written these routines for LXD clouds

Yeah, the ol’way of just running

sudo -u postgres pg_dump foobar | gzip > foobar.sql.gz

Is legit anyways…

I’ve got 0 experience from wal-e, so I’m still exploring what this really leads me up to.

I’m rolling up a new db now to test a restore from this wal-e now to see where it gets me…

1 Like

So, I started a new fresh unit.

Copied back the complete /var/backup/pg

Created a new directory /var/backup/restore and fixed permissions…

Then I ran…

juju config postgresql wal_e_storage_uri="file://localhost/var/backups/restore"

… and then I tried the restore action…

erik@pixel:~ > juju run-action postgresql/4 wal-e-restore storage-uri="file://localhost/var/backups/pg" confirm=true --wait
unit-postgresql-4:
  UnitId: postgresql/4
  id: "27"
  results:
    Stderr: |
      wal_e.main   INFO     MSG: starting WAL-E
              DETAIL: The subcommand is "backup-list".
              STRUCTURED: time=2022-01-26T20:07:32.659825-00 pid=44274
      wal_e.main   INFO     MSG: starting WAL-E
              DETAIL: The subcommand is "backup-fetch".
              STRUCTURED: time=2022-01-26T20:07:36.695406-00 pid=44393
      wal_e.worker.file.file_worker INFO     MSG: beginning partition download
              DETAIL: The partition being downloaded is part_00000000.tar.lzo.
              HINT: The absolute file key is var/backups/pg/basebackups_005/base_000000010000000000000006_00000040/tar_partitions/part_00000000.tar.lzo.
              STRUCTURED: time=2022-01-26T20:07:36.936420-00 pid=44393
      wal_e.main   INFO     MSG: starting WAL-E
              DETAIL: The subcommand is "delete everything".
              STRUCTURED: time=2022-01-26T20:07:46.024054-00 pid=44430
      wal_e.main   INFO     MSG: deleting data in the store
              STRUCTURED: time=2022-01-26T20:07:46.156494-00 pid=44430
      wal_e.worker.base INFO     MSG: deleting part of wal logs
              DETAIL: The key being deleted is file://localhost/var/backups/restore/wal_005/000000010000000000000001.lzo.
              STRUCTURED: time=2022-01-26T20:07:46.164442-00 pid=44430
      Job for postgresql@12-main.service failed because the service did not take the steps required by its unit configuration.
      See "systemctl status postgresql@12-main.service" and "journalctl -xe" for details.
    Stdout: |
      lxc
      pg_control version number:            1201
      Catalog version number:               201909212
      Database system identifier:           7057599724181653945
      Database cluster state:               in production
      pg_control last modified:             Wed Jan 26 19:43:28 2022
      Latest checkpoint location:           0/6000060
      Latest checkpoint's REDO location:    0/6000028
      Latest checkpoint's REDO WAL file:    000000010000000000000006
      Latest checkpoint's TimeLineID:       1
      Latest checkpoint's PrevTimeLineID:   1
      Latest checkpoint's full_page_writes: on
      Latest checkpoint's NextXID:          0:488
      Latest checkpoint's NextOID:          24576
      Latest checkpoint's NextMultiXactId:  1
      Latest checkpoint's NextMultiOffset:  0
      Latest checkpoint's oldestXID:        479
      Latest checkpoint's oldestXID's DB:   1
      Latest checkpoint's oldestActiveXID:  488
      Latest checkpoint's oldestMultiXid:   1
      Latest checkpoint's oldestMulti's DB: 1
      Latest checkpoint's oldestCommitTsXid:0
      Latest checkpoint's newestCommitTsXid:0
      Time of latest checkpoint:            Wed Jan 26 19:43:28 2022
      Fake LSN counter for unlogged rels:   0/3E8
      Minimum recovery ending location:     0/0
      Min recovery ending loc's timeline:   0
      Backup start location:                0/0
      Backup end location:                  0/0
      End-of-backup record required:        no
      wal_level setting:                    logical
      wal_log_hints setting:                off
      max_connections setting:              100
      max_worker_processes setting:         8
      max_wal_senders setting:              10
      max_prepared_xacts setting:           0
      max_locks_per_xact setting:           64
      track_commit_timestamp setting:       off
      Maximum data alignment:               8
      Database block size:                  8192
      Blocks per segment of large relation: 131072
      WAL block size:                       8192
      Bytes per WAL segment:                16777216
      Maximum length of identifiers:        64
      Maximum columns in an index:          32
      Maximum size of a TOAST chunk:        1996
      Size of a large-object chunk:         2048
      Date/time type storage:               64-bit integers
      Float4 argument passing:              by value
      Float8 argument passing:              by value
      Data page checksum version:           1
      Mock authentication nonce:            540e679209ed9eca2d25eba22ac23d86a0485dff5a6b1bfa48c44d7985a86bd0
      2022-01-26 19:53:44.292 GMT [31861] LOG:  skipping missing configuration file "/etc/postgresql/12/main/juju_recovery.conf"
      2022-01-26 19:53:44.292 GMT [31861] LOG:  skipping missing configuration file "/etc/postgresql/12/main/juju_recovery.conf"
      2022-01-26 19:53:44 UTC [31861]: [3-1] db=,user= LOG:  starting PostgreSQL 12.9 (Ubuntu 12.9-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
      2022-01-26 19:53:44 UTC [31861]: [4-1] db=,user= LOG:  listening on IPv4 address "0.0.0.0", port 5432
      2022-01-26 19:53:44 UTC [31861]: [5-1] db=,user= LOG:  listening on IPv6 address "::", port 5432
      2022-01-26 19:53:44 UTC [31861]: [6-1] db=,user= LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
      2022-01-26 19:53:44 UTC [31862]: [1-1] db=,user= LOG:  database system was shut down at 2022-01-26 19:53:37 UTC
      2022-01-26 19:53:44 UTC [31861]: [7-1] db=,user= LOG:  database system is ready to accept connections
      2022-01-26 19:53:45 UTC [31870]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 19:53:45 UTC [31873]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 19:53:45 UTC [31873]: [2-1] db=template1,user=postgres LOG:  connection authorized: user=postgres database=template1 application_name=psql
      2022-01-26 19:53:45 UTC [31873]: [3-1] db=template1,user=postgres LOG:  disconnection: session time: 0:00:00.014 user=postgres database=template1 host=[local]
      2022-01-26 19:53:46 UTC [31876]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 19:53:46 UTC [31876]: [2-1] db=template1,user=postgres LOG:  connection authorized: user=postgres database=template1 application_name=psql
      2022-01-26 19:53:46 UTC [31876]: [3-1] db=template1,user=postgres LOG:  disconnection: session time: 0:00:00.005 user=postgres database=template1 host=[local]
      2022-01-26 19:53:46 UTC [31879]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 19:53:46 UTC [31879]: [2-1] db=template1,user=postgres LOG:  connection authorized: user=postgres database=template1 application_name=psql
      2022-01-26 19:53:46 UTC [31879]: [3-1] db=template1,user=postgres LOG:  disconnection: session time: 0:00:00.005 user=postgres database=template1 host=[local]
      2022-01-26 19:53:46 UTC [31880]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 19:53:46 UTC [31880]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 19:53:46 UTC [31880]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.008 user=postgres database=postgres host=[local]
      2022-01-26 19:53:46 UTC [31901]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 19:53:46 UTC [31901]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 19:53:46 UTC [31901]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.107 user=postgres database=postgres host=[local]
      2022-01-26 19:54:01 UTC [33301]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 19:54:01 UTC [33301]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 19:54:01 UTC [33301]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.007 user=postgres database=postgres host=[local]
      2022-01-26 19:55:01 UTC [35594]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 19:55:01 UTC [35594]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 19:55:01 UTC [35594]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.009 user=postgres database=postgres host=[local]
      2022-01-26 19:56:02 UTC [36998]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 19:56:02 UTC [36998]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 19:56:02 UTC [36998]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.010 user=postgres database=postgres host=[local]
      2022-01-26 19:57:01 UTC [37017]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 19:57:01 UTC [37017]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 19:57:01 UTC [37017]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.008 user=postgres database=postgres host=[local]
      2022-01-26 19:58:01 UTC [37136]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 19:58:01 UTC [37136]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 19:58:01 UTC [37136]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.007 user=postgres database=postgres host=[local]
      2022-01-26 19:58:44 UTC [31863]: [1-1] db=,user= LOG:  checkpoint starting: time
      2022-01-26 19:58:45 UTC [31863]: [2-1] db=,user= LOG:  checkpoint complete: wrote 4 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.434 s, sync=0.026 s, total=0.495 s; sync files=4, longest=0.007 s, average=0.007 s; distance=2 kB, estimate=2 kB
      2022-01-26 19:59:01 UTC [37244]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 19:59:01 UTC [37244]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 19:59:01 UTC [37244]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.008 user=postgres database=postgres host=[local]
      2022-01-26 20:00:01 UTC [37292]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 20:00:01 UTC [37292]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 20:00:01 UTC [37292]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.007 user=postgres database=postgres host=[local]
      2022-01-26 20:01:01 UTC [38825]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 20:01:01 UTC [38825]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 20:01:01 UTC [38825]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.010 user=postgres database=postgres host=[local]
      2022-01-26 20:02:01 UTC [38844]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 20:02:01 UTC [38844]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 20:02:01 UTC [38844]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.008 user=postgres database=postgres host=[local]
      2022-01-26 20:03:01 UTC [38864]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 20:03:01 UTC [38864]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 20:03:01 UTC [38864]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.008 user=postgres database=postgres host=[local]
      2022-01-26 20:04:02 UTC [39562]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 20:04:02 UTC [39562]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 20:04:02 UTC [39562]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.009 user=postgres database=postgres host=[local]
      2022-01-26 20:05:01 UTC [41654]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 20:05:01 UTC [41654]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 20:05:01 UTC [41654]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.008 user=postgres database=postgres host=[local]
      2022-01-26 20:06:01 UTC [41680]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 20:06:01 UTC [41680]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 20:06:01 UTC [41680]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.011 user=postgres database=postgres host=[local]
      2022-01-26 20:07:01 UTC [43092]: [1-1] db=[unknown],user=[unknown] LOG:  connection received: host=[local]
      2022-01-26 20:07:01 UTC [43092]: [2-1] db=postgres,user=postgres LOG:  connection authorized: user=postgres database=postgres
      2022-01-26 20:07:01 UTC [43092]: [3-1] db=postgres,user=postgres LOG:  disconnection: session time: 0:00:00.009 user=postgres database=postgres host=[local]
      2022-01-26 20:07:33 UTC [31861]: [8-1] db=,user= LOG:  received fast shutdown request
      2022-01-26 20:07:33 UTC [31861]: [9-1] db=,user= LOG:  aborting any active transactions
      2022-01-26 20:07:33 UTC [31861]: [10-1] db=,user= LOG:  background worker "logical replication launcher" (PID 31869) exited with exit code 1
      2022-01-26 20:07:33 UTC [31863]: [3-1] db=,user= LOG:  shutting down
      2022-01-26 20:07:33 UTC [31863]: [4-1] db=,user= LOG:  checkpoint starting: shutdown immediate
      2022-01-26 20:07:33 UTC [31863]: [5-1] db=,user= LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.024 s, sync=0.001 s, total=0.260 s; sync files=0, longest=0.000 s, average=0.000 s; distance=7754 kB, estimate=7754 kB
      2022-01-26 20:07:34 UTC [31861]: [11-1] db=,user= LOG:  database system is shut down
      2022-01-26 20:07:46.422 GMT [44471] LOG:  skipping missing configuration file "/etc/postgresql/12/main/juju_recovery.conf"
      2022-01-26 20:07:46.422 GMT [44471] LOG:  skipping missing configuration file "/etc/postgresql/12/main/juju_recovery.conf"
      2022-01-26 20:07:46 UTC [44471]: [3-1] db=,user= LOG:  starting PostgreSQL 12.9 (Ubuntu 12.9-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
      2022-01-26 20:07:46 UTC [44471]: [4-1] db=,user= LOG:  listening on IPv4 address "0.0.0.0", port 5432
      2022-01-26 20:07:46 UTC [44471]: [5-1] db=,user= LOG:  listening on IPv6 address "::", port 5432
      2022-01-26 20:07:46 UTC [44471]: [6-1] db=,user= LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
      2022-01-26 20:07:46 UTC [44473]: [1-1] db=,user= LOG:  database system was interrupted; last known up at 2022-01-26 19:43:28 UTC
      2022-01-26 20:07:46 UTC [44473]: [2-1] db=,user= LOG:  creating missing WAL directory "pg_wal/archive_status"
      2022-01-26 20:07:46 UTC [44473]: [3-1] db=,user= FATAL:  using recovery command file "recovery.conf" is not supported
      2022-01-26 20:07:46 UTC [44471]: [7-1] db=,user= LOG:  startup process (PID 44473) exited with exit code 1
      2022-01-26 20:07:46 UTC [44471]: [8-1] db=,user= LOG:  aborting startup due to startup process failure
      2022-01-26 20:07:46 UTC [44471]: [9-1] db=,user= LOG:  database system is shut down
      pg_ctl: could not start server
      Examine the log output.
      recovery.conf exists at /var/lib/postgresql/12/main/recovery.conf
  status: completed
  timing:
    completed: 2022-01-26 20:07:47 +0000 UTC
    enqueued: 2022-01-26 20:07:29 +0000 UTC
    started: 2022-01-26 20:07:29 +0000 UTC

… and my unit isn’t so happy.

1 Like

I still haven’t been able to figure out how to restore my backup @pengale

Did you manage to do it?

@erik-lonroth apologies. I’ve been on vacation for the past few days. I did not attempt to do a restore. From those logs, it’s hard to tell whether the charm is doing something that is breaking the backup, or if the backup is failing due to some internal issues (e.g., more permissions stuff).

I’ll try to set aside some time to take a look at it tomorrow …

1 Like

I think that @mthaddon is also having a stab at this.

(mentioning here to include what I mentioned in reply to Erik on chat.charmhub.io)

Looking at https://www.postgresql.org/docs/13/recovery-config.html (and the fact that stub’s blog post was from 2017) I suspect this is a charm bug that manifests with PG12 or greater. I’ll try and find time to confirm that tomorrow (would start by deploying on an older series than focal) and then we can file a bug against the charm if we can confirm this is a PG12 or greater problem.

And in fact I see https://bugs.launchpad.net/postgresql-charm/+bug/1924756 already covers this. I’ll try to find time to confirm it does work on older versions, and then update that bug with details.

1 Like

Any news on this?

I would say this is a really great thing to get working for a database charm.

I can’t use it unless I can trust restoring works…

Can I help?

Hi Erik,

We’ve got some resources to work on this, so hopefully we’ll have a fix before too long. You could certainly help by verifying this works as expected on older series (bionic should be fine) and then testing again on focal once we’ve got a fix ready - we’ll update here once that’s happened.

We don’t have a specific timeline for that yet, but we did want to let you know someone will be working on it soon.

Thanks, Tom

2 Likes

I will for sure test it out!

I was looking for it and followed this solution. In my case what happened was the the backup went well, but it kept the archive_mode even after the backup was finished. So it continued to fill pg_wal directory even after wal_e_storage config was set to empty juju config postgresql wal_e_storage_uri=. It lead to a full disk and PG instance down.