Charmed PostgreSQL How To - Troubleshooting

Warning: At the moment, there is no way to pause the operator. Make sure your troubleshooting activity does not interfere with the operator itself!

Note: All commands are written for Juju 3.0 or later.

If you are using an earlier version, be aware that:

  • juju run replaces juju run-action --wait, which is used in Juju 2.9
  • juju integrate replaces juju relate and juju add-relation, which are used in Juju 2.9

For more information, check the Juju 3.0 Release Notes.

Summary

This page goes over some recommended tools and approaches to troubleshooting the charm.

Before anything else, always run juju status and compare the output against the list of charm statuses and their recommended fixes. This alone may already solve your issue.
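As a rough illustration, the output of juju status can be filtered so that only units needing attention stand out. The helper name non_active_units is hypothetical, and the sketch assumes the default tabular layout, where the second column of the Unit section holds the workload status.

```shell
# Print "<unit> <workload-status>" for every unit whose workload status is not "active".
# Assumption: stdin is the default tabular `juju status` output.
non_active_units() {
  awk '
    /^Unit[[:space:]]/  { in_units = 1; next }   # header line of the Unit section
    in_units && NF == 0 { in_units = 0 }         # a blank line ends the section
    in_units && $2 != "active" { sub(/\*$/, "", $1); print $1, $2 }  # drop the leader marker
  '
}

# Usage (from the Juju client):
#   juju status | non_active_units
```

An empty result means every listed unit reports an active workload. It does not replace reading the status messages themselves.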

Otherwise, this page covers how to troubleshoot the charm via Juju logs, the snap-based charm, and extra software.

Juju logs

Familiarize yourself with the Juju logs concepts and learn how to manage Juju logs first.

Always check the Juju logs before troubleshooting further:

juju debug-log --replay --tail

Focus on ERROR entries (normally there should be none):

juju debug-log --replay | grep -c ERROR
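A per-level breakdown can give more context than a single error count. The following is a minimal sketch, not part of the charm tooling: count_levels is a hypothetical helper, and it assumes the severity appears as a standalone word within the first five fields of each log line.

```shell
# Count log lines per severity level, reading juju debug-log output on stdin.
# Assumption: the level is a standalone word among the first five fields of a line.
count_levels() {
  awk '
    {
      for (i = 1; i <= NF && i <= 5; i++)
        if ($i ~ /^(TRACE|DEBUG|INFO|WARNING|ERROR|CRITICAL)$/) { n[$i]++; break }
    }
    END { for (level in n) print level, n[level] }
  ' | sort
}

# Usage (from the Juju client):
#   juju debug-log --replay --no-tail | count_levels
```

Unlike a plain grep, this only counts the level field, so an INFO line whose message happens to contain the word ERROR is not counted as an error.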

Consider enabling the DEBUG log level if you are troubleshooting unusual charm behaviour:

juju model-config 'logging-config=<root>=INFO;unit=DEBUG'
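DEBUG logging is verbose, so it may help to check the current setting first and restore the default once troubleshooting is finished. The sketch below is only an illustration: logging_config_value is a hypothetical helper that assembles the value shown above, and the commented commands assume that the model default is the state you want to return to.

```shell
# Build a logging-config value from a root level and a unit level.
# $1: level for <root>; $2: level for the unit loggers.
logging_config_value() {
  printf '<root>=%s;unit=%s\n' "$1" "$2"
}

# Usage (from the Juju client):
#   juju model-config logging-config            # show the current value
#   juju model-config "logging-config=$(logging_config_value INFO DEBUG)"
#   juju model-config --reset logging-config    # return to the model default
```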

The Patroni/PostgreSQL logs are located inside the snap's common directory on the unit:

> ls -la /var/snap/charmed-postgresql/common/var/log/*

/var/snap/charmed-postgresql/common/var/log/patroni:
-rw-r--r-- 1 snap_daemon snap_daemon 292519 Sep 15 21:47 patroni.log

/var/snap/charmed-postgresql/common/var/log/pgbackrest:
-rw-r----- 1 snap_daemon snap_daemon 7337 Sep 15 21:46 all-server.log
-rw-r----- 1 snap_daemon snap_daemon 5858 Sep 15 10:41 testbet.postgresql-stanza-create.log

/var/snap/charmed-postgresql/common/var/log/pgbouncer:
# The pgBouncer should be stopped on Charmed PostgreSQL deployments and produce no logs.
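To skim one of these files for recent problems, something along the following lines may be enough. The helper name recent_problems is hypothetical, and the sketch assumes that severities are written in the log as the plain words matched below.

```shell
# Print the most recent warning/error lines from a log file.
# $1: path to the log file; $2: maximum number of lines to show (default 20).
# Assumption: severities appear as the plain words matched by the pattern.
recent_problems() {
  grep -E 'WARNING|ERROR|CRITICAL|FATAL' "$1" | tail -n "${2:-20}"
}

# Usage (inside the unit; some logs are only readable with elevated privileges):
#   recent_problems /var/snap/charmed-postgresql/common/var/log/patroni/patroni.log 50
```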

Snap-based charm

First, check the operator architecture to become familiar with snap content, operator building blocks, and running Juju units.

To enter the unit, use:

juju ssh postgresql/0 bash

Make sure the charmed-postgresql snap is installed and functional:

ubuntu@juju-fd7874-0:~$ sudo snap list charmed-postgresql
Name                Version  Rev  Tracking       Publisher        Notes
charmed-postgresql  14.9     70   latest/stable  dataplatformbot  held

From here you can make sure all snap (systemd) services are running:

ubuntu@juju-fd7874-0:~$ sudo snap services
Service                                          Startup   Current   Notes
charmed-postgresql.patroni                       enabled   active    -
charmed-postgresql.pgbackrest-service            enabled   active    -
charmed-postgresql.prometheus-postgres-exporter  enabled   active    -

ubuntu@juju-fd7874-0:~$ systemctl --failed
...
0 loaded units listed.

ubuntu@juju-fd7874-0:~$ ps auxww
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.4  0.0 167364 12716 ?        Ss   21:40   0:02 /sbin/init
root          59  0.0  0.0  64596 20828 ?        Ss   21:40   0:00 /lib/systemd/systemd-journald
root         112  0.0  0.0  11088  5740 ?        Ss   21:40   0:00 /lib/systemd/systemd-udevd
root         115  0.3  0.0   4832  1816 ?        Ss   21:40   0:01 snapfuse /var/lib/snapd/snaps/core22_864.snap /snap/core22/864 -o ro,nodev,allow_other,suid
root         116  0.2  0.0   4896  1880 ?        Ss   21:40   0:01 snapfuse /var/lib/snapd/snaps/charmed-postgresql_70.snap /snap/charmed-postgresql/70 -o ro,nodev,allow_other,suid
root         117  0.0  0.0   4748  1644 ?        Ss   21:40   0:00 snapfuse /var/lib/snapd/snaps/core20_2015.snap /snap/core20/2015 -o ro,nodev,allow_other,suid
root         119  0.0  0.0   4692  1600 ?        Ss   21:40   0:00 snapfuse /var/lib/snapd/snaps/lxd_24322.snap /snap/lxd/24322 -o ro,nodev,allow_other,suid
root         120  0.6  0.0   4768  1840 ?        Ss   21:40   0:04 snapfuse /var/lib/snapd/snaps/snapd_19993.snap /snap/snapd/19993 -o ro,nodev,allow_other,suid
systemd+     225  0.0  0.0  16116  8100 ?        Ss   21:40   0:00 /lib/systemd/systemd-networkd
systemd+     227  0.0  0.0  25528 12664 ?        Ss   21:40   0:00 /lib/systemd/systemd-resolved
root         241  0.0  0.0   7284  2792 ?        Ss   21:40   0:00 /usr/sbin/cron -f -P
message+     243  0.0  0.0   8668  4916 ?        Ss   21:40   0:00 @dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         247  0.0  0.0  33084 18792 ?        Ss   21:40   0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
syslog       248  0.0  0.0 152764  4748 ?        Ssl  21:40   0:00 /usr/sbin/rsyslogd -n -iNONE
snap_da+     250  0.0  0.0 1303900 10216 ?       Ssl  21:40   0:00 /snap/charmed-postgresql/70/usr/bin/prometheus-postgres-exporter
root         254  0.0  0.0  15312  7456 ?        Ss   21:40   0:00 /lib/systemd/systemd-logind
root         281  0.0  0.0   7760  3508 ?        Ss   21:40   0:00 bash /etc/systemd/system/jujud-machine-0-exec-start.sh
root         294  0.0  0.0   6216  1064 pts/0    Ss+  21:40   0:00 /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 vt220
root         296  0.0  0.0  15420  9240 ?        Ss   21:40   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root         301  2.2  0.2 895540 97552 ?        Sl   21:40   0:13 /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug
root         335  0.0  0.0 110084 21336 ?        Ssl  21:40   0:00 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
root         418  0.0  0.0 235452  8128 ?        Ssl  21:40   0:00 /usr/libexec/polkitd --no-debug
root         772  0.4  0.0   4764  1780 ?        Ss   21:40   0:02 snapfuse /var/lib/snapd/snaps/snapd_20092.snap /snap/snapd/20092 -o ro,nodev,allow_other,suid
root         850  0.2  0.1 2058980 33536 ?       Ssl  21:40   0:01 /usr/lib/snapd/snapd
root        1587  0.0  0.0   4780  3264 ?        Ss   21:40   0:00 /bin/bash /snap/charmed-postgresql/70/start-patroni.sh
snap_da+    1615  1.1  0.1 490500 39308 ?        Sl   21:40   0:06 python3 /snap/charmed-postgresql/70/usr/bin/patroni /var/snap/charmed-postgresql/70/etc/patroni/patroni.yaml
snap_da+    2582  0.0  0.0 215816 30076 ?        S    21:41   0:00 /snap/charmed-postgresql/current/usr/lib/postgresql/14/bin/postgres -D /var/snap/charmed-postgresql/common/var/lib/postgresql --config-file=/var/snap/charmed-postgresql/common/var/lib/postgresql/postgresql.conf --listen_addresses=10.47.228.200 --port=5432 --cluster_name=postgresql --wal_level=logical --hot_standby=on --max_connections=100 --max_wal_senders=10 --max_prepared_transactions=0 --max_locks_per_transaction=64 --track_commit_timestamp=off --max_replication_slots=10 --max_worker_processes=8 --wal_log_hints=on
snap_da+    2808  0.0  0.0 215816 10704 ?        Ss   21:41   0:00 postgres: postgresql: checkpointer 
snap_da+    2810  0.0  0.0 215816 10496 ?        Ss   21:41   0:00 postgres: postgresql: background writer 
snap_da+    2811  0.0  0.0  70540  8804 ?        Ss   21:41   0:00 postgres: postgresql: stats collector 
snap_da+    2840  0.0  0.0 217980 21184 ?        Ss   21:41   0:00 postgres: postgresql: operator postgres 10.47.228.200(36138) idle
snap_da+    2947  0.0  0.0 216716 14736 ?        Ss   21:41   0:00 postgres: postgresql: walsender replication 10.47.228.241(45254) streaming 0/A002FA8
snap_da+    2952  0.0  0.0 215816 13140 ?        Ss   21:41   0:00 postgres: postgresql: walwriter 
snap_da+    2953  0.0  0.0 216424 10848 ?        Ss   21:41   0:00 postgres: postgresql: autovacuum launcher 
snap_da+    2954  0.0  0.0 215816  9132 ?        Ss   21:41   0:00 postgres: postgresql: archiver last was 00000001000000000000000A.partial
snap_da+    2955  0.0  0.0 216260  9516 ?        Ss   21:41   0:00 postgres: postgresql: logical replication launcher 
snap_da+    6556  0.0  0.0 216780 14780 ?        Ss   21:42   0:00 postgres: postgresql: walsender replication 10.47.228.164(48482) streaming 0/A002FA8
root        6799  0.0  0.0  39900 31164 ?        S    21:42   0:00 /usr/bin/python3 src/cluster_topology_observer.py https://10.47.228.200:8008 /var/snap/charmed-postgresql/current/etc/patroni/ca.pem /usr/bin/juju-run postgresql/0 /var/lib/juju/agents/unit-postgresql-0/charm
root        9831  0.0  0.0   4780  3204 ?        Ss   21:46   0:00 /bin/bash /snap/charmed-postgresql/70/start-pgbackrest.sh
snap_da+    9859  0.0  0.0  56152 13584 ?        S    21:46   0:00 /snap/charmed-postgresql/70/usr/bin/pgbackrest server --config=/var/snap/charmed-postgresql/70/etc/pgbackrest/pgbackrest.conf
root       10168  0.0  0.0  16908 10836 ?        Ss   21:47   0:00 sshd: ubuntu [priv]
ubuntu     10171  0.0  0.0  17056  9628 ?        Ss   21:47   0:00 /lib/systemd/systemd --user
ubuntu     10172  0.0  0.0 170148  4728 ?        S    21:47   0:00 (sd-pam)
ubuntu     10234  0.0  0.0  17208  7944 ?        R    21:47   0:00 sshd: ubuntu@pts/1
ubuntu@juju-fd7874-0:~$

The list of running snap/systemd services depends on whether COS integration and/or backup functionality is configured (enabled). The snap service charmed-postgresql.patroni must always be active, with the Linux processes snapd, patroni and postgres running.
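The requirement on charmed-postgresql.patroni can be checked mechanically. This is a minimal sketch: patroni_is_active is a hypothetical helper, and it assumes the column layout of the snap services output shown earlier (service name first, current state third).

```shell
# Succeed only if charmed-postgresql.patroni is listed with the current state "active".
# Reads `snap services` output on stdin.
patroni_is_active() {
  awk '
    $1 == "charmed-postgresql.patroni" { found = 1; active = ($3 == "active") }
    END { exit !(found && active) }
  '
}

# Usage (inside the unit):
#   sudo snap services charmed-postgresql | patroni_is_active || echo "patroni is not active"
```

The check also fails when the service is missing from the list entirely, which would point at a problem with the snap installation itself.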

To access PostgreSQL, check the charm users concept and request operator credentials to use psql:

> juju show-unit postgresql/0 | awk '/private-address:/{print $2;exit}' 
10.47.228.200

> juju run postgresql/leader get-password username=operator
password: rV0Xn4l65KtQsHSq

> juju ssh postgresql/0 bash

> > psql -h 10.47.228.200 -U operator -d postgres -W
> > Password for user operator: rV0Xn4l65KtQsHSq
>
> > postgres=# \l
> > postgres | operator | UTF8 | C.UTF-8 | C.UTF-8 | operator=CTc/operator    +
> >          |          |      |         |         | backup=CTc/operator      +
> ...
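When repeating these steps often, the address and password lookups can be captured in variables instead of copied by hand. The helper yaml_value below is hypothetical, and it only handles simple single-line "key: value" output like that shown above.

```shell
# Print the value of the first "<key>: <value>" line read from stdin.
# $1: key name, for example "private-address" or "password".
yaml_value() {
  awk -v key="$1:" '$1 == key { print $2; exit }'
}

# Usage (from the Juju client):
#   address=$(juju show-unit postgresql/0 | yaml_value private-address)
#   password=$(juju run postgresql/leader get-password username=operator | yaml_value password)
```

Treat the captured password with care: avoid echoing it into shared terminals or shell history.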

Continue troubleshooting your database/SQL related issues from here.

Warning: Do NOT manage users, credentials, databases, or schemas directly. Doing so can cause a split-brain situation between the operator and integrated applications.

It is NOT recommended to restart services directly, as this might create a split-brain situation with the operator's internal state. If you see a problem with a unit, consider removing the failing unit and adding a new unit to recover the cluster state.
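The unit replacement described above could look roughly like this. The sketch deliberately prints the commands instead of running them, so they can be reviewed first. The helper name replace_unit_plan and the unit postgresql/1 are hypothetical.

```shell
# Print (do not run) the commands that replace a failing unit with a fresh one.
# $1: the failing unit, for example "postgresql/1".
replace_unit_plan() (
  unit=$1
  app=${unit%%/*}   # the application name is the part before the slash
  printf 'juju remove-unit %s\n' "$unit"
  printf 'juju add-unit %s\n' "$app"
)

# Usage (from the Juju client):
#   replace_unit_plan postgresql/1
```

Check juju status between the two steps, and wait for the remaining units to settle before adding the new one.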

As a last resort, contact us if you cannot determine the source of your issue.

Also, feel free to improve this document!

Install extra software

We recommend you do not install any additional software. This may affect stability and produce anomalies that are hard to troubleshoot.

Sometimes, however, it is necessary to install some extra troubleshooting software.

Use the standard apt workflow:

ubuntu@juju-fd7874-0:~$ sudo apt update && sudo apt install gdb
...
Setting up gdb (12.1-0ubuntu1~22.04) ...
ubuntu@juju-fd7874-0:~$

Always remove manually installed components at the end of troubleshooting. Keep the house clean!
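One way to keep track of what was added is to snapshot the manually installed packages before and after troubleshooting and compare the two lists. This is a sketch under assumptions: new_packages is a hypothetical helper, the /tmp file names are arbitrary, and both snapshots must be sorted for comm to work.

```shell
# List packages present in the second snapshot but not in the first.
# $1, $2: files holding sorted package names, one per line.
new_packages() {
  comm -13 "$1" "$2"
}

# Usage (inside the unit):
#   apt-mark showmanual | sort > /tmp/pkgs.before    # before troubleshooting
#   sudo apt update && sudo apt install gdb
#   apt-mark showmanual | sort > /tmp/pkgs.after     # when finished
#   new_packages /tmp/pkgs.before /tmp/pkgs.after    # candidates for removal
#   sudo apt remove --purge gdb && sudo apt autoremove
```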