repmgr quick start
Dear future self, this is a little copy-and-paste tutorial on how to get postgres replication running using repmgr.
First, install postgres and repmgr on all'a dem nodes:
echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list apt-get install wget ca-certificates wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - apt-get update apt-get install postgresql-9.5 postgresql-9.5-repmgr
Edit /etc/postgresql/9.5/main/pg_hba.conf
to include all your nodes:
host replication postgres 192.168.142.30/32 md5 host replication postgres 192.168.142.31/32 md5 host replication postgres 192.168.142.32/32 md5 host all postgres 192.168.142.30/32 md5 host all postgres 192.168.142.31/32 md5 host all postgres 192.168.142.32/32 md5 host all svedrin 192.168.142.35/32 md5 host all svedrin 192.168.142.36/32 md5
Don't forget to:
pg_ctlcluster 9.5 main reload
Set a password for the postgres user, and put it into /var/lib/postgresql/.pgpass
:
192.168.142.30:5432:*:postgres:Faeb8owi 192.168.142.31:5432:*:postgres:Faeb8owi 192.168.142.32:5432:*:postgres:Faeb8owi
Open a psql
shell and run:
ALTER SYSTEM SET wal_level = 'hot_standby'; ALTER SYSTEM SET listen_addresses = '*'; ALTER SYSTEM SET hot_standby = 'on'; ALTER SYSTEM SET max_wal_senders = '10'; ALTER SYSTEM SET max_replication_slots = '10';
Then:
pg_ctlcluster 9.5 main restart
Make sure that the postgres
users on each node can freely SSH to one another:
su - postgres ssh-keygen cp .ssh/id_rsa.pub .ssh/authorized_keys
Then distribute that .ssh
directory to /var/lib/postgresql
everywhere.
Now create /etc/postgresql/9.5/main/repmgr.conf
on each node:
cluster=pgtest # Be sure to adapt these three lines on every node node=1 # 2, 3 etc -- remove this comment node_name=pgtest-n0 conninfo=host=192.168.142.30 port=5432 use_replication_slots=1 event_notifications=master_register,standby_register,standby_unregister,standby_promote,standby_switchover #event_notification_command=/usr/local/bin/pgdeploy # Fix for Ubuntu/Debian pg_bindir=/usr/lib/postgresql/9.5/bin # Settings for automatic failover failover=automatic master_response_timeout=60 reconnect_attempts=5 reconnect_interval=10 promote_command=/usr/lib/postgresql/9.5/bin/repmgr -f /etc/postgresql/9.5/main/repmgr.conf standby promote follow_command=/usr/lib/postgresql/9.5/bin/repmgr -f /etc/postgresql/9.5/main/repmgr.conf standby follow loglevel=INFO
Now to get the cluster up and running:
postgres@pgtest-n0:~$ alias alias repmgr='repmgr -f /etc/postgresql/9.5/main/repmgr.conf' postgres@pgtest-n0:~$ # master postgres@pgtest-n0:~$ pg_ctlcluster 9.5 main start postgres@pgtest-n0:~$ repmgr master register postgres@pgtest-n1:~$ # first standby postgres@pgtest-n1:~$ repmgr standby clone -h 192.168.142.30 postgres@pgtest-n1:~$ pg_ctlcluster 9.5 main start postgres@pgtest-n1:~$ repmgr standby register postgres@pgtest-n2:~$ # second standby postgres@pgtest-n2:~$ repmgr standby clone -h 192.168.142.30 postgres@pgtest-n2:~$ pg_ctlcluster 9.5 main start postgres@pgtest-n2:~$ repmgr standby register postgres@pgtest-n0:~$ repmgr cluster show [2016-09-22 21:03:40] [INFO] connecting to database Role | Name | Upstream | Connection String ----------+-----------|-----------|------------------------------ * master | pgtest-n0 | | host=192.168.142.30 port=5432 standby | pgtest-n1 | pgtest-n1 | host=192.168.142.31 port=5432 standby | pgtest-n2 | pgtest-n1 | host=192.168.142.32 port=5432
To switch to a new master:
postgres@pgtest-n0:~$ pg_ctlcluster 9.5 main stop -m fast postgres@pgtest-n1:~$ repmgr standby promote postgres@pgtest-n2:~$ repmgr standby follow postgres@pgtest-n1:~$ repmgr cluster show [2016-09-22 21:03:40] [INFO] connecting to database Role | Name | Upstream | Connection String ----------+-----------|-----------|------------------------------ * master | pgtest-n1 | | host=192.168.142.31 port=5432 standby | pgtest-n2 | pgtest-n1 | host=192.168.142.32 port=5432 FAILED | pgtest-n0 | | host=192.168.142.30 port=5432
To revive the previous master (-n0
) as a new standby:
postgres@pgtest-n0:~$ repmgr standby clone -r -F -h 192.168.142.31 postgres@pgtest-n0:~$ pg_ctlcluster 9.5 main start postgres@pgtest-n0:~$ repmgr standby register -F postgres@pgtest-n0:~$ repmgr cluster show [2016-09-22 21:03:40] [INFO] connecting to database Role | Name | Upstream | Connection String ----------+-----------|-----------|------------------------------ * master | pgtest-n1 | | host=192.168.142.31 port=5432 standby | pgtest-n2 | pgtest-n1 | host=192.168.142.32 port=5432 standby | pgtest-n0 | pgtest-n1 | host=192.168.142.30 port=5432
Fencing and client provisioning
Of course, the clients will need a way to know which node to connect to. For this, we use a pgbouncer on the application servers, that we automatically push a new config to when repmgr changes the cluster state. Since I'm totally biased about this, I recommend pgdeploy for this purpose.
Then on the application server, install pgbouncer, import the ssh keys into the now-existing postgres
user, create /etc/pgbouncer/databases
, chown it to postgres
, and add the following in /etc/pgbouncer/pgbouncer.conf
:
[databases] %include /etc/pgbouncer/databases/pgtest.ini
Also, be sure to place a .pgpass
file into /var/lib/postgresql
so that pgbouncer can connect to the upstream database. If you just run pgdeploy, it should create a file in the databases directory with contents such as this:
svedrin = host=192.168.142.31 port=5432
For the clients to be able to connect, we also need to configure pgbouncer's user authentication. For this, on the master, run:
select usename, passwd from pg_shadow;
Copy the md5 hash for the user the application is supposed to use, and add an entry such as this into /etc/pgbouncer/userlist.txt
:
"svedrin" "md5d4b59c3b833a7b881cd6c5717205bc64"
Then you should be able to connect to localhost:6432
using that user and password.
So now, not only do the clients know where to connect to (because their pgbouncers know), but also dead nodes are fenced out reliably.
Automatic Failover
For automatic failover, also configure /etc/default/repmgrd
:
# default settings for repmgrd. This file is source by /bin/sh from # /etc/init.d/repmgrd # disable repmgrd by default so it won't get started upon installation # valid values: yes/no REPMGRD_ENABLED=yes # configuration file (required) REPMGRD_CONF="/etc/postgresql/9.5/main/repmgr.conf" # additional options REPMGRD_OPTS="--monitoring-history" # user to run repmgrd as #REPMGRD_USER=postgres # repmgrd binary #REPMGRD_BIN=/usr/bin/repmgrd # pid file #REPMGRD_PIDFILE=/var/run/repmgrd.pid REPMGRD_LOGFILE=/var/log/repmgrd.log
repmgrd
also requires a library be loaded and a set of functions be defined, so in a psql
shell, run:
ALTER SYSTEM SET shared_preload_libraries = 'repmgr_funcs'; \i /usr/share/postgresql/9.5/contrib/repmgr_funcs.sql
Note that the function definitions will be replicated to all the nodes by postgres automatically, but the ALTER SYSTEM will not. Restart postgres afterwards.
In order to get repmgrd
to produce a log file, you may want to tweak the init script by replacing the line that contains the startup command with this one:
start-stop-daemon --start --quiet --background --chuid "$REPMGRD_USER" --make-pidfile --pidfile "$REPMGRD_PIDFILE" --exec "$REPMGRD_BIN" --no-close -- --config-file "$REPMGRD_CONF" $REPMGRD_OPTS >"$REPMGRD_LOGFILE" 2>&1
This will tell start-stop-daemon
not to close open file descriptors, so that bash
can redirect the output into /var/log/repmgrd.log
.
Then start repmgrd
everywhere, and you're good to go. (Although the docs say repmgrd
shouldn't be run on the master, it also doesn't hurt to do so anyway. In case the master fails, repmgrd
will just not do anything.)
If you wish to enable synchronous replication, run:
ALTER SYSTEM SET synchronous_standby_names = 'pgtest-n0,pgtest-n1,pgtest-n2';
To disable it again, you can use:
ALTER SYSTEM RESET synchronous_standby_names;
Barman
In order to get barman to run, exchange SSH keys with the barman
user on your backup server, add it to pg_hba.conf
and run:
alter system set archive_mode = on; alter system set archive_command = 'rsync -a %p barman@192.168.142.36:/var/lib/barman/pgtest-n0/incoming';
In /etc/barman.conf
, add:
[pgtest-n0] ssh_command = ssh postgres@192.168.142.30 conninfo = host=192.168.142.30 user=svedrin password=hai9Eexe streaming_conninfo = host=192.168.142.30 user=svedrin password=hai9Eexe
Then run:
mkdir -p /var/lib/barman/pgtest-n0/incoming chown barman -R /var/lib/barman/ barman switch-xlog pgtest-n0 barman archive-wal pgtest-n0 barman backup pgtest-n0