Synchronous Replication in PostgreSQL

David Youatt
PostgreSQL

PostgreSQL has supported streaming replication and hot standbys since version 9.0 (2010), and synchronous replication since version 9.1 (2011). Streaming replication (and in this case we're referring to "binary" streaming replication, not "logical")  sends the PostgreSQL WAL stream over a network connection from primary to a replica. By default, streaming replication is asynchronous: the primary does not wait for a replica to indicate that it wrote the data. With synchronous replication, the primary will wait for any or all  replicas (based on synchronous replication mode) to confirm that they received and wrote the data.

Depending on your business requirements, you may only need  the default asynchronous behavior, or you may need to configure one or more synchronous replicas. Fortunately, PostgreSQL lets you choose and provides options for tuning the consistency and performance (latency) behavior depending on your requirements.

See the PostgreSQL documentation for more details on streaming replication.

Preparation

Name your instances. Life will be simpler. You do that by setting a configuration parameter in each instance's postgresql.conf file. We'll see why it makes things easier later using the pg_stat_replication table's contents on the primary. Since you use the name of a replica to configure it as synchronous, if each replica has a unique name, you can configure individual replicas as synchronous or asynchronous.

For example, on one of the replicas in postgresql.conf

(Note: If your cluster is managed or created by Patroni or Crunchy HA PostgreSQL, it will manage the contents of postgresql.conf, so make the changes in the Patroni config, which will generate the postgresql.conf file that is used by the servers in your cluster)

cluster_name = 'replica2' # added to process titles if nonempty

Creating a Replica

The first step in creating a replica is to clone the primary. There are several ways to do that, but the straightforward way is to use pg_basebackup, which clones a running primary PostgreSQL instance. For the simple test case for this article, to create two replicas, as user postgres

/usr/lib/postgresql/12/bin/pg_basebackup -Xs -D ~/12/replica1 -R -p 5433 -h localhost -U replicant 
/usr/lib/postgresql/12/bin/pg_basebackup -Xs -D ~/12/replica2 -R -p 5443 -h localhost -U replicant

These connect to the primary as a client as the user replicant which must exist as a user in the primary DB instance and have REPLICATION privileges.

Note the -Xs argument which will open a second connection to the primary to capture WAL changes to the DB as it operates normally. The -R argument will tell pg_basebackup to create the recovery configuration in the data directory for the new replica.

Other ways to clone a primary are:

  • pgBackrest to backup the primary and restore to the new replica location.
  • OS level backups or snapshots, but you must be certain that you get a consistent copy of the primary. A safe way to do this is to stop the instance first.

Example replica settings

(Note that synchronous replication is independent of replication slots. Either can be used with or without the other.)

On a replica:

Note that on the replica's configuration, including the recovery section, there is no indication that it's sync or async. Whether a replica is synchronous or async is determined by the primary's configuration.

This was generated by pg_basebackup to clone the primary and create a replica.

bash> cat /var/lib/postgresql/12/replica2/postgresql.auto.conf
 Do not edit this file manually!
  • It will be overwritten by the ALTER SYSTEM command.
  • On PostgreSQL versions prior to 12, this information is stored in the recovery.conf file
  • Recovery settings generated by pgBackRest restore on 2020-02-13 13:11:08
  • standby_mode on  # only on pg versions <12.  Replaced by the standby.signal file in pg12
recovery_target_timeline 'latest'
recovery_target_action = 'promote'
primary_conninfo = 'user=replication_user passfile=''/var/lib/postgresql/.pgpass'' port=5433 host=''localhost'' user=''replicant'''

Make a replica synchronous

Once you have streaming replication working, on the primary add a replica name to synchronous_standby_names in postgresql.conf:

synchronous_standby_names = 'replica2'

and tell the postgresql primary to reload the config (e.g. - pg_ctl reload -D $PGDATA or the method your HA support uses). Now the instance named replica2 is a synchronous replica. That's all. Really.

Changing a replica from synchronous back to the default asynchronous is similar. Just remove that replica's name from the list in synchronous_standby_names in the primary's config and tell the primary to reload its configuration.

Reviewing and checking the current replication cluster

Back on the primary:

 
  pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time           
-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 28488 |       10 | replicant | walreceiver      | 127.0.0.1   |                 |       43221 | 2020-08-25 08:25:22.658642-07 |              | streaming | 0/2E000060 | 0/2E000060 | 0/2E000060 | 0/2E000060 |           |           |            |             0 | async      | 2020-08-25 08:25:22.400688-07
 15936 |    24794 | replicant | replica2         | 127.0.0.1   |                 |       50772 | 2020-08-25 08:25:06.760228-07 |              | streaming | 0/2E000060 | 0/2E000060 | 0/2E000060 | 0/2E000060 |           |           |            |             0 | sync       | 2020-08-25 08:25:56.915357-07
(2 rows)
  • Note that we have a sync and async replica.
  • Note that one of the replicas has cluster_name unset/defaulting, so walreceiver is its name.
  • Recall this setting in postgresql.conf on the primary.
synchronous_standby_names = 'replica2'

Life will be much easier if the replicas don't all default to cluster_name unset, where all the replicas will have the default name walreceiver.

Changing a replica to/from synchronous

Changing a replica from synchronous to asynchronous, or vice versa, is easy. Just add the replica name to synchronous_standby_names in the primary's postgresql.conf and tell PostgreSQL to reload the configuration; no DB restart needed.

How synchronous is it? Waiting for storage.

How synchronous is it? The value of synchronous_commit on the primary determines this.

synchronous_commit = on  # the default

In order of increasing "safety" (durability), and increasing latency, values of synchronous_commit on the primary:

  • When we set synchronous_commit = off, a COMMIT does not wait for the transaction record to be flushed to disk.
  • When we set synchronous_commit = local, a COMMIT waits until the transaction record is flushed to the local disk.
  • When we set synchronous_commit = on, a COMMIT will wait until the server(s) specified by synchronous_standby_names confirm that the transaction record was safely written to disk.
    • Note: When synchronous_standby_names is empty, this setting behaves same as synchronous_commit = local.
  • When we set synchronous_commit = remote_write, a COMMIT will wait until the server(s) specified by synchronous_standby_names confirm write of the transaction record to the operating system but has not necessarily reached the disk on the replica.
  • When we set synchronous_commit = remote_apply, a COMMIT will wait until the server(s) specified by synchronous_standby_names confirm that the transaction record was applied to the replica's database.

How much data might not be replicated in a case where the replica loses connectivity with the primary, with the faster and less durable options? That depends on more PostgreSQL settings - wal_writer_delay and wal_writer_flush_after. The first one flushes WAL after a specified time period (200ms default), the second one flushes if the specified number of WAL files are created since the last flush. If you set synchronous_commit to off, then these two settings will limit how much WAL remains uncommitted.

Setting synchronous behavior in a session (client)

Since the synchronous commit behavior is related to a transaction, it can be changed by a client for the session and during a session, so a client can set different values for each statement. You can set the synchronous behavior at any of these levels:

  • Single statement / transaction - SET LOCAL synchronous_commit =
  • Session - SET synchronous_commit = `
  • User - ALTER USER someuser SET synchronous_commit =
  • Database - ALTER DATABASE SET synchronous_commit =
  • And of course cluster wide by updating postgreql.conf

Adding priority or quorum to the list of synchronous replicas

In addition to specifying how synchronous a remote replica, you can also create a list of synchronous replicas by priority - FIRST , or a quorum of replicas - ANY.

Quorum is an important aspect of distributed computing.  You may already know what it is, but if not, here’s a simplified explanation.  In this case we are concerned with consistency of the DB data across multiple DB nodes.  When a number of nodes - that you choose - all have the same data committed, then the cluster is considered to be in a consistent state.  The number of nodes can be all the nodes in the cluster, or it can be a subset of the nodes.  The number of nodes is the quorum number and is chosen depending on your business requirements for data consistency.  In this case, the nodes “vote” for quorum by replying to the primary that they have received and applied the replicated data.  It’s common to have an odd number of nodes in a cluster and then define quorum as the majority of nodes with consistent copies of the DB data (e.g. - 2 of 3 nodes or 3 of 5 nodes).  Quorum is used for other purposes in distributed computing.  A common case is electing a new primary from a cluster of nodes when the current primary fails or is unavailable.

There are more options in the primary's postgresql.conf setting of synchronous_standby_names to support priority and simple quorum.

For the priority case, the FIRST keyword:

synchronous_standby_names = ‘FIRST num (standby_name [, …])’

The synchronous commit will wait for a reply from at least num number of standbys listed in the order of priority. 

For the quorum case, the ANY keyword:

synchronous_standby_names = ‘ANY num (standby_name [, …])’

The same rules as above apply. So, for example setting synchronous_standby_names = 'ANY 2 (*)' will cause synchronous commit to wait for reply from any 2 standby servers. Double check your syntax and test that the settings implement your business rules for consistency.

Configure so you don't wait forever

With synchronous replication, you've built in a dependency that a transaction is not committed on the primary until it's written to the synchronous replica, so depending on the configuration options above, your primary can hang forever if the replica (or quorum of replicas) is not reachable by the primary. Obviously, you're dependent on the connection between primary and replicas.

If you have a single synchronous replica and it is unavailable, your primary will wait for it to return, and will block until it does. To avoid that you want to have at least two replicas and use the FIRST or ANY options to synchronous_standby_names described above. You could disable synchronous replication by commenting out synchronous_standby_names but then of course, you don't have a synchronous replica.

The TL;dr

To convert a streaming binary replica to synchronous, add its name to the primary's postgresql.conf setting synchronous_standby_names, and reload the primary.

To convert a synchronous replica to asynchronous, remove its name from the primary's postgresql.conf setting synchronous_standby_names, and reload the primary.

These will be much easier if you've added a unique name to the replica's cluster_name setting in its postgresql.conf

References

As always, the PostgreSQL documentation is the place to look for more information - https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION

Join the Discussion

Newsletter