Crunchy Data is pleased to release PostgreSQL Operator 4.0.
I had previously explored how to build an active-active PostgreSQL cluster using Symmetric-DS. The steps are essentially the same on Kubernetes:
- Start 2 PostgreSQL pods
- Create a user and a database on each pod/instance
- Start the primary symmetric-ds pod
- Add the symmetric-ds triggers and routes
- Open registration
- Start the secondary symmertic-ds pod
However, there are some interesting nuances I discovered while building out this setup, which I will discuss below. Before diving into the details, I'd also like to credit my colleagues Patrick McLaughlin and Simon Nielly as co-authors for helping me to put together this guide.
Anyway, let's explore how to deploy an active-active PostgreSQL cluster on Kubernetes.
Crunchy Data has recently announced an update to the CIS PostgreSQL Benchmark by the Center for Internet Security, a nonprofit organization that provides publications around standards and best practices for securing technologies systems. This newly published CIS PostgreSQL 10 Benchmark joins the existing CIS Benchmarks for PostgreSQL 9.5 and 9.6 while continuing to build upon Crunchy Data's efforts with the PostgreSQL Security Technical Implementation Guide (PostgreSQL STIG).
What is a CIS Benchmark?
As mentioned in an earlier blog post, a CIS Benchmark is a set of guidelines and best practices for securely configuring a target system. The benchmark contains a series of recommendations that help test the security of the system: some of the recommendations are "scored" (where a top score of 100 is the best), while others are are provided to establish best practices for security.
One of the most important things to using PostgreSQL successfully in your development and production environments is simply getting started! One of the most popular ways to install PostgreSQL is by using RPM packages. The PostgreSQL RPM packages work across many Linux distributions, including, RedHat Enterprise Linux (RHEL), CentOS, Fedora, Scientific Linux, and more, and the PostgreSQL community provides installers for these distributions.
This guide will help you get started with installing and configuring PostgreSQL for a CentOS / RHEL 7 based system, which will also work for Fedora 29. We will be installing PostgreSQL 11, which is the latest major release of PostgreSQL as of this writing.
Installing yum / dnf repository setup rpm
PostgreSQL provides a bevy of authentication methods to allow you to pick the one that makes the most sense for your environment. One desired implementation that I have found customers wanting is to use Windows Active Directory with PostgreSQL's GSSAPI authentication interface using Kerberos. I've put together this guide to help you take advantage of this setup in your own environment.
pgBackRest is an open source tool designed for efficiently performing backups and restores for PostgreSQL on up to terabytes of data at a time. Its aim is to be reliable, scalable, and flexible while offering robust performance during the backup process.
Many powerful features are included in pgBackRest, including parallel backup and restore, local or remote operation, full, incremental, and differential backup types, backup rotation, archive expiration, backup integrity, page checksums, backup resume, streaming compression and checksums, delta restore, and much more.
A difficult scenario which may present itself to database engineers is ensuring the high availability of your PostgreSQL cluster, which includes ensuring all backups are performing consistently and well without impacting the primary database host. Occasionally, you may experience a substantial system load on the primary database whenever a pgBackRest backup is running if you have a sizable enough amount of data.
An effective solution to this issue is to enable the option to enable the performance of backups from the replica rather than the primary; the bulk of the impact to I/O then occurs on the replica node and minimizes the load on the primary host. The only connections being made at that point would be from the replica to the primary in order to obtain information and log files.
It is additionally possible to configure more than one replica to work with pgBackRest; in this case, the first running standby found will be chosen and used for the backup process. If one of the standby nodes fail, pgBackRest will automatically find and proceed to run the backup on the next available replica that is configured.
Please note that the ability to define more than one standby was not available until version 1.23 of pgBackRest; in this version, multiple standby support was introduced that allows up to seven replicas to be configured rather than the limit of two from prior to that release. See the release notes here.
Additionally, if you are following this guide with the intent of configuring it for your current cluster and your version of pgBackRest is prior to version 2, please note that any configuration parameters beginning in “pg”* will need to be re-referenced as “db”*.
Specific documentation from pgBackRest’s official website for configuring the backup from standby option in version 1 can be found here.
In the following guide, we will explore configuring pgBackRest on two separate CentOS 7 servers and enabling the ability to backup PostgreSQL from the replica.
Many applications today record data from sensors, devices, tracking information, and other things that share a common attribute: a timestamp that is always increasing. This timestamp is very valuable, as it serves as the basis for types of lookups, analytical queries, and more.
New features and better performance get a lot of attention, but one of the relatively unsung improvements in PostGIS over the past ten years has been inclusion in standard software repositories, making installation of this fairly complex extension a "one click" affair.
Once you've got PostgreSQL/PostGIS installed though, how are upgrades handled? The key is having the right versions in place, at the right time, for the right scenario and knowing a little bit about how PostGIS works.
Common table expressions, aka CTEs, aka WITH queries, are not only the gateway to writing recursive SQL queries, but also help developers write maintainable SQL. WITH query clauses can help developers who are more comfortable writing in imperative languages to feel more comfortable writing SQL, as well as help reduce writing redundant code by reusing a particular common table expressions multiple times in a query.
A new patch, scheduled to be a part of PostgreSQL 12 major release later in the year, introduces the ability, under certain conditions, to inline common table expressions within a query. This is a huge feature: many developers could suddenly see their existing queries speed up significantly, and the ability to explicitly specify when to inline (i.e. the planner "substitutes" a reference to the CTE in the main query and can then optimize further) or, conversely, materialize (i.e. place the CTE into memory but lose out on certain planning & execution optimizations).
But why is this a big deal? Before we look into the future, first let's understand how WITH queries currently work in PostgreSQL.