Blog | Crunchy Data

Enhancing Your PostgreSQL 10 Security with the CIS Benchmark

 Crunchy Data has recently announced an update to the CIS PostgreSQL Benchmark by the Center for Internet Security, a nonprofit organization that provides publications around standards and best practices for securing technologies systems. This newly published CIS PostgreSQL 10 Benchmark joins the existing CIS Benchmarks for PostgreSQL 9.5 and 9.6 while continuing to build upon Crunchy Data's efforts with the PostgreSQL Security Technical Implementation Guide (PostgreSQL STIG).

What is a CIS Benchmark?

As mentioned in an earlier blog post, a CIS Benchmark is a set of guidelines and best practices for securely configuring a target system.  The benchmark contains a series of recommendations that help test the security of the system: some of the recommendations are "scored" (where a top score of 100 is the best), while others are are provided to establish best practices for security.

PostgreSQL Zero to Hero: Getting Started with RPMs -Part 1

March 20, 2019

PostgreSQL DBA rpm Install

One of the most important things to using PostgreSQL successfully in your development and production environments is simply getting started! One of the most popular ways to install PostgreSQL is by using RPM packages. The PostgreSQL RPM packages work across many Linux distributions, including, RedHat Enterprise Linux (RHEL), CentOS, Fedora, Scientific Linux, and more, and the PostgreSQL community provides installers for these distributions.

This guide will help you get started with installing and configuring PostgreSQL for a CentOS / RHEL 7 based system, which will also work for Fedora 29. We will be installing PostgreSQL 11, which is the latest major release of PostgreSQL as of this writing.

Installation

Installing yum / dnf repository setup rpm

How to setup Windows Active Directory with PostgreSQL GSSAPI Kerberos Authentication

March 04, 2019

PostgreSQL security

PostgreSQL provides a bevy of authentication methods to allow you to pick the one that makes the most sense for your environment. One desired implementation that I have found customers wanting is to use  Windows Active Directory with PostgreSQL's GSSAPI authentication interface using Kerberos. I've put together this guide to help you take advantage of this setup in your own environment. 

pgBackRest - Performing Backups on a Standby Cluster

February 27, 2019

PostgreSQL pgbackrest

pgBackRest is an open source tool designed for efficiently performing backups and restores for PostgreSQL on up to terabytes of data at a time. Its aim is to be reliable, scalable, and flexible while offering robust performance during the backup process.

Many powerful features are included in pgBackRest, including parallel backup and restore, local or remote operation, full, incremental, and differential backup types, backup rotation, archive expiration, backup integrity, page checksums, backup resume, streaming compression and checksums, delta restore, and much more.

A difficult scenario which may present itself to database engineers is ensuring the high availability of your PostgreSQL cluster, which includes ensuring all backups are performing consistently and well without impacting the primary database host. Occasionally, you may experience a substantial system load on the primary database whenever a pgBackRest backup is running if you have a sizable enough amount of data.

An effective solution to this issue is to enable the option to enable the performance of backups from the replica rather than the primary; the bulk of the impact to I/O then occurs on the replica node and minimizes the load on the primary host. The only connections being made at that point would be from the replica to the primary in order to obtain information and log files.

It is additionally possible to configure more than one replica to work with pgBackRest; in this case, the first running standby found will be chosen and used for the backup process. If one of the standby nodes fail, pgBackRest will automatically find and proceed to run the backup on the next available replica that is configured. 

Please note that the ability to define more than one standby was not available until version 1.23 of pgBackRest; in this version, multiple standby support was introduced that allows up to seven replicas to be configured rather than the limit of two from prior to that release. See the release notes here.

Additionally, if you are following this guide with the intent of configuring it for your current cluster and your version of pgBackRest is prior to version 2, please note that any configuration parameters beginning in “pg”* will need to be re-referenced as “db”*.

Specific documentation from pgBackRest’s official website for configuring the backup from standby option in version 1 can be found here.

In the following guide, we will explore configuring pgBackRest on two separate CentOS 7 servers and enabling the ability to backup PostgreSQL from the replica.

PostgreSQL BRIN Indexes: Big Data Performance With Minimal Storage

February 25, 2019

PostgreSQL

Many applications today record data from sensors, devices, tracking information, and other things that share a common attribute: a timestamp that is always increasing. This timestamp is very valuable, as it serves as the basis for types of lookups, analytical queries, and more.

Upgrading PostGIS on Centos 7

February 20, 2019

PostgreSQL postgis

New features and better performance get a lot of attention, but one of the relatively unsung improvements in PostGIS over the past ten years has been inclusion in standard software repositories, making installation of this fairly complex extension a "one click" affair.

Once you've got PostgreSQL/PostGIS installed though, how are upgrades handled? The key is having the right versions in place, at the right time, for the right scenario and knowing a little bit about how PostGIS works.

WITH Queries: Present & Future

Common table expressions, aka CTEs, aka WITH queries, are not only the gateway to writing recursive SQL queries, but also help developers write maintainable SQL. WITH query clauses can help developers who are more comfortable writing in imperative languages to feel more comfortable writing SQL, as well as help reduce writing redundant code by reusing a particular common table expressions multiple times in a query.

A new patch, scheduled to be a part of PostgreSQL 12 major release later in the year, introduces the ability, under certain conditions, to inline common table expressions within a query. This is a huge feature: many developers could suddenly see their existing queries speed up significantly, and the ability to explicitly specify when to inline (i.e. the planner "substitutes" a reference to the CTE in the main query and can then optimize further) or, conversely, materialize (i.e. place the CTE into memory but lose out on certain planning & execution optimizations).

But why is this a big deal? Before we look into the future, first let's understand how WITH queries currently work in PostgreSQL.

PostgreSQL Deep Dive: How Your Data Model Affects Storage

February 13, 2019

PostgreSQL

I want to take a few minutes for a deep dive into the effect your data model has on storage density when using PostgreSQL. When this topic came up with a customer, I explained my thoughts on the matter, but I realized at the time that I had never done a reasonably careful apples-to-apples test to see just exactly what the effect is, at least for a model sample size of one. So here it is.

A Guide to Building an Active-Active PostgreSQL Cluster

One of the toughest challenges facing database engineers today is ensuring their data is always accessible so they can meet the high-availability  requirements for their applications.

While this problem may seem trivial in the world of applications where one can have many instances behind geographically distributed load balancers, in the database world where there is only one version of the data globally, the problem faces many obstacles.

PostgreSQL replication has advanced considerably in recent major releases, including continuous improvements to streaming replication and the addition of logical replication in PostgreSQL 10. While these capabilities represent important enhancements to PostgreSQL, enabling users to address a wide variety of clustered architectures, they do not (easily) address the use cases where the application requires access to an updatable database in two or more geographic regions - often referred to as an "active-active" cluster.

More specifically, an active-active cluster is one where the application can write to any instance in the cluster and the data will be written to all of the instances in the cluster, enabling each instance in the cluster to be used to:


  • Provide near zero downtime as the new instance is already in a read/write state; there is no need to reconfigure it.
  • Provide near zero downtime upgrades from one version to another
  • Improve latency for users in geographically distributed clusters. By providing an instance physically closer to the user latency is reduced.

While there are a number of proprietary solutions that attempt to address active-active PostgreSQL requirements, this post and a series to follow provides users with potential reference architectures and configurations that enable active-active PostgreSQL configurations using entirely open source software.

This post is of course only one approach to deploying an active-active PostgreSQL cluster.  There are other ways to deploy an active-active setup with PostgreSQL.  I will cover some ways to do this in the future - stay tuned!

What's New in Crunchy PostgreSQL Operator 3.5

Crunchy Data is happy to announce the release of the open source  PostgreSQL Operator 3.5 for Kubernetes project, which you can find here: https://github.com/CrunchyData/postgres-operator/

This latest release provides further feature enhancements designed to support users intending to deploy large-scale PostgreSQL clusters on Kubernetes, with enterprise high-availability and disaster recovery requirements.

When combined with the Crunchy PostgreSQL Container Suite, the PostgreSQL Operator provides an open source, Kubernetes-native PostgreSQL-as-a-Service capability.

Read on to see what is new in PostgreSQL Operator 3.5.