
Using the CIS PostgreSQL Benchmark to Enhance Your Security

August 01, 2018

PostgreSQL security

Crunchy Data recently announced the publication of the CIS PostgreSQL Benchmark by the Center for Internet Security, a nonprofit organization that publishes standards and best practices for securing technology systems. The CIS PostgreSQL Benchmark builds on earlier work that Crunchy started when it helped publish the PostgreSQL Security Technical Implementation Guide (PostgreSQL STIG), and it provides guidance and steps to help secure your PostgreSQL databases.

What is a CIS Benchmark?

A CIS Benchmark is a set of guidelines and best practices for securely configuring a target system. Authoring a CIS Benchmark is a collaborative process: CIS conducts considerable peer review and discussion before publishing a major version, to ensure there is general consensus on the best practices for deploying a secure system.

Using Kubernetes Deployments for Running PostgreSQL

Running PostgreSQL databases in containerized environments is more popular than ever, moving beyond local development environments and into large-scale production environments. To meet the need to orchestrate complex database workloads, the Crunchy Data team created the PostgreSQL Operator to automate many typical database administration tasks at scale:


  • Provisioning new PostgreSQL clusters
  • Scaling up replicas
  • Setting up and managing disaster recovery, high availability, and monitoring
  • Allocating resources (memory, CPU, etc.) and suggesting nodes for databases to run on
  • Applying user policies en masse
  • Performing major/minor upgrades

and more. To keep all of its PostgreSQL databases up and running, the PostgreSQL Operator uses Kubernetes Deployments, which provide an API for managing replicated applications. To understand why, we first need to look at how Kubernetes handles stateful applications.

Troubleshooting PostgreSQL Streaming Replication Synchronization

Replicas in a database environment can fulfill many goals: creating a high-availability environment, load balancing read-only queries across several nodes, providing a read-only database for analytical queries, and more. Streaming replication, introduced in PostgreSQL 9.0 almost eight years ago, helps satisfy these requirements, and over the years the community has added many features that continue to enhance the replication user experience.

There are many ways to configure streaming replication in PostgreSQL, but fundamentally you need to decide whether you want asynchronous or synchronous replication. With asynchronous replication, the default in PostgreSQL, a transaction is considered complete when it is committed on the primary database; the replicas replay and consume the transaction in their own time. With synchronous replication, the transaction is not considered complete until it has been committed on all of the replicas (though PostgreSQL 10 introduced "quorum commit," which lets you configure how many replicas need to commit the transaction before it is considered complete).
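As a concrete illustration, quorum commit is driven by the synchronous_standby_names setting. A minimal sketch for PostgreSQL 10+, where replica1 and replica2 are hypothetical application_name values for two standbys:

    -- Require any one of the two named standbys to confirm each commit
    -- before it is reported as complete (quorum commit, PostgreSQL 10+).
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (replica1, replica2)';
    ALTER SYSTEM SET synchronous_commit = 'on';
    SELECT pg_reload_conf();

Raising the quorum (e.g. ANY 2) trades commit latency for stronger durability guarantees.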

Choosing which form of replication you need for your environment is out of scope for this article. Rather, we will look at what happens when a replica falls out of sync with a primary, what to do about it, and how you can avoid the situation in the first place. Credit to Doug Hunley for originally compiling this guide.

Detecting An Out-of-Sync Replica
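A natural first check is the pg_stat_replication view on the primary. A sketch of a starting query, using PostgreSQL 10 names (9.x uses the *_location columns and pg_xlog_location_diff instead):

    -- How far behind is each replica, in bytes of WAL?
    SELECT application_name,
           client_addr,
           state,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
    FROM pg_stat_replication;

A replica whose lag keeps growing, or that disappears from the view entirely, warrants investigation.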

Upgrading PostgreSQL from 9.4 to 10.3 with pglogical

April 22, 2018

Managing PostgreSQL

I recently helped a customer upgrade a PostgreSQL instance from 9.4 on RHEL to 10.x on Ubuntu. While it initially sounded daunting, the use of pglogical and some planning made it rather straightforward. While there's nothing new or original in this post, I still felt compelled to write it up for anyone else who might find the info useful as an example in their own endeavors.

pglogical is a logical replication system implemented entirely as a PostgreSQL extension. Fully integrated, it requires no triggers or external programs. This makes it faster than Slony, Londiste, et al. It is also (roughly) the basis upon which logical replication in PostgreSQL 10 core is built.
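To give a feel for the moving parts, here is a rough sketch of wiring up a provider and subscriber with pglogical; the node names, DSNs, and database are hypothetical:

    -- On the 9.4 provider:
    CREATE EXTENSION pglogical;
    SELECT pglogical.create_node(
        node_name := 'provider94',
        dsn := 'host=oldbox dbname=appdb'
    );
    SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

    -- On the 10.x subscriber, after restoring the schema:
    CREATE EXTENSION pglogical;
    SELECT pglogical.create_node(
        node_name := 'subscriber10',
        dsn := 'host=newbox dbname=appdb'
    );
    SELECT pglogical.create_subscription(
        subscription_name := 'upgrade_94_to_10',
        provider_dsn := 'host=oldbox dbname=appdb'
    );

Once the subscriber catches up, the cutover amounts to pointing applications at the new instance and dropping the subscription.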

How to Perform Failover with the PostgreSQL Kubernetes Operator

Crunchy Data recently released version 2.6 of the PostgreSQL Kubernetes Operator, a powerful controller built on the Kubernetes Operator pattern that enables users to provision and manage thousands of PostgreSQL databases at scale. Over the next few weeks we will demonstrate some of the features in this latest version, including support for manual database failover, storage selection, node (or server) selection preference, and many other goodies that make it easier to manage a large PostgreSQL cluster.

Fast CSV and JSON Ingestion in PostgreSQL with COPY

April 03, 2018

PostgreSQL

If you have been asked to provide a CSV that someone can open up in their favorite spreadsheet editor, chances are you have used the PostgreSQL COPY command. COPY has been around since the early open source releases of PostgreSQL back in the late 1990s and was designed to quickly get data in and out of PostgreSQL.

COPY is also incredibly helpful for ingesting data into a table, especially if you have a lot of it to ingest, and will generally outperform INSERT. Let’s explore a few ways to use COPY to load some data into a table.
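A quick sketch of both directions, using a hypothetical table and file paths (server-side COPY needs appropriate file permissions on the database host; psql's \copy is the client-side equivalent):

    CREATE TABLE sensor_readings (
        id int,
        reading numeric,
        created_at timestamptz
    );

    -- Bulk-load a CSV that has a header row:
    COPY sensor_readings FROM '/data/readings.csv' WITH (FORMAT csv, HEADER true);

    -- Export back out to CSV for the spreadsheet crowd:
    COPY sensor_readings TO '/data/readings_out.csv' WITH (FORMAT csv, HEADER true);

    -- JSON ingestion: one JSON document per line into a jsonb column
    -- (assumes the documents contain no raw tabs or backslashes, which
    -- the default text format would treat as delimiters and escapes):
    CREATE TABLE raw_events (data jsonb);
    COPY raw_events (data) FROM '/data/events.json';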


An Easy Recipe for Creating a PostgreSQL Cluster with Docker Swarm

One of the biggest benefits of running PostgreSQL is running your cluster in a primary-replica setup for high availability or for load balancing your read-only queries. Deploying a primary-replica setup is not necessarily simple out of the box, but modern containerization technology can greatly simplify the process.

In this article, I will demonstrate how to easily set up and deploy a PostgreSQL primary-replica cluster using Docker and Docker Swarm.

Demystifying Schemas & search_path through Examples

March 02, 2018

PostgreSQL security

On March 1, 2018, the PostgreSQL community released version 10.3 along with updates to the other supported versions of PostgreSQL. The release centered on a disclosed security vulnerability designated CVE-2018-1058, which relates to how a user can accidentally or maliciously "create like-named objects in different schemas that can change the behavior of other users' queries."

The PostgreSQL community released a guide explaining exactly what CVE-2018-1058 is and how to protect your databases. However, we thought it would also be helpful to look at what schemas are in PostgreSQL, how they are used under normal operations, and how to investigate your schemas to look for and eliminate suspicious functions.
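As a taste of what that looks like in practice, here is a sketch: the REVOKE is part of the community's recommended protection, and the catalog query is one way to enumerate functions outside the system schemas when auditing:

    -- What is this session actually resolving against?
    SHOW search_path;

    -- Community-recommended protection: stop ordinary users from
    -- creating objects in the public schema.
    REVOKE CREATE ON SCHEMA public FROM PUBLIC;

    -- Audit: list functions that live outside the system schemas.
    SELECT n.nspname AS schema, p.proname AS function
    FROM pg_proc p
    JOIN pg_namespace n ON n.oid = p.pronamespace
    WHERE n.nspname NOT IN ('pg_catalog', 'information_schema');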

Range Types & Recursion: How to Search Availability with PostgreSQL

February 22, 2018

PostgreSQL

One of the many reasons that PostgreSQL is fun to develop with is its robust collection of data types, such as the range type. Range types were introduced in PostgreSQL 9.2 with out-of-the-box support for numeric ranges (integers, numerics) and temporal ranges (dates, timestamps), with infrastructure in place to create ranges of other data types (e.g. inet/cidr type ranges). Range data appears in many applications, from science to finance, and being able to efficiently compare ranges in PostgreSQL can take the onus off of your application workloads.
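To make that concrete, a small sketch with a hypothetical bookings table and the range overlap operator (&&):

    CREATE TABLE room_bookings (
        room_id int,
        booked_during daterange
    );

    -- A GiST index keeps range-overlap searches efficient:
    CREATE INDEX ON room_bookings USING gist (booked_during);

    INSERT INTO room_bookings VALUES
        (1, daterange('2018-02-20', '2018-02-24')),
        (2, daterange('2018-03-01', '2018-03-05'));

    -- Which rooms conflict with a requested stay of Feb 22-25?
    SELECT room_id
    FROM room_bookings
    WHERE booked_during && daterange('2018-02-22', '2018-02-25');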

A PostgreSQL Row Level Security Primer + Creating Large Policies

February 12, 2018

PostgreSQL security

Row Level Security, aka "RLS," allows a database administrator to define whether a user should be able to view or manipulate specific rows of data within a table according to a policy. Introduced in PostgreSQL 9.5, row level security added another layer of security for PostgreSQL users who have additional security and compliance considerations for their applications.

At Crunchy Data, we care a lot about data security and supporting PostgreSQL. When we discovered an issue with creating a large row level security policy, we worked to find a workaround that provided a secure outcome while avoiding a custom fork of PostgreSQL. As of the publication of this post, the Crunchy Data team is working to address the issue with a patch that will be submitted for inclusion upstream.

To understand the issue and how our team found a solution, I'll first give a quick primer of how row level security works in PostgreSQL.
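As a preview, a minimal sketch of RLS on a hypothetical table:

    CREATE TABLE accounts (
        manager name,
        balance numeric
    );

    -- RLS is off until explicitly enabled on the table:
    ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

    -- Let users see and modify only the rows they manage:
    CREATE POLICY manager_rows ON accounts
        USING (manager = current_user);

Table owners and superusers bypass RLS by default, which is worth keeping in mind when testing policies.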