Kubernetes Pod Tolerations and Postgres Deployment Strategies

Jonathan S. Katz
Kubernetes High-Availability PostgreSQL Operator

Users with complex Kubernetes deployments sometimes ask to use Pod tolerations to schedule Postgres instances. To address this feedback, we added support for tolerations to the 4.6 release of the Postgres Operator, along with improvements to node affinity support.

To use tolerations with PostgreSQL deployments effectively, it helps to understand the mechanics behind several Kubernetes features so that you get the desired result: PostgreSQL deployed to a specific group of nodes.

Let's take a look at how we can use Pod tolerations with the PostgreSQL Operator to create different production topologies. First, let's cover how Kubernetes taints, tolerations, and node affinity can work together.

Node affinity, taints, and tolerations

One of Kubernetes' primary jobs is to schedule Pods, the fundamental units of execution, to nodes. Kubernetes will schedule Pods without any guidance, though you can give it hints about how and where to schedule them.

Node affinity (which the PostgreSQL Operator has supported for quite some time) gives Kubernetes guidance on where to schedule a Pod. The Postgres Operator supports two types of node affinity:


  • required: Kubernetes must schedule a Pod to a node that matches the node affinity rule. If no matching node is available, Kubernetes does not schedule the Pod at all, and the Pod remains Pending.
  • preferred: Kubernetes should try to schedule a Pod to a node that matches the node affinity rule. If no matching node is available, Kubernetes can still schedule the Pod elsewhere.
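To make the difference concrete, here is a toy bash model of the two affinity types. It is a sketch under simplifying assumptions, not how kube-scheduler works (the real scheduler scores preferred rules by weight rather than filtering on them), and the function name `schedule` and the `name|label1,label2` node format are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of required vs. preferred node affinity (illustration only).
# kube-scheduler really scores preferred rules by weight; this just filters.
# Nodes use a hypothetical "name|label1,label2" format.

# schedule TYPE LABEL NODE... -> prints candidate node names, one per line
schedule() {
  local type="$1" want="$2" node name labels
  shift 2
  case "$type" in
    required|preferred) ;;
    *) echo "TYPE must be required or preferred" >&2; return 2 ;;
  esac

  local -a matching=() all=()
  for node in "$@"; do
    name="${node%%|*}"
    labels="${node#*|}"
    all+=("$name")
    # Wrap in commas so "db=01" does not match "db=010".
    if [[ ",$labels," == *",$want,"* ]]; then
      matching+=("$name")
    fi
  done

  if (( ${#matching[@]} > 0 )); then
    # In this model, both types use a matching node when one exists.
    printf '%s\n' "${matching[@]}"
  elif [[ "$type" == preferred ]]; then
    # No match: a preferred rule lets the Pod land on any node.
    printf '%s\n' "${all[@]}"
  else
    # No match: a required rule leaves the Pod unscheduled (Pending).
    echo "no node has label $want; Pod stays Pending" >&2
    return 1
  fi
}
```

With no node labeled db=01, `schedule required db=01 'node-a|db=02'` prints nothing and returns a failure status, while the same call with `preferred` falls back to node-a. That is the gap behind the common "why didn't my Pod land on that node?" question.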

This distinction is important. People often wonder why a Pod is not scheduled to a particular node even though node affinity is set. The likely culprit is that the node affinity rule is preferred rather than required.

While node affinity rules tell Kubernetes to "try to schedule this Pod here," node taints do the opposite. A node taint tells the Kubernetes scheduler that a node is "off limits" unless a Pod meets certain conditions. In other words, a taint lets you put a "lock" on one or more nodes so that Kubernetes can schedule only Pods that have a particular "key" to them. You can read more about tainting nodes in the Kubernetes documentation.
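The Kubernetes documentation describes tainting a node with `kubectl taint nodes NODE KEY[=VALUE]:EFFECT`. As a sketch, the bash function below prints those commands for a group of nodes rather than running them, so you can review them first; the function name `taint_plan` and the node names used with it are hypothetical.

```shell
#!/usr/bin/env bash
# Print (do not run) the kubectl commands that taint a group of nodes so that
# only Pods with a matching toleration can be scheduled there.

# taint_plan TAINT NODE...   TAINT uses kubectl's KEY[=VALUE]:EFFECT syntax
taint_plan() {
  local taint="$1" node
  shift
  # Reject effects Kubernetes does not define before printing anything.
  case "${taint##*:}" in
    NoSchedule|PreferNoSchedule|NoExecute) ;;
    *) echo "invalid taint effect in '$taint'" >&2; return 1 ;;
  esac
  for node in "$@"; do
    printf 'kubectl taint nodes %s %s\n' "$node" "$taint"
  done
}
```

For example, `taint_plan db=01:NoSchedule worker-1 worker-2` prints one `kubectl taint` command per node, which you can then run (for instance by piping the output to `sh`). kubectl removes a taint when you append a `-` to it, as in `kubectl taint nodes worker-1 db=01:NoSchedule-`.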

Pod tolerations provide the "keys" that allow Kubernetes to schedule Pods to tainted nodes. If a Pod's tolerations match the taints on a node, then Kubernetes knows it can schedule the Pod to that node.
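The matching rule can be sketched in bash, writing both tolerations and taints in the rule:Effect notation that the Postgres Operator uses for tolerations (described in the next section). This is a simplified model: it assumes both sides always carry a key and an effect, and it omits cases Kubernetes supports, such as a toleration with an empty effect matching every effect. The function name `tolerates` is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified toleration/taint matching, both written as "key[=value]:Effect".
# Omits Kubernetes cases such as empty keys or empty effects.

# tolerates TOLERATION TAINT -> exit status 0 if TOLERATION matches TAINT
tolerates() {
  local tol_rule="${1%:*}" tol_effect="${1##*:}"
  local taint_rule="${2%:*}" taint_effect="${2##*:}"

  # The effects must be identical.
  [[ "$tol_effect" == "$taint_effect" ]] || return 1

  # The keys must be identical.
  [[ "${tol_rule%%=*}" == "${taint_rule%%=*}" ]] || return 1

  if [[ "$tol_rule" == *=* ]]; then
    # Equality toleration (key=value): the taint's value must match exactly;
    # a taint written without a value has an empty value.
    local taint_value=""
    if [[ "$taint_rule" == *=* ]]; then
      taint_value="${taint_rule#*=}"
    fi
    [[ "${tol_rule#*=}" == "$taint_value" ]]
  else
    # Existence toleration (key only): any value of the key matches.
    return 0
  fi
}
```

So `tolerates ssd:NoSchedule ssd=fast:NoSchedule` succeeds because an existence toleration ignores the value, while `tolerates db=01:NoSchedule db=02:NoSchedule` fails. This checks a single toleration/taint pair; a Pod must tolerate every NoSchedule taint on a node before it can be scheduled there.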

Note that just because a Pod has a matching toleration for a node does not mean that Kubernetes will schedule the Pod to that node. Tolerations only give permission to schedule Pods to tainted nodes; node affinity tells Kubernetes where to actually schedule them.

With these concepts, let's look at how we can use Pod tolerations to schedule Postgres clusters to tainted nodes.

Deploying PostgreSQL with Pod tolerations

The Postgres Operator supports two ways of managing tolerations for a PostgreSQL cluster: through the `pgo` client or through a GitOps workflow using a custom resource. For the examples below, we will use the `pgo` client.

The pgo create cluster, pgo update cluster, and pgo scale commands support the --toleration flag, which adds one or more tolerations to a PostgreSQL cluster. Values passed to --toleration use the following format:

rule:Effect

Here, rule represents either existence (key) or equality (key=value), and Effect is one of NoSchedule, PreferNoSchedule, or NoExecute.
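Each rule:Effect value corresponds to an entry in a Pod's spec.tolerations list: an existence rule becomes `operator: Exists` and an equality rule becomes `operator: Equal`. The bash sketch below prints that entry, assuming this straightforward mapping; it illustrates the Kubernetes toleration fields and is not the Postgres Operator's own code. The function name `toleration_yaml` is hypothetical.

```shell
#!/usr/bin/env bash
# Translate a "rule:Effect" value into the equivalent spec.tolerations entry
# (fields key, operator, value, effect). Illustration only, not pgo's code.

# toleration_yaml "key[=value]:Effect" -> prints one YAML list item
toleration_yaml() {
  local arg="$1"
  if [[ "$arg" != *:* ]]; then
    echo "expected rule:Effect, got '$arg'" >&2
    return 1
  fi
  local rule="${arg%:*}" effect="${arg##*:}"

  case "$effect" in
    NoSchedule|PreferNoSchedule|NoExecute) ;;
    *) echo "unknown effect '$effect'" >&2; return 1 ;;
  esac

  if [[ "$rule" == *=* ]]; then
    # Equality rule (key=value); quote the value so it stays a string.
    printf -- '- key: %s\n  operator: Equal\n  value: "%s"\n  effect: %s\n' \
      "${rule%%=*}" "${rule#*=}" "$effect"
  else
    # Existence rule (key only).
    printf -- '- key: %s\n  operator: Exists\n  effect: %s\n' "$rule" "$effect"
  fi
}
```

For instance, `toleration_yaml db=01:NoSchedule` prints an Equal entry with key db and value "01". Quoting the value keeps a YAML parser from reading 01 as a number.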

For example, to add two tolerations to a new PostgreSQL cluster, one an existence toleration for the key ssd and the other an equality toleration for the key/value pair db=01, you can run the following command:

pgo create cluster hippo \
  --toleration=ssd:NoSchedule \
  --toleration=db=01:NoSchedule

Now let's say you have a group of nodes with the taint db=02:NoSchedule that you are reserving for replicas. You can add a replica to the hippo cluster with a toleration for db=02 using the following command:

pgo scale hippo --toleration=db=02:NoSchedule

If you want to update the tolerations on an existing cluster, you can either modify the pgclusters.crunchydata.com and pgreplicas.crunchydata.com custom resources directly or use the pgo update cluster command. pgo update cluster also removes a toleration when its effect ends with a -.

For example, to add a toleration of nvme:NoSchedule and remove the toleration of ssd:NoSchedule, you could run the following command:

pgo update cluster hippo \
  --toleration=nvme:NoSchedule \
  --toleration=ssd:NoSchedule-

The PostgreSQL Operator will roll out any changes to the appropriate instances.
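The trailing - convention of pgo update cluster can be modeled on a plain list of tolerations. This bash sketch illustrates the add/remove semantics only and is not the Operator's implementation; the function name `update_tolerations` and its space-separated input format are hypothetical.

```shell
#!/usr/bin/env bash
# Model of the add/remove convention: a value ending in "-" removes that
# toleration, any other value adds it. Illustration only, not pgo's code.

# update_tolerations "CURRENT" UPDATE...
#   CURRENT: space-separated tolerations, e.g. "ssd:NoSchedule db=01:NoSchedule"
#   UPDATE:  one --toleration value each, e.g. "nvme:NoSchedule" or "ssd:NoSchedule-"
# Prints the resulting tolerations, one per line.
update_tolerations() {
  local -a result kept
  read -r -a result <<< "$1"
  shift
  local update t present
  for update in "$@"; do
    if [[ "$update" == *- ]]; then
      # Removal: drop entries equal to the value without its trailing "-".
      kept=()
      for t in "${result[@]}"; do
        if [[ "$t" != "${update%-}" ]]; then kept+=("$t"); fi
      done
      result=("${kept[@]}")
    else
      # Addition: skip duplicates so repeating an update is harmless.
      present=0
      for t in "${result[@]}"; do
        if [[ "$t" == "$update" ]]; then present=1; fi
      done
      if (( ! present )); then result+=("$update"); fi
    fi
  done
  if (( ${#result[@]} > 0 )); then
    printf '%s\n' "${result[@]}"
  fi
}
```

Running `update_tolerations 'ssd:NoSchedule db=01:NoSchedule' nvme:NoSchedule ssd:NoSchedule-` leaves db=01:NoSchedule and nvme:NoSchedule, matching the pgo update cluster example above. Because no taint effect ends in -, a trailing - is an unambiguous removal marker.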

Mixing in node affinity

Even though you may have given your Postgres cluster the "keys" for deployment to nodes with specific taints, Kubernetes may not actually schedule its instances there. Tolerations only grant permission to deploy to those nodes; node affinity gives Kubernetes rules on where to actually deploy Pods.
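A toy model helps show the split between permission and placement. In the bash sketch below, a node is eligible only if the Pod tolerates every taint on it and, when a required node affinity label is given, the node also carries that label. Assumptions: every taint is treated as NoSchedule, the `name|labels|taints` node format and all function and node names are hypothetical, and this is not how kube-scheduler is implemented.

```shell
#!/usr/bin/env bash
# Toy model: permission (tolerations) vs. placement (required node affinity).
# Assumptions: every taint is treated as NoSchedule; nodes use a hypothetical
# "name|label1,label2|taint1,taint2" format. Not how kube-scheduler works.

# tolerates TOLERATION TAINT: both in "key[=value]:Effect" form
tolerates() {
  local tol="${1%:*}" tol_effect="${1##*:}" taint="${2%:*}" taint_effect="${2##*:}"
  [[ "$tol_effect" == "$taint_effect" && "${tol%%=*}" == "${taint%%=*}" ]] || return 1
  # Existence rule (no "=") matches any value; equality rule needs key=value.
  [[ "$tol" != *=* || "$tol" == "$taint" ]]
}

# eligible_nodes "TOLERATIONS" REQUIRED_LABEL NODE...
#   TOLERATIONS is space-separated; an empty REQUIRED_LABEL means no affinity rule.
eligible_nodes() {
  local -a tols taint_list
  read -r -a tols <<< "$1"
  local want="$2" node name labels taints taint tol ok tolerated
  shift 2
  for node in "$@"; do
    IFS='|' read -r name labels taints <<< "$node"

    # 1. Permission: the Pod must tolerate every taint on the node.
    ok=1
    IFS=',' read -r -a taint_list <<< "$taints"
    for taint in "${taint_list[@]}"; do
      tolerated=0
      for tol in "${tols[@]}"; do
        if tolerates "$tol" "$taint"; then tolerated=1; break; fi
      done
      if (( ! tolerated )); then ok=0; break; fi
    done

    # 2. Placement: a required affinity rule also demands the label.
    if [[ -n "$want" && ",$labels," != *",$want,"* ]]; then
      ok=0
    fi

    if (( ok )); then echo "$name"; fi
  done
}
```

Given the hypothetical nodes 'pg-1|db=01|db=01:NoSchedule', 'pg-2|db=02|db=02:NoSchedule', and 'app-1|role=app|', a Pod with the tolerations ssd:NoSchedule and db=01:NoSchedule and no affinity rule is eligible for both pg-1 and the untainted app-1. Adding the required label db=01 narrows the result to pg-1: the toleration unlocked pg-1, but only node affinity keeps the Pod off app-1.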

Using the previous example, let's say that we want to deploy our hippo Postgres cluster to two different node groups: one with the node label db=01 and one with the node label db=02. Although these labels share their names with the taints, node labels and taints are distinct properties of a node: node affinity matches labels, while tolerations match taints. We reuse the names here to illustrate how node affinity guides Kubernetes to the same nodes that the tolerations unlock.

We want to force Kubernetes to deploy each Postgres instance to specific nodes. We can use the --node-label flag together with --node-affinity-type=required to have Kubernetes build out our deployment topology:

pgo create cluster hippo \
  --toleration=ssd:NoSchedule \
  --toleration=db=01:NoSchedule \
  --node-label=db=01 \
  --node-affinity-type=required

pgo scale hippo \
  --toleration=ssd:NoSchedule \
  --toleration=db=02:NoSchedule \
  --node-label=db=02 \
  --node-affinity-type=required

Conclusion

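Kubernetes tolerations and node affinity, coupled with the Postgres Operator, are a powerful combination for creating sophisticated deployment strategies for production PostgreSQL clusters. When designing a production environment for your data, make sure you understand how these tools affect high availability: for example, if required node affinity ties every instance to a small group of tainted nodes, losing those nodes can leave Kubernetes with nowhere to reschedule your Postgres instances.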

Used carefully, Pod tolerations let your PostgreSQL instances take advantage of hardware that you reserve for your databases and help you leverage the power of Kubernetes for your Postgres deployments.
