One of the most critical topics to understand when administering a PostgresSQL database is the concept of transaction IDs (TXID) and that they can be exhausted if not monitored properly. However, this blog post isn't going to go into the details of what it TXID exhaustion actually is. The Routine Vacuuming section of the documentation is probably one of the most important to read and understand so I will refer you there. What this blog post is going to cover is an easy way to monitor for it and what can be done to prevent it ever being a problem.
I’ve been noticing that the query used in v1.x of my pg_bloat_check.py script (obtained from the check_postgres.pl module) was not always accurate and was often not reporting on bloat that I knew for a fact was there (Ex: I just deleted over 300 million rows, vacuumed & analyzed the table and still no bloat? Sure it could happen, but highly unlikely). So I continued looking around and discovered the pgstattuple contrib module that comes with PostgreSQL. After discussing it with several of the core developers at recent PostgreSQL conferences (PGConfUS & PGCon) I believe this is a much, much better way to get an accurate assessment of the bloat situation. This encouraged me to do a near complete rewrite of my script and v2.0.0 is now available. It’s not a drop-in replacement for v1.x, so please check the –help for new options.
pgstattuple is a very simple, but powerful extension. It doesn’t require any additional libraries to be loaded and just adds a few functions you can call on database objects to get some statistics about them. The key function for bloat being the default one, pgstattuple(regclass), which returns information about live & dead tuples and free space contained in the given object. If you read the description below on what bloat actually is, you’ll see that those data points are exactly what we’re looking for. The difference between what this function is doing and what the check_postgres.pl query is doing is quite significant, though. The check_postgres query is doing its best to guess what is dead & free space based on the current statistics in the system catalogs. pgstattuple actually goes through and does a full scan on the given table or index to see what the actual situation is. This does mean this query can be very, very slow on large tables. The database I got the examples below from is 1.2TB and a full bloat check on it takes just under 1 hour. But with the inaccuracies I’ve seen being returned by the simpler query, this time can be well worth it. The script stores the statistics gathered in a table so they can be easily reviewed at any time and even used for monitoring purposes, just like check_postgres.