If you're an application developer, analyst, data scientist, or anyone who's had to figure out how to work with relational databases, chances are you're familiar with indexes. At least to the extent that you know they somehow help speed up your queries. (That's where I'd left my understanding of indexes for a good amount of time). You may have also seen that PostgreSQL automatically creates an index for a column with a primary key or unique constraint. But start to dig just a little bit deeper and you'll quickly realize that there's a lot more to it than just making searches go faster!
If you're getting started with learning about indexes, here are a few things that hopefully will help round out your understanding.
1. Indexes can speed up other operations too
We typically think of indexes as used for queries that look like this:
SELECT * FROM employees WHERE department = 'Accounting';
The database can look for records that meet the filter criteria (
WHERE department = 'Accounting') by scanning
employees' rows one by one. You can imagine how checking each row one after another can end up being pretty slow especially for larger tables. Indexes help speed up the search because providing pointers to the table rows where the data values meet the filter criteria.
But in addition to that, indexes might also help reduce the time taken for:
The simple explanation is that if there is an index on a column being used in the join condition (
SELECT * FROM employees INNER JOIN department ON employees.department = department.name), it might help the database find matching rows more quickly. This does depend on how the database decides to do the join - this blog series on SQL joins goes into a little more detail on join algorithms.
- Sorting and grouping (ORDER BY and GROUP BY):
There are many different types of database indexes (check out the official docs for PostgreSQL index types), but broadly speaking, they represent the indexed data in some ordered structure. If the database is able to use an index to retrieve table rows in a particular order, this could help reduce the time taken to meet the ORDER BY or GROUP BY clauses in the query.
2. Indexes aren't always used
Creating an index doesn't guarantee that a query will actually use it - and this can be influenced by a number of different factors.
For one, if the number of rows that meet the criteria is large enough, the query plan could skip the intermediate step of checking the index, and go directly to reading the table. Also, the type of query itself can matter. A query that uses wildcards, e.g.
… WHERE name LIKE 'Ma%', could take advantage of a B-tree index (the "default" and most common Postgres index type), but you might need to specify an operator class for the index to be effective.
This blog post on why indexes may not be used on depesz.com was published some time ago, but it's still a pretty interesting exploration and a fun read.
3. Indexes come at a cost
Indexes are separate data structures that are also stored on disk, so unfortunately we can't act as if they don't take up space like data tables do. They are, after all, just like the index of a book:
This is one of my old SQL textbooks and the index takes up 16 of 430 numbered pages. But I sure am glad the book includes it!
If an indexed column has to have new values inserted, or existing values updated or deleted, the corresponding index does get updated as well. Ironically, that might make queries take more time to evaluate. If write operations are frequently run against a particular column then you may need to more carefully evaluate creating an index here.
If there's only one thing you take away from this blog post, I hope it's this: success with indexes will involve some planning, investigation, and maintenance!
By now you'd have already realized that creating indexes in every possible place would be a terrible strategy (seriously, don't do it). You probably wouldn't want to treat indexes as a "set it and forget it" thing either. So, all of this does eventually come back as a cost, whether by way of time, resources, or having to hire expertise to help you optimize the way your database runs.
A couple of tools to help you with your indexing strategy are: EXPLAIN (especially EXPLAIN ANALYZE), and monitoring index statistics. At the very least these can help you gather information on whether your queries can benefit from indexes, or if your indexes are indeed helping as intended.
Ready for more on indexes?
Database indexes (and tuning and optimization in general!) is a pretty hefty topic, but at Crunchy Data we've put together a few resources that can help even if you are brand new to Postgres:
- In our Learning Portal, get started with a quick intro class on indexes,
- Next, check out Index Types In PostgreSQL to learn more about index types in addition to B-tree,
- Also try out our quick hands-on class on using EXPLAIN.
If you're looking for more advanced reading, I'd recommend the following posts from our Crunchy blog:
- Why Covering Indexes Are Incredibly Helpful by Jonathan Katz
- PostgreSQL BRIN Indexes: Big Data Performance With Minimal Storage by Jonathan Katz
- Avoiding the Pitfalls of BRIN Indexes in Postgres by John Porvaznik