
Running PostgreSQL in production: the boring version

Replication, PITR, vacuum, observability — the unglamorous work that keeps your database boring.


Most of the work of running PostgreSQL in production is not visible from the outside. It is replication lag you never notice, vacuum that runs at the right time, and indexes that match the actual queries.

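Replication is the first thing we set up. Not because we expect a primary to fail, but because we expect to want a recent read replica for analytics, backups, and the occasional rollback test. A replica is also cheap insurance against an accidental DELETE, with one caveat: a streaming replica that keeps up replays the DELETE within moments, so the insurance only pays out if one replica applies WAL on a deliberate delay (`recovery_min_apply_delay`) or if you can fall back on point-in-time recovery.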
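One way to keep an eye on replica lag is to compare WAL positions reported by `pg_stat_replication` on the primary. The sketch below is illustrative rather than a description of our tooling: it assumes PostgreSQL 10 or later column names, leaves running the query to whatever driver you use, and the 16 MiB alert threshold and the helper names are hypothetical.

```python
# Query to run on the primary; names are those of PostgreSQL 10+.
LAG_QUERY = """
SELECT application_name,
       pg_current_wal_lsn()::text AS primary_lsn,
       replay_lsn::text           AS replay_lsn
FROM pg_stat_replication;
"""


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' to a byte position in the WAL."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(primary_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL the replica has not replayed yet."""
    return lsn_to_int(primary_lsn) - lsn_to_int(replay_lsn)


def lagging_replicas(rows, max_lag_bytes=16 * 1024 * 1024):
    """Return (name, lag) for replicas behind by more than max_lag_bytes.

    rows: iterable of (application_name, primary_lsn, replay_lsn) tuples,
    shaped like the output of LAG_QUERY. A NULL replay_lsn (None) is
    reported with a lag of None so it is not silently ignored.
    The default threshold is an assumed value, not a recommendation.
    """
    flagged = []
    for name, primary_lsn, replay_lsn in rows:
        if replay_lsn is None:
            flagged.append((name, None))
            continue
        lag = replay_lag_bytes(primary_lsn, replay_lsn)
        if lag > max_lag_bytes:
            flagged.append((name, lag))
    return flagged
```

Byte lag is only one view. `pg_stat_replication` also exposes `replay_lag` as a time interval, which may map more directly onto how stale a read replica is allowed to be.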

Point-in-time recovery should be tested, not assumed. We run a quarterly restore drill: restore the latest base backup to a fresh instance, replay archived WAL to a chosen target time, and point a smoke test at it. If the drill takes longer than the SLA allows, the SLA is fiction.
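A restore drill of this kind can be reduced to two small pieces: the recovery settings handed to the fresh instance, and a pass/fail check against the recovery time objective. The following is a minimal sketch under stated assumptions: PostgreSQL 12 or later (recovery settings live in the regular configuration and an empty `recovery.signal` file in the data directory triggers recovery), a WAL archive that is a plain directory reachable with `cp` and has no spaces in its path, and hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def recovery_settings(archive_dir: str, target_time: datetime) -> str:
    """Render recovery settings to append to the restored instance's config.

    Assumes the archive is a local or mounted directory; swap the
    restore_command for whatever your archiving tool provides.
    """
    lines = [
        f"restore_command = 'cp {archive_dir}/%f %p'",
        f"recovery_target_time = '{target_time.isoformat(sep=' ')}'",
        "recovery_target_action = 'promote'",
    ]
    return "\n".join(lines) + "\n"


def drill_passes(started: datetime, smoke_test_passed: datetime,
                 rto: timedelta) -> bool:
    """True if the whole drill, restore through smoke test, fit in the RTO."""
    return smoke_test_passed - started <= rto


# Example: a drill that took 50 minutes against a 1 hour objective.
start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
done = start + timedelta(minutes=50)
print(drill_passes(start, done, rto=timedelta(hours=1)))
```

Timing the drill from the moment the restore starts to the moment the smoke test passes, rather than only the WAL replay, keeps the measurement comparable to what an actual incident would cost.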

Vacuum is often the source of mysterious latency spikes. The default settings are fine until they aren't. Watching `pg_stat_user_tables` and tuning per-table thresholds is one of the higher-leverage operational changes you can make.
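To make the per-table tuning concrete: autovacuum vacuums a table once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`, with defaults of 50 and 0.2. On a large table that can mean millions of dead rows before anything happens. The sketch below is an assumption-laden illustration, not a tuning recipe: it uses `n_live_tup` from `pg_stat_user_tables` as a stand-in for the planner's row estimate, the `max_dead_rows` cutoff and the helper names are hypothetical, and table names are assumed to be trusted identifiers.

```python
# Statistics to pull; run with any PostgreSQL driver.
STATS_QUERY = """
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
"""


def vacuum_trigger_point(row_estimate, threshold=50, scale_factor=0.2):
    """Dead tuples needed before autovacuum vacuums the table.

    Defaults mirror PostgreSQL's autovacuum_vacuum_threshold and
    autovacuum_vacuum_scale_factor defaults.
    """
    return threshold + scale_factor * row_estimate


def tables_needing_override(rows, max_dead_rows=100_000):
    """Tables whose default trigger point exceeds max_dead_rows.

    rows: (relname, n_live_tup, n_dead_tup) tuples from STATS_QUERY.
    max_dead_rows is an assumed tolerance; pick one that fits your workload.
    """
    return [
        relname
        for relname, n_live_tup, _n_dead_tup in rows
        if vacuum_trigger_point(n_live_tup) > max_dead_rows
    ]


def override_statement(table, scale_factor, threshold):
    """Build a per-table storage parameter override."""
    return (
        f"ALTER TABLE {table} SET ("
        f"autovacuum_vacuum_scale_factor = {scale_factor}, "
        f"autovacuum_vacuum_threshold = {threshold});"
    )
```

For a table with 50 million live rows the default trigger point is about 10 million dead tuples, so a statement such as `override_statement("events", 0.01, 1000)` would bring that down to roughly 500,000. The table name `events` is only an example.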

None of this is glamorous. That is the point — a database in production should be boring, and boring is the result of work nobody else sees.
