field note / Operations

A Postgres backup is not real until you restore it

A field note from a simple offline Postgres migration: why pg_dump was enough, where read-only mode fits, and why the restore drill matters more than the dump command.

We had to move a Postgres database from one place to another.

This was a planned offline migration. The database was manageable, the product could tolerate a short maintenance window, and we did not need to keep the old and new databases live at the same time.

So we kept the process simple: create a dump, restore it into the new database, check the restored data, and point the app to the new database.

The backup file was not the success condition. The useful part was being able to restore it and confirm that the restored database was correct.

That is the lesson I want to keep from this migration:

A backup is not real until restore has been tested.

The shape of the migration

This was an offline migration.

That means we were allowed to stop writes for a short period, move the database, and bring the app back on the new database. Because of that constraint, the runbook could stay small:

  1. Pause writes
  2. Create the backup
  3. Restore the backup into the target database
  4. Verify the restored database
  5. Switch the app to the target database
  6. Resume writes

The key step here is pause writes.

Most of the complexity in database migrations comes from data changing while you are trying to move it. If users are still creating rows, background workers are still updating records, and webhooks are still being processed, then the target database is already stale by the time the first copy finishes.

For larger or stricter systems, teams solve that with logical replication, change data capture, dual writes, or managed migration services. Those are valid tools, but they introduce their own failure modes: lag, ordering, retries, reconciliation, and rollback complexity.

We did not need that here. We could accept a small write-freeze window, which made the technical plan much simpler.

Why pg_dump was enough

For this migration, pg_dump fit the job well.

pg_dump creates a logical backup. Instead of copying Postgres data files directly, it exports the database as schema and data that can be restored into another Postgres database. That makes it a good fit for a straightforward move between environments.

It is not the only kind of Postgres backup, and it is not always the best one. If I needed point-in-time recovery, I would be thinking about WAL archiving and PITR. If I had a very large production database with tight recovery time expectations, I would look at physical backups, snapshots, pgBackRest, Barman, or managed cloud backup features.

But that was not this situation.

For that, a logical dump was enough.

Choosing the dump format

pg_dump has a few output formats. The format affects how the backup is inspected and restored.

The most readable option is a plain SQL file:

pg_dump -F p -d <DATABASE> > backup.sql

That gives you a file you can open, search, and restore with psql. It works well for small databases, debugging, and cases where human readability matters. But it is not the most flexible format for restore drills.

For this kind of migration, I prefer the custom format:

pg_dump \
  -h <HOST> \
  -p <PORT> \
  -U <USER> \
  -d <DATABASE> \
  -F c \
  -f backup_YYYYMMDD_HHMMSS.dump

The custom format is still portable, but it is meant to be used with pg_restore. That gives you a few useful options: you can list the archive, restore into a target database, use restore flags, and inspect what the dump contains before trusting it.

The first quick check is:

pg_restore -l backup_YYYYMMDD_HHMMSS.dump

This does not prove the backup is good. It only proves the archive is readable and has the objects you expect. That is a useful first check before doing a full restore.

For larger logical backups, the directory format is also worth knowing because it supports parallel dump and restore:

pg_dump -F d -j 4 -f backup_dir <DATABASE>

Use directory format when size or restore speed makes it necessary. For a normal offline migration, custom format is a good default.

Before the backup command

Do a few checks before running the backup command.

I want to know where the dump will be written, whether the machine has enough disk space, whether the credentials work, what Postgres version the source is running, and whether the target database has the extensions and roles the restore will need.

Name the backup clearly. Include the database name and timestamp, and store it in a location the team can find later.

After creating the dump, I usually want at least three quick checks:

ls -lh backup_YYYYMMDD_HHMMSS.dump
pg_restore -l backup_YYYYMMDD_HHMMSS.dump
shasum -a 256 backup_YYYYMMDD_HHMMSS.dump

At this point, we have a backup file. The next step is to restore it.

Restore is the actual test

A backup that has never been restored has not been proven yet.

The restore should happen into a separate target database. Not the source. Not a place where failure creates a second problem. A clean target gives you a safe place to read errors and rerun the drill if needed.

For a custom-format dump, the restore looks roughly like this:

pg_restore \
  -h <TARGET_HOST> \
  -p <TARGET_PORT> \
  -U <TARGET_USER> \
  -d <TARGET_DATABASE> \
  --clean \
  --if-exists \
  backup_YYYYMMDD_HHMMSS.dump

The exact flags depend on the target. Sometimes the database is empty and --clean is unnecessary. Sometimes ownership and roles need special handling. Sometimes extensions need to exist before restore. These details are precisely why the restore should be practiced before the real cutover.

The restore command finishing successfully is a good sign. It is not the end of the drill.

What I want to check after restore

After restore, I want to check the database the way the application will experience it.

Not every table needs a full audit. But the important parts of the product should be checked. The core tables should exist. The row counts should be in the expected range. Recent data should be present. Sequences should not look obviously wrong. The app should be able to connect and run a smoke test.

A few simple checks can catch common issues:

SELECT count(*) FROM users;
SELECT count(*) FROM orders;
SELECT max(created_at) FROM orders;

The exact queries should come from the domain. For one app, the most important check might be the latest payment. For another, it might be the latest uploaded video, job record, organization, API key, or invoice.

The goal is to catch common restore problems before users do:

  • the restore missed data
  • a critical table is empty
  • an extension is missing
  • the app cannot connect
  • the latest records are not there
  • writes on the target fail after cutover

A restore that completes is not automatically a restore you should trust.

Where read-only mode actually fits

Read-only mode is not here because pg_dump always requires it. Postgres can take a consistent logical dump while the database is online. If all you need is a normal dump, making the whole database read-only is not automatically necessary.

Read-only mode matters in this story because this was a migration.

Near the end of a migration, you need a clear cutoff point. After that point, the old database should not change anymore. That makes the final backup or final sync meaningful. It also prevents a case where the app has moved to the new database but an old worker is still writing into the old one.

In practice, I would not start with the database switch. I would start with the application:

  • pause workers
  • stop cron jobs
  • stop consumers that write
  • block mutating API routes
  • confirm write traffic has stopped

That is the real write freeze.

After that, database-level read-only can be used as extra protection. For example:

ALTER DATABASE mydb SET default_transaction_read_only = on;

That affects new sessions. If old sessions are still connected, they may need to be terminated so they reconnect with the new default:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'mydb'
  AND pid <> pg_backend_pid();

In stricter setups, you might also revoke write permissions from the application role. But the main idea is simpler than the SQL: freeze writes where they are created, then use the database to enforce the boundary.

Read-only mode is a cutover control. It is not the backup strategy by itself.

The final cutover

Once the restore has been tested and the write-freeze plan is clear, the cutover becomes straightforward.

For this kind of offline migration, the sequence I want written down is:

  1. Announce the maintenance window
  2. Pause workers and scheduled jobs
  3. Block writes in the application
  4. Confirm the source database is quiet
  5. Optionally enforce source read-only
  6. Take the final dump
  7. Restore into the target database
  8. Run verification checks
  9. Point the app at the target database
  10. Start the app
  11. Run smoke tests
  12. Resume writes
  13. Watch logs and metrics

The old database should remain untouched for a while. Do not immediately delete it. Do not keep writing to it. Treat it as the last known source snapshot until the new database has survived real traffic.

Write down the rollback plan before cutover. Before writes resume on the new database, rollback may be as simple as pointing the app back to the old one. After writes resume, rollback becomes harder because now the new database has data the old one does not.

That is another reason the write-freeze and smoke-test window matters.

What this approach does not cover

This note is intentionally about the simple case.

If the app cannot tolerate write downtime, this is not enough. You probably need logical replication, CDC, a managed migration service, or an application-level migration pattern. If the database is huge, a single logical dump may be too slow. If the recovery requirement is “restore to the minute before the bad deploy,” then nightly dumps are not enough and PITR should be part of the backup design.

For our case, the simple approach matched the constraints. The database was manageable, and the downtime was acceptable.

The migration did not need extra complexity. It needed to be practiced.

The lesson

The command is not the hard part.

This command creates a backup:

pg_dump -F c -f backup.dump <DATABASE>

But the migration only becomes trustworthy after the rest of the drill:

  1. Create the dump
  2. Inspect the dump
  3. Restore it somewhere isolated
  4. Check the restored database
  5. Freeze writes at cutover
  6. Switch the app carefully
  7. Watch the new database under real traffic

That is the habit worth keeping.

A backup file is only the artifact. The restore drill is what gives confidence.