Reclaiming space from a big table in Postgres
Published on , under Programming, tagged with database and postgres.
Shockingly to many, Postgres treats deletes and updates very similarly. Due to
MVCC, tuples are marked and considered dead/obsolete when they are not reachable
by any active transaction, yet they remain physically on disk taking space in
the table. This is called bloat. It is the job of the autovacuum process, which
needs to run periodically, to re-use that space. But it does not reclaim it back
to the system. This might be specially noticeable after a bulk update where a
big portion of a table's rows is affected (by an UPDATE
or a DELETE
), disk
space grows, and never shrinks back again.
The same applies when some columns are deleted from a table, autovacuum won't
help us here either. The DROP COLUMN
does not physically remove the column,
but simply makes it invisible to SQL queries, in order to make this
operation super fast. New UPDATE
or INSERT
queries store a NULL
value on
that column, but disk space will not be reclaimed or re-used until, over time,
existing rows are updated.
To deal with a huge amount of bloat the solution is to run the VACUUM FULL
command which takes an exclusive lock on the table and rewrites it completely.
If you cannot afford some downtime and this table is critical, then you can use pg_repack, a Postgres extension that will copy the content of the table by creating a shadow clone of it (which is also hot updated by triggers) and then performing a table rename swap, which happens very fast.
Usually these solutions should only be considered when a significant amount of space needs to be reclaimed from within the table, otherwise, autovacuum should be good enough.