Frequent UPDATE and DELETE operations cause table bloat, wasting storage and degrading query performance. The pg_squeeze extension reclaims bloated space through online table reorganization using logical decoding, with minimal write performance impact and no service interruption.
Comparison with pg_repack
pg_squeeze and pg_repack both manage table bloat but differ in mechanism and operation. Choose pg_squeeze or pg_repack based on your workload.
|
Item |
pg_squeeze |
pg_repack |
|
Mechanism |
Logical decoding: Uses a replication slot to asynchronously capture data changes from WAL. |
Trigger: Creates a trigger on the source table to synchronously capture DML operations. |
|
Impact on write performance |
Minimal. Asynchronous change capture has limited impact on write performance. |
Significant. Each DML fires an additional write, creating bottlenecks under high concurrency. |
|
Resource consumption |
Uses one replication slot. Slow or failed tasks can cause WAL accumulation. |
Triggers increase CPU overhead under high concurrency. Requires an extra log table to record changes. |
|
Operational complexity |
Monitor replication slot status and WAL storage to prevent disk exhaustion from task failures. |
Manage triggers to ensure correct operation after schema changes or migrations. |
Quick start
Install, configure, and enable automatic compaction for a table.
Step 1: Configure cluster parameters
-
Log on to the PolarDB console. On the Clusters page, find the target cluster and click its ID.
-
In the left-side navigation pane, choose .
-
On the Parameters page, modify the following parameters:
-
wal_level: Set tological. Required for logical decoding; slightly increases WAL generation. -
max_replication_slots:pg_squeezeneeds at least1 + squeeze.workers_per_databasereplication slots per enabled database. Increase by at least 2 (e.g., from 10 to 12). -
shared_preload_libraries: Append,pg_squeezeto the existing value. For example, if the original value ispg_stat_statements, change it topg_stat_statements,pg_squeeze.
-
-
Restart the cluster for the changes to take effect.
Step 2: Install and start the extension
Enable pg_squeeze in the target database.
-
Connect to the target database.
-
Run the following SQL statements to create the extension and start the scheduler worker. This process periodically checks for and initiates compaction tasks based on the configuration in
squeeze.tables.-- Create the extension in the target database CREATE EXTENSION pg_squeeze; -- Start the background scheduler process -- This operation only needs to be performed once for each database where pg_squeeze will be used SELECT squeeze.start_worker(); -
(Optional) Verify the background process started. A row with
application_name=squeeze schedulerconfirms success.SELECT * FROM pg_stat_activity WHERE application_name LIKE 'squeeze%';
Step 3: Configure tables to process
Register tables for pg_squeeze and define their compaction schedule.
-
Ensure the table has a
replica identity:pg_squeezeuses this to match rows between old and new tables. A primary key serves as the defaultreplica identity.-
Check: Run
\d+ your_table_nameand verifyreplica identityisDEFAULTorFULL. -
Set: If the table has no primary key but has a unique index (e.g.,
your_table_uidx), set it as thereplica identity:ALTER TABLE your_table_name REPLICA IDENTITY USING INDEX your_table_uidx;
-
-
Register the table in
squeeze.tables: Insert table information and scheduling policy intosqueeze.tables. Theschedulefield uses a cron-like format.-- Example 1: Check the public.foo table every day at 2:30 AM INSERT INTO squeeze.tables (tabschema, tabname, schedule) VALUES ('public', 'foo', ROW(ARRAY[30], ARRAY[2], NULL, NULL, NULL)::squeeze.schedule); -- Example 2: Check the public.bar table every Sunday at 4:00 AM and sort by its primary key after compaction INSERT INTO squeeze.tables (tabschema, tabname, schedule, clustering_index) VALUES ('public', 'bar', ROW(ARRAY[0], ARRAY[4], NULL, NULL, ARRAY[0])::squeeze.schedule, 'bar_pkey');The structure of the
schedulefield is(minutes, hours, days of month, months, days of week).NULLrepresents any value. All available fields are listed in Appendix: squeeze.tables configuration table.
Step 4: Monitor and verify
Monitor compaction task status through pg_squeeze views.
-- View active compaction tasks in the current database
SELECT * FROM squeeze.get_active_workers();
-- View the 10 most recent successful compaction history records
SELECT * FROM squeeze.log ORDER BY finished DESC LIMIT 10;
-- View failed compaction records and their causes
SELECT * FROM squeeze.errors;
Step 5: (Optional) Disable scheduled compaction
To stop scheduled compaction for a table, delete its row from squeeze.tables.
DELETE FROM squeeze.tables WHERE tabschema = 'public' AND tabname = 'foo';
Manual compaction
Trigger one-time compaction for any table without registering it in squeeze.tables.
Call squeeze.squeeze_table() to start a manual compaction.
-- Initiate a default compaction task for the public.pgbench_accounts table
SELECT squeeze.squeeze_table('public', 'pgbench_accounts');
-- Initiate a compaction task and physically sort the table by its primary key (pgbench_accounts_pkey) upon completion
SELECT squeeze.squeeze_table('public', 'pgbench_accounts', 'pgbench_accounts_pkey');
squeeze.squeeze_table is non-transactional — it starts a background worker and returns immediately. Rolling back the calling transaction does not stop an already-started compaction.
Advanced configuration
Tune concurrency and lock behavior
Set squeeze.max_xlock_time to limit how long the exclusive lock is held during the final compaction phase, preventing long blocks on business queries.
-- Set the maximum exclusive lock holding time to 100 milliseconds
SET squeeze.max_xlock_time TO 100;
If the lock exceeds this limit, pg_squeeze releases it, processes pending changes, and retries. Repeated timeouts cause the task to fail. Schedule compaction during off-peak hours, or increase this value if timeouts persist.
Configure concurrent workers
Set squeeze.workers_per_database greater than 1 to compact multiple tables in parallel within a single database.
-- Allow a maximum of 2 concurrent compaction worker processes per database
SET squeeze.workers_per_database = 2;
This applies to all databases using pg_squeeze. Total workers (including the scheduler) cannot exceed max_worker_processes.
Appendix: squeeze.tables configuration table
Register tables and define pg_squeeze compaction policies in squeeze.tables.
|
Parameter |
Type |
Description |
Default |
Recommendations |
|
|
|
The schema name of the table. |
- |
- |
|
|
|
The name of the table. |
- |
- |
|
|
|
The schedule for task checks, in the format |
- |
Schedule during off-peak hours. Example for 2:30 AM daily: |
|
|
|
The additional bloat percentage required to trigger compaction (beyond the space reserved by |
50 |
For write-intensive OLTP tables, lower to 30 to reclaim space more frequently. |
|
|
|
The minimum table size in MB. Tables smaller than this will not be processed. |
8 |
Keep default to skip small tables. |
|
|
|
The maximum time since the last |
'1 hour' |
Keep default. After this interval, |
|
|
|
The maximum number of retries allowed if the initial compaction fails. |
0 |
Increase (e.g., to 2) if compaction frequently fails due to DDL changes. |
|
|
|
Specifies an existing index. After compaction, the data in the table will be physically sorted according to the key of this index, similar to the |
|
For range queries, set a primary key or core index as the clustering index to improve performance. |
|
|
|
Specifies the target tablespace to which the table will be moved after compaction. |
|
Used for tablespace migration scenarios. If |
|
|
|
A two-dimensional array used to specify tablespace mappings for indexes. |
|
Used for index tablespace migration. |
|
|
|
Specifies whether to skip the |
|
Keep |