pg_squeeze table bloat reclaim PolarDB PostgreSQL-PolarDB(PolarDB)-阿里云帮助中心

Comparison with pg_repack

pg_squeeze and pg_repack both manage table bloat but differ in mechanism and operation. Choose pg_squeeze or pg_repack based on your workload.

Item	pg_squeeze	pg_repack
Mechanism	Logical decoding: Uses a replication slot to asynchronously capture data changes from WAL.	Trigger: Creates a trigger on the source table to synchronously capture DML operations.
Impact on write performance	Minimal. Asynchronous change capture has limited impact on write performance.	Significant. Each DML fires an additional write, creating bottlenecks under high concurrency.
Resource consumption	Uses one replication slot. Slow or failed tasks can cause WAL accumulation.	Triggers increase CPU overhead under high concurrency. Requires an extra log table to record changes.
Operational complexity	Monitor replication slot status and WAL storage to prevent disk exhaustion from task failures.	Manage triggers to ensure correct operation after schema changes or migrations.

Quick start

Install, configure, and enable automatic compaction for a table.

Step 1: Configure cluster parameters

Log on to the PolarDB console. On the Clusters page, find the target cluster and click its ID.
In the left-side navigation pane, choose Settings and Management > Parameters.
On the Parameters page, modify the following parameters:
- wal_level: Set to logical. Required for logical decoding; slightly increases WAL generation.
- max_replication_slots: pg_squeeze needs at least 1 + squeeze.workers_per_database replication slots per enabled database. Increase by at least 2 (e.g., from 10 to 12).
- shared_preload_libraries: Append ,pg_squeeze to the existing value. For example, if the original value is pg_stat_statements, change it to pg_stat_statements,pg_squeeze.
Restart the cluster for the changes to take effect.

Step 2: Install and start the extension

Enable pg_squeeze in the target database.

Connect to the target database.

Run the following SQL statements to create the extension and start the scheduler worker. This process periodically checks for and initiates compaction tasks based on the configuration in squeeze.tables.

-- Create the extension in the target database
CREATE EXTENSION pg_squeeze;

-- Start the background scheduler process
-- This operation only needs to be performed once for each database where pg_squeeze will be used
SELECT squeeze.start_worker();

(Optional) Verify the background process started. A row with application_name = squeeze scheduler confirms success.
```
SELECT * FROM pg_stat_activity WHERE application_name LIKE 'squeeze%';
```

Step 3: Configure tables to process

Register tables for pg_squeeze and define their compaction schedule.

Ensure the table has a replica identity: pg_squeeze uses this to match rows between old and new tables. A primary key serves as the default replica identity.
- Check: Run \d+ your_table_name and verify replica identity is DEFAULT or FULL.
- Set: If the table has no primary key but has a unique index (e.g., your_table_uidx), set it as the replica identity:
```
ALTER TABLE your_table_name REPLICA IDENTITY USING INDEX your_table_uidx;
```

Register the table in squeeze.tables: Insert table information and scheduling policy into squeeze.tables. The schedule field uses a cron-like format.

-- Example 1: Check the public.foo table every day at 2:30 AM
INSERT INTO squeeze.tables (tabschema, tabname, schedule)
VALUES ('public', 'foo', ROW(ARRAY[30], ARRAY[2], NULL, NULL, NULL)::squeeze.schedule);

-- Example 2: Check the public.bar table every Sunday at 4:00 AM and sort by its primary key after compaction
INSERT INTO squeeze.tables (tabschema, tabname, schedule, clustering_index)
VALUES ('public', 'bar', ROW(ARRAY[0], ARRAY[4], NULL, NULL, ARRAY[0])::squeeze.schedule, 'bar_pkey');

The structure of the schedule field is (minutes, hours, days of month, months, days of week). NULL represents any value. All available fields are listed in Appendix: squeeze.tables configuration table.

Step 4: Monitor and verify

Monitor compaction task status through pg_squeeze views.

-- View active compaction tasks in the current database
SELECT * FROM squeeze.get_active_workers();

-- View the 10 most recent successful compaction history records
SELECT * FROM squeeze.log ORDER BY finished DESC LIMIT 10;

-- View failed compaction records and their causes
SELECT * FROM squeeze.errors;

Step 5: (Optional) Disable scheduled compaction

To stop scheduled compaction for a table, delete its row from squeeze.tables.

DELETE FROM squeeze.tables WHERE tabschema = 'public' AND tabname = 'foo';

Manual compaction

Trigger one-time compaction for any table without registering it in squeeze.tables.

Call squeeze.squeeze_table() to start a manual compaction.

-- Initiate a default compaction task for the public.pgbench_accounts table
SELECT squeeze.squeeze_table('public', 'pgbench_accounts');

-- Initiate a compaction task and physically sort the table by its primary key (pgbench_accounts_pkey) upon completion
SELECT squeeze.squeeze_table('public', 'pgbench_accounts', 'pgbench_accounts_pkey');

Note

squeeze.squeeze_table is non-transactional — it starts a background worker and returns immediately. Rolling back the calling transaction does not stop an already-started compaction.

Advanced configuration

Tune concurrency and lock behavior

Set squeeze.max_xlock_time to limit how long the exclusive lock is held during the final compaction phase, preventing long blocks on business queries.

-- Set the maximum exclusive lock holding time to 100 milliseconds
SET squeeze.max_xlock_time TO 100;

If the lock exceeds this limit, pg_squeeze releases it, processes pending changes, and retries. Repeated timeouts cause the task to fail. Schedule compaction during off-peak hours, or increase this value if timeouts persist.

Configure concurrent workers

Set squeeze.workers_per_database greater than 1 to compact multiple tables in parallel within a single database.

-- Allow a maximum of 2 concurrent compaction worker processes per database
SET squeeze.workers_per_database = 2;

Note

This applies to all databases using pg_squeeze. Total workers (including the scheduler) cannot exceed max_worker_processes.

Appendix: `squeeze.tables` configuration table

Register tables and define pg_squeeze compaction policies in squeeze.tables.

Parameter	Type	Description	Default	Recommendations
`tabschema`	`name`	The schema name of the table.	-	-
`tabname`	`name`	The name of the table.	-	-
`schedule`	`squeeze.schedule`	The schedule for task checks, in the format `(minutes, hours, days of month, months, days of week)`.	-	Schedule during off-peak hours. Example for 2:30 AM daily: `ROW(ARRAY[30], ARRAY[2], NULL, NULL, NULL)::squeeze.schedule`.
`free_space_extra`	`integer`	The additional bloat percentage required to trigger compaction (beyond the space reserved by `fillfactor`).	50	For write-intensive OLTP tables, lower to 30 to reclaim space more frequently.
`min_size`	`integer`	The minimum table size in MB. Tables smaller than this will not be processed.	8	Keep default to skip small tables.
`vacuum_max_age`	`interval`	The maximum time since the last `VACUUM` for which the free space map (FSM) is considered valid.	'1 hour'	Keep default. After this interval, `pg_squeeze` performs deeper analysis before compacting.
`max_retry`	`integer`	The maximum number of retries allowed if the initial compaction fails.	0	Increase (e.g., to 2) if compaction frequently fails due to DDL changes.
`clustering_index`	`name`	Specifies an existing index. After compaction, the data in the table will be physically sorted according to the key of this index, similar to the `CLUSTER` command.	`NULL`	For range queries, set a primary key or core index as the clustering index to improve performance.
`rel_tablespace`	`name`	Specifies the target tablespace to which the table will be moved after compaction.	`NULL`	Used for tablespace migration scenarios. If `NULL`, the table remains in its original tablespace.
`ind_tablespaces`	`name[]`	A two-dimensional array used to specify tablespace mappings for indexes.	`NULL`	Used for index tablespace migration.
`skip_analyze`	`boolean`	Specifies whether to skip the `ANALYZE` command after compaction.	`false`	Keep `false`. Compaction changes data distribution, and `ANALYZE` updates optimizer statistics.