REBUILD

更新时间:
复制 MD 格式

Hologres supports modifying some table structures, table properties, and column properties through ALTER TABLE syntax. However, for properties that affect table storage, ALTER TABLE syntax is not supported. Starting from Hologres V3.1, the REBUILD syntax is supported. With the REBUILD syntax, you can flexibly modify various parameters of a table. This topic describes how to use REBUILD in Hologres.

Syntax

Statement format

ASYNC REBUILD TABLE [ IF EXISTS ] <table_name>
    [ WITH ( <rebuild_parameter> [= <value>] [, ... ] )]  
    <action> [, ... ];
WHERE action IS ONE OF:
    ADD [ COLUMN ] <column_name> <data_type> [ column_constraint [ ... ] ]
    ALTER [ COLUMN ] <column_name> [ SET DATA ] TYPE <data_type> [ USING <expression> ]
    ALTER [ COLUMN ] <column_name> SET DEFAULT <expression>
    ALTER [ COLUMN ] <column_name> DROP DEFAULT
    ALTER [ COLUMN ] <column_name> { SET | DROP } NOT NULL
    ALTER PRIMARY KEY (<column_name> [, ...])
    TO [LOGICAL] PARTITION [BY LIST(<column_name> [, <column_name>])]
    SET ( <parameter> [= <value>] [, ... ] )
WHERE rebuild_parameter IS ONE OF:
    keep_source
    binlog_mode
    rebuild_guc_<guc_name> = '<guc_value>'

Parameters

Parameter

Sub-item

Description

ASYNC

This statement specifies that the REBUILD task is executed asynchronously. After the task is executed, a query_id is returned. You can use the query_id to monitor the execution status of the task. Synchronous execution is not currently supported.

table_name

The name of the target table to be rebuilt.

column_name

The column name of the target table.

data_type

The data type of the column.

action

ADD COLUMN

Adds a column. You can add NOT NULL columns and set default values.

ALTER COLUMN TYPE

Modifies the data type of a column.

ALTER COLUMN SET/DROP DEFAULT

Sets or removes the default value for a column. NULL values in existing data will not change.

ALTER COLUMN SET/DROP NOT NULL

Sets or removes the NOT NULL constraint for a column.

ALTER PRIMARY KEY

Modifies the primary key of the table. If there is a data conflict with the new primary key, an error will be reported during asynchronous execution. Monitor the execution status of the task promptly.

TO [LOGICAL] PARTITION

Converts the table to a logical/physical partitioned table, supporting the following scenarios:

  • Convert a standard table to a physical partitioned table. You must specify a partition key.

  • Convert a standard table to a logical partitioned table. You must specify a partition key.

  • Modify the partition key of a physical partitioned table.

  • Convert a physical partitioned table to a logical partitioned table. You can determine whether to modify the partition key.

  • You cannot convert a logical partitioned table to a physical partitioned table.

SET ( <parameter> [= <value>])

Modifies table properties. Common scenarios:

  • Modify the distribution key (distribution_key).

  • Modify the segment key (event_time_column).

  • Modify the clustering index (clustering_key).

  • Modify the table storage format (orientation). You can convert between row-oriented storage, column-oriented storage, and row-column hybrid storage.

  • Modify the table_group to which the table belongs.

  • All other table properties are supported for modification.

  • To modify bitmap indexes (bitmap_columns) or dictionary encoding columns (dictionary_encoding_columns), use the ALTER TABLE statement. REBUILD is not required.

WITH (<rebuild_parameter> [= <value>])

Sets REBUILD task-related parameters. Common parameters:

  • keep_source: No value is required. If specified, the original table is not deleted. Instead, it is renamed to tmp_rebuild_old_<query_id>_<unique_id>_<table_name>.

  • binlog_mode: No value is required. Allows you to rebuild a table that has Binlog enabled. To prevent Binlog data loss, you must follow the steps in the Examples section.

  • rebuild_guc_hg_computing_resource='serverless': Uses Serverless resources to execute the REBUILD task. This avoids consuming your instance resources and improves task stability.

  • rebuild_guc_<guc_name>='<guc_value>': Other GUC parameters can be set here. For more information, see GUC parameters.

Precautions

  • Only asynchronous execution (ASYNC) is supported, which does not require long-term connection occupation.

  • After you submit a REBUILD task, the statement returns a query_id. You can use this query_id to view the REBUILD task execution status. If the submission does not return a query_id promptly, many asynchronous tasks might be running on the instance. We recommend that you wait for the query_id to be returned before you check the task status.

  • Using the REBUILD feature to modify table parameters involves underlying data redistribution, which consumes computing resources. Therefore, it is recommended that you execute REBUILD tasks during off-peak business hours, or use serverless computing resources for this operation to ensure business stability.

  • During a REBUILD task, the table is temporarily set to a read-only state, and writes are blocked. Starting from Hologres V4.1, REBUILD uses Dynamic Table technology for incremental updates, which can significantly reduce the read-only duration. This optimization applies only if the table meets the following conditions. Otherwise, the table remains read-only for the entire duration of the task.

    • The target table must have a primary key before the rebuild, and the new primary key must contain all columns of the original primary key.

    • The target table must use column-oriented storage or row-column hybrid storage before the rebuild.

    • The target table must not contain generated columns after the rebuild.

    • If the target table is a physical partitioned table before the rebuild, its partition key must remain unchanged after the rebuild.

    • If the target table becomes a logical partitioned table after the rebuild, it can have only one partition key.

  • To reduce overhead, combine multiple changes into a single REBUILD task.

  • After a physical partitioned table is rebuilt, its dynamic partition management properties are not inherited. You must reconfigure them after the task completes. These properties include:

    • The auto_partitioning property of the parent partitioned table.

    • Properties of child partitions, such as keep_alive.

    • After a physical partitioned table is rebuilt, any properties that were set independently on its child partitions are not inherited. The child partitions will inherit all properties from the parent table, such as bitmap_columns and dictionary_encoding_columns.

  • REBUILD is not supported for the following types of tables:

    • Tables with specific column properties (such as columnar storage optimization for JSONB columns) or column constraints (such as vector columns).

    • Tables with a full-text index or a global secondary index.

    • Tables that contain columns of the Serial or Bigserial data type.

    • Tables that have Dynamic Table or materialized view dependencies. Tables that have standard view dependencies are supported.

Examples

Rebuild on table without Binlog

-- Create a table and import data.
CREATE TABLE rebuild_test (
    a TEXT,
    b TEXT,
    ds TEXT
);
INSERT INTO rebuild_test VALUES ('1', '1', '2025-04-01'), ('2', '2', '2025-04-02'), ('3', '3', '2025-04-03');
-- Add a NOT NULL column with a default value.
ASYNC REBUILD TABLE rebuild_test ADD COLUMN c text NOT NULL DEFAULT 'a';
-- Change the primary key to column a.
ASYNC REBUILD TABLE rebuild_test ALTER PRIMARY KEY (a);
-- Use Serverless resources to run the REBUILD task, set the distribution_key and clustering_key, and change to row-column hybrid storage.
ASYNC REBUILD TABLE rebuild_test 
WITH (
    rebuild_guc_hg_computing_resource = 'serverless'
)
SET (
    distribution_key = 'a',
    clustering_key = 'a',
    orientation = 'row,column'
);
-- Convert the non-partitioned table to a logical partitioned table, set the partition key to ds, and add a NOT NULL constraint to the ds column.
ASYNC REBUILD TABLE rebuild_test 
    ALTER COLUMN ds SET NOT NULL,
    TO LOGICAL PARTITION BY LIST(ds);

Rebuild on table with Binlog

The REBUILD statement does not preserve historical Binlog data. Therefore, by default, you cannot run REBUILD on a table with Binlog enabled. You must specify the binlog_mode parameter and follow the steps below to ensure that the downstream consumer has fully processed all existing Binlog data.

  1. Run the REBUILD statement.

    ASYNC REBUILD TABLE rebuild_test 
    WITH (
      binlog_mode
    )
    <YOUR_ACTION>;
  2. For a REBUILD task with the binlog_mode parameter, the task automatically pauses after the set_readonly step is complete. You can query its progress by using the following SQL statement. At this point, the table is read-only, so no more data can be written and no new Binlog data is generated.

    postgres=# SELECT step, status, progress FROM hologres.rebuild_progress('<query_id>');
                 step              | status | progress 
    -------------------------------+--------+----------
     prepare                       | done   | 1/1
     create_tmp_table              | done   | 1/1
     get_src_table_snapshot        | done   | 1/1
     insert                        | done   | 1/1
     set_readonly                  | done   | 1/1
     check_snapshot                |        | 0/1
     re-insert                     |        | -
     check_additional_child_table  |        | -
     create_additional_child_table |        | -
     insert_additional_child_table |        | -
     swap                          |        | 0/1
    (11 rows)
  3. Wait for the downstream client to finish consuming the existing Binlog data. Then, manually run the following SQL statement to resume the REBUILD task. While the task resumes, the original table remains read-only, and no new Binlog data is generated.

    RESUME '<query_id>';

    After the REBUILD task is complete, Binlog is automatically enabled for the new table. You can then restart the downstream Binlog consumption task, starting from lsn = 0.

Monitoring and O&M

View the execution status of a REBUILD task

REBUILD tasks run asynchronously. When a task is submitted successfully, the statement returns a success status and a query_id. Query the hologres.rebuild_progress system view to monitor the status of the asynchronous subtasks. The REBUILD operation is complete only when all subtasks succeed. Run the following command:

SELECT * FROM hologres.rebuild_progress('<rebuild_query_id>');

The following table describes the columns of the system table.

Column name

Description

job_name

The query_id of the REBUILD task.

step_id

The step ID. The subtasks of REBUILD are executed sequentially according to step IDs.

step

The step name:

  • prepare: Make task preparations.

  • create_tmp_table: Create a temporary table.

  • get_src_table_snapshot: Get a data snapshot of the original table.

  • insert: Import historical data into the temporary table.

  • set_readonly: Set the table to be rebuilt to read-only. This stops data writes.

  • check_snapshot: Checks if the current data snapshot matches the one from the get_src_table_snapshot step. If they differ, the re-insert step is performed.

  • re-insert: Import incremental data.

  • check_additional_child_table: Check if any new child tables have been created by users from the start of REBUILD until now. This step is only for physical partitioned tables.

  • create_additional_child_table: Create potential new child tables for the temporary table. This step is only for physical partitioned tables.

  • insert_additional_child_table: Import historical data into the new temporary child tables. This step is only for physical partitioned tables.

  • swap: Replace the original table with the temporary table.

status

The status of the subtask.

  • done: Completed.

  • doing: In progress.

  • NULL: No need to execute this step, or currently unable to determine if this step needs to be executed.

  • error: Execution failed. Check the error message in the message field.

progress

The progress of the subtask, shown as m/n, where m is the number of completed sub-steps and n is the total. The total count (n) often corresponds to the number of table partitions.

start_time

The start time of the subtask.

end_time

The end time of the subtask.

queryid

The query_id of the subtask.

pid

The service process ID.

message

The message of the subtask. If the subtask reports an error, the error message is recorded in this field.

Stop and restart REBUILD tasks

  • Suspend a REBUILD asynchronous task.

    SUSPEND '<query_id>';
  • Resume the asynchronous task marked CANCEL.

    RESUME '<query_id>';

Rebuild task exceptions

If a REBUILD task is interrupted by an error, or if you manually suspend it, you can either resume it with the RESUME command or terminate the task and clean up by following these steps:

  • Run the following command to clean up the temporary tables that were created during the REBUILD process.

    CALL hg_clean_rebuild_tmp_tables('<query_id>');
  • If the table was made read-only, run the appropriate command to restore write access.

    • For Hologres versions earlier than V4.1, run the following command.

      ALTER TABLE <table_name> SET (readonly = false);
    • For Hologres V4.1 and later, run the following command.

      ALTER TABLE <table_name> SET RESET (ddl_options,write_options);