Tiered storage for hot and cold data
AnalyticDB for PostgreSQL supports tiered storage, which allows you to move infrequently accessed hot tables to Object Storage Service (OSS) as cold tables to reduce storage costs. This topic describes the limitations and procedures for using tiered storage.
In this topic, tables stored on local disks are referred to as hot tables, and tables stored in remote Object Storage Service (OSS) are referred to as cold tables.
Requirements
-
AnalyticDB for PostgreSQL V6.0 instances that run minor version 6.3.11.1 or later.
-
AnalyticDB for PostgreSQL V7.0 instances that run minor version 7.0.3.0 or later.
You can view the minor version on the Basic Information page of an instance in the AnalyticDB for PostgreSQL console. If your instance does not meet the required versions, update the minor version of the instance.
Limitations
-
Serverless mode is not supported.
-
The limitations for tiered storage vary slightly between AnalyticDB for PostgreSQL V6.0 and AnalyticDB for PostgreSQL V7.0 instances. Review the limitations that apply to your instance version.
V6.0 instances
-
You can convert a non-partitioned table to a cold table.
-
You can convert an entire partitioned table to a cold table, or convert specific subpartitions to cold partitions.
-
You cannot convert subpartitions of an indexed partitioned table to cold partitions. You also cannot create an index on a partitioned table that contains cold partitions.
-
When you convert a hot table to a cold table, associated objects such as primary keys, indexes, sequences, rules, and comments are automatically deleted and cannot be recovered.
-
Data in a cold table or cold partition is read-only. Write, delete, update, and most Data Definition Language (DDL) operations, such as
ALTER COLUMNandDROP COLUMN, are not supported. However, you can use theDROP TABLEstatement to delete a cold table. -
AnalyticDB for PostgreSQL does not support directly converting a cold table or cold partition back to hot. As a workaround, you can use the
CREATE TABLE AS SELECTstatement to create a new hot table and migrate the data.
V7.0 instances
-
You can convert a non-partitioned table to a cold table.
-
You can convert an entire partitioned table to a cold table, or convert specific subpartitions to cold partitions.
-
You can read data from and write data to cold tables and cold partitions. However, delete and update operations are not supported. DDL operations on cold tables, such as
ALTER COLUMNandDROP COLUMN, are available by invitation only. To use these features, submit a ticket to contact technical support. -
You cannot convert subpartitions of a partitioned table that has a primary key or a unique index to cold partitions. This limitation does not apply to partitioned tables that have only regular indexes.
-
When you convert a hot table to a cold table, associated objects such as primary keys, indexes, sequences, rules, and comments are automatically deleted and cannot be recovered.
-
AnalyticDB for PostgreSQL does not support directly converting a cold table or cold partition back to hot. As a workaround, you can use the
CREATE TABLE AS SELECTstatement to create a new hot table and migrate the data.
-
Billing
After you convert a hot table to a cold table, the data is stored in OSS and incurs storage fees based on the following rules:
-
Cold storage is billed on a pay-as-you-go basis.
-
Cold storage usage is measured every 5 minutes and billed hourly.
-
The price is the same as that for OSS Standard storage. For more information, see OSS Pricing.
For example, in Chinese mainland regions, the price for OSS is CNY 0.12 per GB/month. The hourly price is CNY 0.000166666666667 per GB. The actual price on your bill prevails.
You can view the billing details for cold storage on the page.
Procedure
-
The conversion process creates a temporary table, writes data to it, and then uploads the data to OSS. These operations consume local and network I/O resources and may affect the performance of running queries. Evaluate the potential impact on your services before you proceed.
-
After a hot table is converted to a cold table, the local disk space that it occupied is released.
-
For AnalyticDB for PostgreSQL V6.0 instances, the conversion is scheduled to start at a specified time. The scheduling and queuing process may introduce delays. For AnalyticDB for PostgreSQL V7.0 instances, the conversion starts immediately after you execute the statement. The total time required for the conversion depends on the instance specifications, the number of concurrent conversions, and the data volume. For more information, see Performance data.
The procedures for using tiered storage differ between AnalyticDB for PostgreSQL V6.0 and AnalyticDB for PostgreSQL V7.0 instances. Select the tab that corresponds to your instance version.
V6.0 instances
Convert a non-partitioned table
Syntax
ALTER TABLE <tableName> SET ttl interval '<scheduling_interval>' move to storage_cold;
Example
Create a non-partitioned table named tiered_storage_heap and write data to it.
CREATE TABLE tiered_storage_heap (a int, b int);
INSERT INTO tiered_storage_heap SELECT random() * 1000,1 FROM generate_series(1,1000);
-
Example 1: Move the standard table
tiered_storage_heapto cold storage after three days (3days). For example, if you execute the ALTER TABLE operation at 09:00:00 on July 17, 2023, then three days later (at 09:00:00 on July 20, 2023), the entiretiered_storage_heaptable is moved to cold storage.ALTER TABLE tiered_storage_heap SET ttl interval '3days' move to storage_cold; -
Example 2: Schedule the non-partitioned table
tiered_storage_heapto be converted to a cold table at a specific time (16:53:58 on July 28, 2023).ALTER TABLE tiered_storage_heap SET ttl '2023-07-28 16:53:58'::Timestamp move to storage_cold; -
Example 3: Immediately convert a table by setting the conversion time to a time in the past.
-
Set the conversion time to three days ago (-3days). Run the following command to immediately convert the table.
ALTER TABLE tiered_storage_heap SET ttl interval '-3days' move to storage_cold; -
Specify a past date and time. If the current time is 16:53:58 on July 17, 2023, you can run the following command to immediately convert the table.
ALTER TABLE tiered_storage_heap SET ttl '2022-07-16 16:53:58'::Timestamp move to storage_cold;
-
Convert subpartitions of a partitioned table
Syntax
ALTER TABLE <child_partition_name> SET ttl interval '<scheduling_interval>' move to storage_cold;
You can run \d+ in psql to view the subpartition names of a specific partitioned table.
Example
Create a partitioned table named tiered_storage_partition_hdfs.
CREATE TABLE tiered_storage_partition_hdfs(a int,b int) distributed by (a) partition by range(a) (start(1) end(20) every(10));
This example creates two subpartitions: tiered_storage_partition_hdfs_1_prt_1 and tiered_storage_partition_hdfs_1_prt_2.
Write data to the subpartition tiered_storage_partition_hdfs_1_prt_1.
INSERT INTO tiered_storage_partition_hdfs_1_prt_1 values(1, 1), (2, 2), (3, 3), (4, 4);
Schedule the subpartition tiered_storage_partition_hdfs_1_prt_1 to be converted to a cold partition three days later. For example, a statement run at 09:00:00 on July 17, 2023, will trigger the conversion of the tiered_storage_partition_hdfs_1_prt_1 subpartition at 09:00:00 on July 20, 2023. Other subpartitions are not affected.
ALTER TABLE tiered_storage_partition_hdfs_1_prt_1 SET ttl interval '3days' move to storage_cold;
Query the storage status of a table
Use one of the following methods to query the storage status of a table. The query returns cold for a cold table and hot for a hot table.
-
Method 1:
SELECT pg_tiered_storage_relation_status('<table_name>'::regclass::oid::bigint); -
Method 2:
SELECT pg_tiered_storage_relation_status(<table_oid>::bigint);Find the table OID by running
SELECT oid FROM pg_class where relname='<table_name>';.
V7.0 instances
Convert a non-partitioned table
Syntax
SELECT pg_tiered_storage_move_table_to_storage_cold('<schema_name>', '<table_name>');
Example
Create a non-partitioned table named tiered_storage_heap_oss in the public schema and write data to it.
CREATE TABLE tiered_storage_heap_oss (a int, b int) DISTRIBUTED BY(a) ;
INSERT INTO tiered_storage_heap_oss SELECT random() * 1000,1 FROM generate_series(1,100);
-
Example 1: Immediately convert an entire table to a cold table.
Run the following statement to immediately convert the entire non-partitioned table to a cold table.
SELECT pg_tiered_storage_move_table_to_storage_cold('public', 'tiered_storage_heap_oss'); -
Example 2: Use pg_cron to schedule the conversion of an entire table.
Assume you are user
etl_userand want to convert the non-partitioned tabletiered_storage_heap_ossin theetldatabase to a cold table at 01:00 the next day. Connect to thepostgresdatabase and run the following statement.SELECT cron.schedule('etl_table_transfer_to_cold', '0 1 * * *', 'SELECT pg_tiered_storage_move_table_to_storage_cold(''public'', ''tiered_storage_heap_oss'');', 'etl', 'etl_user');After 01:00 the next day, confirm that the conversion was successful, and then run the following statement to delete the scheduled job.
SELECT cron.unschedule(<job_id>);NoteThe job ID is automatically generated when you create the job. You can find the ID in the
jobidcolumn of thecron.jobtable.
Convert subpartitions of a partitioned table
Syntax
SELECT pg_tiered_storage_move_table_to_storage_cold('<schema_name>', '<child_partition_name>');
You can run \d+ in psql to view the subpartition names of a specific partitioned table.
Example
-
Example 1: Immediately convert a subpartition to a cold partition.
Create a partitioned table named
tiered_storage_partition_ossin thepublicschema.CREATE TABLE tiered_storage_partition_oss(a int,b int) DISTRIBUTED BY (a) PARTITION BY range(a) (start(1) end(20) every(10));NoteThis example creates two subpartitions:
tiered_storage_partition_oss_1_prt_1andtiered_storage_partition_oss_1_prt_2.Write data to the subpartition
tiered_storage_partition_oss_1_prt_1.INSERT INTO tiered_storage_partition_oss_1_prt_1 VALUES(1, 1), (2, 2), (3, 3), (4, 4);Immediately convert the subpartition to a cold partition.
SELECT pg_tiered_storage_move_table_to_storage_cold('public', 'tiered_storage_partition_oss_1_prt_1'); -
Example 2: Use pg_cron to schedule the conversion of daily subpartitions.
Create a daily partitioned table named
daily_log_detailsin theetldatabase.CREATE TABLE daily_log_details (id INT, log_message text, created_date character varying(64)) PARTITION BY LIST (created_date) ( PARTITION p20230601 VALUES ('20230601'), PARTITION p20230602 VALUES ('20230602'), PARTITION p20230603 VALUES ('20230603'), PARTITION p20230604 VALUES ('20230604'), PARTITION p20230605 VALUES ('20230605'), PARTITION p20230606 VALUES ('20230606'), PARTITION p20230607 VALUES ('20230607'), PARTITION p20230608 VALUES ('20230608'), PARTITION p20230609 VALUES ('20230609'), PARTITION p20230610 VALUES ('20230610'), PARTITION p20230611 VALUES ('20230611'), DEFAULT PARTITION others );Set up a job that runs at 03:00 as user
etl_userto convert subpartitions older than 10 days to cold partitions. Follow these steps:-
Create a cleanup function in the
etldatabase.CREATE OR REPLACE FUNCTION pg_tiered_storage_move_partition_daily_table_to_cold_storage(schemaname text, tablename text) RETURNS void AS $$ DECLARE fetch_overdue_partition_sql text; cold_storage_sql text; target record; BEGIN fetch_overdue_partition_sql := 'WITH targetpartitions AS (SELECT * FROM pg_partitions WHERE tablename = $1 AND schemaname = $2 AND partitionlevel = 1 AND partitionisdefault = FALSE) SELECT partitiontablename FROM targetpartitions WHERE to_date(substring(targetpartitions.partitionname FROM 2), ''YYYYMMDD'') <= current_date - INTERVAL ''10 days'''; -- fetch overdue partitions FOR target IN EXECUTE fetch_overdue_partition_sql USING tablename, schemaname LOOP cold_storage_sql := 'SELECT pg_tiered_storage_move_table_to_storage_cold($1::text, $2::text)'; raise notice 'sql %', cold_storage_sql; EXECUTE cold_storage_sql USING schemaname, target.partitiontablename; END LOOP; END; $$ LANGUAGE plpgsql; -
Connect to the
postgresdatabase and run the conversion statement.SELECT cron.schedule('etl_daily_transfer_to_cold', '0 3 * * *', 'SELECT pg_tiered_storage_move_partition_daily_table_to_cold_storage(''public'', ''daily_log_details'');', 'etl', 'etl_user');
-
-
Example 3: Use pg_cron to schedule the conversion of monthly subpartitions.
Create a monthly partitioned table named
month_log_detailsin theetldatabase.CREATE TABLE month_log_details (id INT, log_message text, created_date character varying(64)) PARTITION BY LIST (created_date) ( PARTITION p202306 VALUES ('202306'), PARTITION p202307 VALUES ('202307'), PARTITION p202308 VALUES ('202308'), PARTITION p202309 VALUES ('202309'), PARTITION p202310 VALUES ('202310'), DEFAULT PARTITION others );Set up a job that runs at 05:00 as user
etl_userto convert subpartitions older than 3 months to cold partitions. Follow these steps:-
Create a cleanup function in the
etldatabase.CREATE OR REPLACE FUNCTION pg_tiered_storage_move_partition_table_to_cold_storage(schemaname text, tablename text) RETURNS void AS $$ DECLARE fetch_overdue_partition_sql text; cold_storage_sql text; target record; BEGIN fetch_overdue_partition_sql := 'WITH targetpartitions AS (SELECT * FROM pg_partitions WHERE tablename = $1 AND schemaname = $2 AND partitionlevel = 1 AND partitionisdefault = FALSE) SELECT partitiontablename FROM targetpartitions WHERE to_date(substring(targetpartitions.partitionname FROM 2), ''YYYYMM'') <= current_date - INTERVAL ''3 months'''; -- fetch overdue partitions FOR target IN EXECUTE fetch_overdue_partition_sql USING tablename, schemaname LOOP cold_storage_sql := 'SELECT pg_tiered_storage_move_table_to_storage_cold($1::text, $2::text)'; raise notice 'sql %', cold_storage_sql; EXECUTE cold_storage_sql USING schemaname, target.partitiontablename; END LOOP; END; $$ LANGUAGE plpgsql; -
Connect to the
postgresdatabase and run the conversion statement.SELECT cron.schedule('etl_month_transfer_to_cold', '0 5 1 * *', 'SELECT pg_tiered_storage_move_partition_table_to_cold_storage(''public'', ''month_log_details'');', 'etl', 'etl_user');
-
Query the storage status of a table
Run the following statement to query the storage status of a table. The query returns cold for a cold table and hot for a hot table.
SELECT pg_tiered_storage_table_status('<schema_name>', '<table_name>|<child_partition_name>')
Hot and cold storage usage
Log on to the AnalyticDB for PostgreSQL console. On the Basic Information page, check the Instance Status card to view Hot Storage and Cold Storage usage.
Backup and restoration
AnalyticDB for PostgreSQL tiered storage supports backup and restoration. When restoring data, the following rules apply:
V6.0 instances
If a full backup is available, you can restore data to any point in time. The storage status (hot or cold) of the restored data matches its status at the time of backup.
V7.0 instances
If a full backup is available, you can restore data to a specified point in time. The storage status (hot or cold) of the data at the time of restoration matches its status at the time of backup. AnalyticDB for PostgreSQL V7.0 has the following limitations:
-
If no data is written to a table after it is converted to a cold table, you can restore the table to any point in time.
-
If data is written to a table after it is converted to a cold table, you can restore the table to any backup point before the conversion. For restores to a point in time after the conversion, you can only recover the table to its state as of the most recent write operation.
To support backup and restoration, the system does not immediately release the OSS space for a dropped table. Instead, the space is retained for a period equal to the Data Backup Retention Period (Days) setting. You are charged for the OSS space during this extended retention period.
For more information about backup and restoration, see Backup and restoration.
Scaling
When you scale in an AnalyticDB for PostgreSQL V6.0 instance, cold tables are moved to local temporary tables for data redistribution. After the scale-in is complete, the data is re-uploaded to OSS, and the local temporary tables are deleted. For a scale-in to succeed, the remaining disk space across all nodes must exceed the total size of the cold tables. During the scale-in process, data is downloaded from OSS. The operation time is limited by the OSS download bandwidth. Evaluate the scale-in duration before you proceed.
For AnalyticDB for PostgreSQL V7.0 instances, scaling does not affect data in cold storage. Data redistribution and restoration to local storage are not required. Therefore, you do not need to account for the disk space used by cold storage.
Performance data
The performance tests were run on a 4-node instance and an 8-node instance, both with 2-core 8 GB nodes. The test table was created and populated by using the following statements.
CREATE TABLE t333 (a int, b int);
INSERT INTO t333 SELECT random() * 1000000, random()*1000000 FROM generate_series(1,3000000000);
For AnalyticDB for PostgreSQL V6.0 instances, run the ALTER TABLE t333 SET ttl interval '-3days' move to storage_cold; statement.
For AnalyticDB for PostgreSQL V7.0 instances, run the SELECT pg_tiered_storage_move_table_to_storage_cold('public', 't333'); statement.
The following table shows the test results for single-table conversion time.
|
Hot table size (GB) |
4-node conversion time (s) |
8-node conversion time (s) |
||
|
V6.0 instance |
V7.0 instance |
V6.0 instance |
V7.0 instance |
|
|
1 |
10 |
5 |
5 |
2.8 |
|
10 |
96 |
48 |
42 |
25.2 |
|
100 |
848 |
490 |
333 |
243 |