Hologres versions 4.0 and later support the image acceleration feature for MaxCompute tables. This feature synchronizes data from an external MaxCompute table to Hologres in near real-time or at user-defined intervals. When the metadata or data in the external data source changes, the mirrored data in Hologres is automatically updated. This feature supports mirroring entire tables and specific partitions of partitioned tables. You can use parameters to constrain the scope of the data to be mirrored. This topic details how to use the mirroring capability.
Features
Image acceleration for MaxCompute tables is enabled with the ALTER EXTERNAL TABLE command. This feature supports:
-
You can enable the mirroring capability for standard MaxCompute tables, excluding Append2.0, ACID1.0, and Delta tables.
-
You can mirror entire tables (partitioned or non-partitioned), as well as specific partitions of partitioned tables.
-
When enabling data mirroring, set
set hg_experimental_maxcompute_sdk_split_file_num_limit = 1to ensure stable data splitting. -
It supports schema evolution for external tables. Supported operations include adding, deleting, reordering, and renaming columns. For unsupported operations, queries automatically fall back to reading the external table directly. If schema evolution occurs, the mirroring process restarts automatically.
-
It automatically discovers and synchronizes partitions in MaxCompute tables.
-
You can set common indexes on mirrored data from data lake tables. This allows SQL queries to quickly locate data, which reduces I/O consumption and achieves faster query performance with fewer computing resources. For more information, see CREATE TABLE. The following indexes are supported:
-
data_mirroring_clustering_key
-
data_mirroring_bitmap_columns
-
data_mirroring_dictionary_encoding_columns
-
-
Supports querying mirrored data using primary/replica instances, compute group instances, and Serverless resources.
-
You can view the data mirroring progress.
-
You can view the storage usage of mirrored data at the instance, database, and table levels.
-
You can modify the data mirroring configuration for a specific table.
-
You can disable the data mirroring feature for a specific table.
Prerequisites
-
You have purchased a Hologres instance. For more information, see Purchase a Hologres instance.
-
You have activated MaxCompute and created a project. For more information, see Activate MaxCompute.
-
You have granted a user the permissions to access the MaxCompute project and tables. For more information, see Manage user permissions by using commands.
Usage notes
-
Image acceleration is supported only for external tables whose metadata is mapped using an external database. It is not supported for external tables created using
IMPORT FOREIGN SCHEMAorCREATE FOREIGN TABLE. -
The mirroring capability cannot be enabled for Append2.0, ACID1.0, or Delta tables in MaxCompute.
-
The MaxCompute table must not contain a data type that is unsupported by Hologres. Otherwise, the mirroring process fails. For a list of supported data types, see Data types.
Syntax
Use the following SQL statement:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable external table mirroring.
data_mirroring_speed_up_enable = 'on|off',
[data_mirroring_partition_num='1~N|all',]
[data_mirroring_partition_list='pt1,pt2...',]
-- Specify the resources used for the mirroring process.
[data_mirroring_guc_hg_computing_resource='[serverless | local]',]
[data_mirroring_guc_hg_experimental_serverless_computing_required_cores='<num>',]
-- Set indexes.
[data_mirroring_clustering_key='[columnName{:asc]} [,...]]',]
[data_mirroring_dictionary_encoding_columns='[columnName [,...]]',]
[data_mirroring_bitmap_columns='[columnName [,...]]',]
);
Parameters
|
Parameter |
Description |
Required |
Value |
|
data_mirroring_speed_up_enable |
Enables or disables the mirroring capability for an external table. |
Yes |
|
|
data_mirroring_partition_num |
Specifies the number of recent partitions to mirror. This parameter applies only to partitioned tables. |
No |
|
|
data_mirroring_partition_list |
Specifies a list of partitions to mirror. This parameter applies only to partitioned tables. |
No |
The name of the partition. |
|
data_mirroring_guc_hg_computing_resource |
Specifies the resource to use for the mirroring process. |
No |
|
|
data_mirroring_guc_hg_experimental_serverless_computing_required_cores |
Specifies the number of cores required for the Serverless resource. |
No |
Data type: positive integer. This parameter is required only when the data_mirroring_guc_hg_computing_resource parameter is set to Serverless. |
Procedure
Create an external database
Create an external database to map the metadata of a MaxCompute table to Hologres. For more information, see CREATE EXTERNAL DATABASE.
Use the following SQL statement:
-- Globally enable the bigmeta link.
alter role all set hg_experimental_enable_big_meta_on_maxcompute_sdk = on;
-- Globally enable direct reads for CommonTable. This is required only for instances upgraded to V4.0. For new instances, this is enabled by default.
alter role all set hg_experimental_external_catalog_routing = 'odps:common_table';
-- Generate stable data splits.
alter role all set hg_experimental_maxcompute_sdk_split_file_num_limit = 1;
-- Create an external database.
CREATE EXTERNAL DATABASE ext_db WITH
metastore_type 'maxcompute'
mc_project '<mc_project_name>'
;
Enable mirroring for an external table
Non-partitioned table
Use the following SQL statement:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable external table mirroring.
data_mirroring_speed_up_enable = 'on',
-- Specify the resources used for the mirroring process.
data_mirroring_guc_hg_computing_resource='local',
-- Set indexes.
data_mirroring_clustering_key='c_int:asc,c_char:desc',
data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
data_mirroring_bitmap_columns='c_int:on'
);
Partitioned table
-
Specify the number of partitions
Use the following SQL statement:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable external table mirroring.
data_mirroring_speed_up_enable = 'on',
data_mirroring_partition_num='7',
-- Specify the resources used for the mirroring process.
data_mirroring_guc_hg_computing_resource='serverless',
data_mirroring_guc_hg_experimental_serverless_computing_required_cores='16',
-- Set indexes.
data_mirroring_clustering_key='c_int:asc,c_char:desc',
data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
data_mirroring_bitmap_columns='c_int:on'
);
-
Specify all partitions
Use the following SQL statement:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable external table mirroring.
data_mirroring_speed_up_enable = 'on',
data_mirroring_partition_num='all',
-- Specify the resources used for the mirroring process.
data_mirroring_guc_hg_computing_resource='local',
-- Set indexes.
data_mirroring_clustering_key='c_int:asc,c_char:desc',
data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
data_mirroring_bitmap_columns='c_int:on'
);
-
Specify a list of partitions
Use the following SQL statement:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable external table mirroring.
data_mirroring_speed_up_enable = 'on',
data_mirroring_partition_list='pt1,pt2...',
-- Specify the resources used for the mirroring process.
data_mirroring_guc_hg_computing_resource='local',
-- Set indexes.
data_mirroring_clustering_key='c_int:asc,c_char:desc',
data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
data_mirroring_bitmap_columns='c_int:on'
);
-
Specify the number of partitions and a list of specific partitions to retain historical partition mirrors
Use the following SQL statement:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable external table mirroring.
data_mirroring_speed_up_enable = 'on',
data_mirroring_partition_num='7',
data_mirroring_partition_list='pt1',
-- Specify the resources used for the mirroring process.
data_mirroring_guc_hg_computing_resource='local',
-- Set indexes.
data_mirroring_clustering_key='c_int:asc,c_char:desc',
data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
data_mirroring_bitmap_columns='c_int:on'
);
Check the mirroring status
Use the following SQL statement:
select * from hologres.hg_datalake_get_mirror_status();
The result includes the external_db_name, external_schema_name, external_table_name, partition, mirrored_data_size, mirror_start_time, mirror_last_update_time, total_file_count, and mirrored_file_count fields. These fields provide details on the external database, schema, table, partition, data size (in bytes), start time, last update time, and total and mirrored file counts.
Parameters
|
Parameter |
Description |
|
external_db_name |
The external database that contains the table for the mirroring task. |
|
external_schema_name |
The external schema that contains the table for the mirroring task. |
|
external_table_name |
The external table associated with the mirroring task. |
|
partition |
The partition in the external table that corresponds to the mirroring task. |
|
mirror_data_size |
The size of the mirrored files for the table or partition. |
|
mirror_start_time |
The time when the mirroring task started. |
|
mirror_last_update_time |
The time when the mirrored data was last updated. |
|
total_file_count |
The total number of files. |
|
mirrored_file_count |
The number of mirrored files. |
Modify the mirroring configuration
Use the following SQL statement:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable external table mirroring.
data_mirroring_speed_up_enable = 'on',
data_mirroring_partition_num='12'
);
Check if a query uses mirrored files
Use explain analyze to view the execution plan. In the results, check the mirror hit status from the Meta mirror table count and Data mirror file count.
Use the following SQL statement:
explain analyze select xx from <ext_db.ext_schema.ext_table>;
The result is as follows:
QUERY PLAN
Gather (cost=0.00..5.80 rows=1000 width=116)
[128:1 id=100002 dop=1 time=26/26/26ms rows=107995(107995/107995/107995) mem=0/0/0B open=0/0/0ms get_next=26/26/26ms * ]
-> Local Gather (cost=0.00..5.01 rows=1000 width=116)
[id=2 dop=128 time=13/0/0ms rows=107995(97688/843/0) mem=0/0/0B open=0/0/0ms get_next=13/0/0ms local_dop=0/0/0]
-> Seq Scan on sls_ads_table (cost=0.00..5.01 rows=1000 width=116)
Foreign Table Type: DLF
[id=1 split_count=128 time=13/0/0ms rows=107995(97688/843/0) mem=37/5/4KB open=0/0/0ms get_next=13/0/0ms physical_reads=154(140/77/14) scan_rows=107995(97688/8999/671)]
ADVICE:
"[node id : 1] Table default."sls_ads_table" Miss Stats! please run 'analyze default."sls_ads_table"'. "
Query id:[1002007457093288810]
QE version: 2.0
Query Queue: init_warehouse.default_queue
======================cost======================
Total cost:[150] ms
Fast statistics cost:[0] ms
Optimizer cost:[73] ms
Build execution plan cost:[10] ms
Init execution plan cost:[9] ms
Start query cost:[13] ms
- Queue cost: [0] ms
- Wait schema cost:[0] ms
- Lock query cost:[0] ms
- Create dataset reader cost:[0] ms
- Create split reader cost:[3] ms
Get result cost:[45] ms
- Get the first block cost:[1] ms
====================resource====================
Memory: total 7 MB. Worker stats: max 3 MB, avg 3 MB, min 3 MB, max memory worker id: 1889346452478863373.
CPU time: total 96 ms. Worker stats: max 82 ms, avg 48 ms, min 14 ms, max CPU time worker id: 1889346412692090879.
DAG CPU time stats: max 77 ms, avg 38 ms, min 0 ms, cnt 2, max CPU time dag id: 1.
Fragment CPU time stats: max 66 ms, avg 25 ms, min 0 ms, cnt 3, max CPU time fragment id: 2.
Ec wait time: total 139 ms. Worker stats: max 79 ms, max(max) 2 ms, avg 69 ms, min 60 ms, max ec wait time worker id: 1889346452478863373, max(max) ec wait time worker id: 1889346412692090879.
Physical read bytes: total 1 MB. Worker stats: max 1 MB, avg 0 MB, min 0 MB, max physical read bytes worker id: 1889346412692090879.
Read bytes: total 7 MB. Worker stats: max 7 MB, avg 3 MB, min 0 MB, max read bytes worker id: 1889346412692090879.
DAG instance count: total 3. Worker stats: max 2, avg 1, min 1, max DAG instance count worker id: 1889346452478863373.
Fragment instance count: total 257. Worker stats: max 129, avg 128, min 128, max fragment instance count worker id: 1889346452478863373.
Meta mirror table count: use 1 miss 0.
Data mirror file count: use 12 miss 0.
Disable mirroring for an external table
Use the following SQL statement:
alter external table <ext_db.ext_schema.ext_table>
set(
data_mirroring_speed_up_enable = 'off'
);
After you disable image acceleration, the system stops synchronizing data to Hologres. Queries on the external table directly access the data in the data lake. The internally mirrored data files are asynchronously deleted after 30 minutes.
View the mirroring configuration
Use the following SQL statement:
SELECT * FROM hologres.hg_datalake_get_mirror_config();
View the mirroring progress
You can use the following SQL to check the progress, where external_database_name is the name of the External Database.
Use the following SQL statement:
SELECT *
FROM hologres.hg_user_datalake_mirror_cron_tasks
WHERE command::jsonb->>'external_db_name'='<ext_db>'
ORDER BY start_time DESC;