Export data to MaxCompute

更新时间:
复制 MD 格式

Export data from Hologres to MaxCompute by creating a foreign table mapping and running INSERT INTO ... SELECT statements. This approach works for both non-partitioned and partitioned tables.

Prerequisites

Before you begin, ensure that you have:

Limitations

Version requirements:

Hologres version

What to do

V1.1 or later

Run INSERT INTO ... SELECT directly

V0.9 or V0.10

Add SET hg_experimental_enable_write_maxcompute = on; before each INSERT INTO ... SELECT statement

Earlier than V0.9

Upgrade your instance first.

Data type support:

Hologres V1.3 and later support exporting ARRAY and DATE types to MaxCompute. Earlier versions do not support complex types: DATE, ARRAY, MAP, and STRUCT. For data type mappings, see Data type mapping between MaxCompute and Hologres.

Other limitations:

  • Exporting data to transactional tables in MaxCompute is not supported.

  • For TIMESTAMPTZ fields, valid values range from 1677-09-21 00:00:00 to 2262-04-12 00:00:00.

Usage notes

  • Cross-region export: Supported, but same-region deployments deliver better performance due to more stable network connections.

  • Partition support: Hologres only supports single-level partitioning, but you can export data to MaxCompute second-level partitions. This requires correct mapping of the MaxCompute partition key values and the corresponding fields in the Hologres table. You can also export from a partitioned Hologres table to a non-partitioned MaxCompute table.

  • Peak hours: The MaxCompute Tunnel SDK limits concurrent writes and data volume. Avoid running exports during peak hours (such as early mornings) to reduce the chance of hitting quota limits.

  • Field order: Specify fields in the same sequence as they appear in the Hologres table.

Export data from a non-partitioned table

The export process follows three stages:

  1. Create or identify a Hologres internal table containing the data to export.

  2. Create a MaxCompute table to receive the data.

  3. Create a foreign table in Hologres that maps to the MaxCompute table, then run INSERT INTO ... SELECT to copy the data.

Step 1: Create a Hologres internal table

BEGIN;
CREATE TABLE "public"."bank" (
 "age" int8,
 "job" text,
 "marital" text,
 "education" text,
 "card" text,
 "housing" text,
 "loan" text,
 "contact" text,
 "month" text,
 "day_of_week" text,
 "duration" text,
 "campaign" int8,
 "pdays" float8,
 "previous" float8,
 "poutcome" text,
 "emp_var_rate" float8,
 "cons_price_idx" float8,
 "cons_conf_idx" float8,
 "euribor3m" float8,
 "nr_employed" float8,
 "y" int8
);
COMMIT;

Step 2: Create a MaxCompute table

Field order and data types must correspond to those in the Hologres internal table. For MaxCompute table syntax, see Table operations.

CREATE TABLE IF NOT EXISTS mc_bank
(
 age             BIGINT COMMENT 'Age',
 job             STRING COMMENT 'Job type',
 marital         STRING COMMENT 'Marital status',
 education       STRING COMMENT 'Education level',
 card            STRING COMMENT 'Credit card available or not',
 housing         STRING COMMENT 'Mortgage',
 loan            STRING COMMENT 'Loan',
 contact         STRING COMMENT 'Contact information',
 month           STRING COMMENT 'Month',
 day_of_week     STRING COMMENT 'Day of the week',
 duration        STRING COMMENT 'Duration',
 campaign        BIGINT COMMENT 'Number of contacts during the campaign',
 pdays           DOUBLE COMMENT 'Time elapsed since the last contact',
 previous        DOUBLE COMMENT 'Number of contacts with the customer',
 poutcome        STRING COMMENT 'Result of the previous marketing campaign',
 emp_var_rate    DOUBLE COMMENT 'Employment change rate',
 cons_price_idx  DOUBLE COMMENT 'Consumer price index',
 cons_conf_idx   DOUBLE COMMENT 'Consumer confidence index',
 euribor3m       DOUBLE COMMENT 'Euro deposit rate',
 nr_employed     DOUBLE COMMENT 'Number of employees',
 y               BIGINT COMMENT 'Time deposit available or not'
);

Step 3: Create a foreign table in Hologres

Create a foreign table in Hologres that maps to the MaxCompute table. Alternatively, use the Import Foreign Schema statement to auto-generate the foreign table definition.

BEGIN;
CREATE FOREIGN TABLE "public"."mapping_bank" (
 "age" int8,
 "job" text,
 "marital" text,
 "education" text,
 "card" text,
 "housing" text,
 "loan" text,
 "contact" text,
 "month" text,
 "day_of_week" text,
 "duration" text,
 "campaign" int8,
 "pdays" float8,
 "previous" float8,
 "poutcome" text,
 "emp_var_rate" float8,
 "cons_price_idx" float8,
 "cons_conf_idx" float8,
 "euribor3m" float8,
 "nr_employed" float8,
 "y" int8
)
SERVER odps_server
OPTIONS (project_name '<maxcompute-project-name>', table_name 'mc_bank');
COMMIT;

Parameter

Description

project_name

Name of the destination MaxCompute project

table_name

Name of the destination MaxCompute table

Step 4: Export the data

Export all fields:

-- Optional: use Serverless Computing for large-scale offline imports and ETL jobs.
SET hg_computing_resource = 'serverless';

INSERT INTO mapping_bank
SELECT * FROM bank;

-- Reset to stop using serverless computing resources for subsequent statements.
RESET hg_computing_resource;

Export selected fields:

INSERT INTO mapping_bank
SELECT age, job FROM bank;
Hologres V2.1.17 and later support Overview of serverless computing, which allocates additional serverless computing resources for large-scale batch data imports, extract, transform, and load (ETL) jobs, and large-volume foreign table queries. This improves instance stability and reduces out of memory (OOM) errors. You are charged only for the serverless computing resources consumed by your tasks. For usage details, see Enable serverless computing.

Export data from a partitioned table

The export process follows the same three stages as for non-partitioned tables. The main difference is that the MaxCompute table can have single- or second-level partitions, and you must map the partition key values accordingly.

Step 1: Create a Hologres partitioned table

BEGIN;
CREATE TABLE "public"."par_bank" (
 "age" int8,
 "job" text,
 "marital" text,
 "education" text,
 "default" text,
 "housing" text,
 "loan" text,
 "contact" text,
 "month" text,
 "day_of_week" text,
 "duration" text,
 "campaign" int8,
 "pdays" float8,
 "previous" float8,
 "poutcome" text,
 "emp_var_rate" float8,
 "cons_price_idx" float8,
 "cons_conf_idx" float8,
 "euribor3m" float8,
 "nr_employed" float8,
 "y" int8,
 "ds" text
)
PARTITION BY list (ds);
COMMIT;

-- Create child tables.
CREATE TABLE "public"."par_bank_20190830" PARTITION OF "public"."par_bank" FOR VALUES IN ('20190830');
CREATE TABLE "public"."par_bank_20190901" PARTITION OF "public"."par_bank" FOR VALUES IN ('20190901');

Step 2: Create a MaxCompute table

The MaxCompute table can have single- or second-level partitions. Fields and data types must correspond to those in the Hologres internal table. For MaxCompute table syntax, see Table operations.

Single-level partitioning:

CREATE TABLE IF NOT EXISTS mc_par_bank
(
    age            BIGINT COMMENT 'Age',
    job            STRING COMMENT 'Job type',
    marital        STRING COMMENT 'Marital status',
    education      STRING COMMENT 'Education level',
    default        STRING COMMENT 'Credit card available or not',
    housing        STRING COMMENT 'Mortgage',
    loan           STRING COMMENT 'Loan',
    contact        STRING COMMENT 'Contact information',
    month          STRING COMMENT 'Month',
    day_of_week    STRING COMMENT 'Day of the week',
    duration       STRING COMMENT 'Duration',
    campaign       BIGINT COMMENT 'Number of contacts during the campaign',
    pdays          DOUBLE COMMENT 'Time elapsed since the last contact',
    previous       DOUBLE COMMENT 'Number of contacts with the customer',
    poutcome       STRING COMMENT 'Result of the previous marketing campaign',
    emp_var_rate   DOUBLE COMMENT 'Employment change rate',
    cons_price_idx DOUBLE COMMENT 'Consumer price index',
    cons_conf_idx  DOUBLE COMMENT 'Consumer confidence index',
    euribor3m      DOUBLE COMMENT 'Euro deposit rate',
    nr_employed    DOUBLE COMMENT 'Number of employees',
    y              BIGINT COMMENT 'Time deposit available or not'
)
PARTITIONED BY
(
    ds             STRING
);

ALTER TABLE mc_par_bank ADD IF NOT EXISTS PARTITION (ds='20190830');
ALTER TABLE mc_par_bank ADD IF NOT EXISTS PARTITION (ds='20190901');

Second-level partitioning:

CREATE TABLE IF NOT EXISTS mc_par_bank_2
(
    age            BIGINT COMMENT 'Age',
    job            STRING COMMENT 'Job type',
    marital        STRING COMMENT 'Marital status',
    education      STRING COMMENT 'Education level',
    default        STRING COMMENT 'Credit card available or not',
    housing        STRING COMMENT 'Mortgage',
    loan           STRING COMMENT 'Loan',
    contact        STRING COMMENT 'Contact information',
    month          STRING COMMENT 'Month',
    day_of_week    STRING COMMENT 'Day of the week',
    duration       STRING COMMENT 'Duration',
    campaign       BIGINT COMMENT 'Number of contacts during the campaign',
    pdays          DOUBLE COMMENT 'Time elapsed since the last contact',
    previous       DOUBLE COMMENT 'Number of contacts with the customer',
    poutcome       STRING COMMENT 'Result of the previous marketing campaign',
    emp_var_rate   DOUBLE COMMENT 'Employment change rate',
    cons_price_idx DOUBLE COMMENT 'Consumer price index',
    cons_conf_idx  DOUBLE COMMENT 'Consumer confidence index',
    euribor3m      DOUBLE COMMENT 'Euro deposit rate',
    nr_employed    DOUBLE COMMENT 'Number of employees'
)
PARTITIONED BY
(
    y              BIGINT,
    ds             STRING
);

ALTER TABLE mc_par_bank_2 ADD IF NOT EXISTS PARTITION (y='1', ds='20190830');
ALTER TABLE mc_par_bank_2 ADD IF NOT EXISTS PARTITION (y='1', ds='20190901');

Step 3: Create a foreign tables in Hologres

For MaxCompute single-level partitioning:

BEGIN;
CREATE FOREIGN TABLE "public"."mapping_par_bank" (
 "age" int8,
 "job" text,
 "marital" text,
 "education" text,
 "default" text,
 "housing" text,
 "loan" text,
 "contact" text,
 "month" text,
 "day_of_week" text,
 "duration" text,
 "campaign" int8,
 "pdays" float8,
 "previous" float8,
 "poutcome" text,
 "emp_var_rate" float8,
 "cons_price_idx" float8,
 "cons_conf_idx" float8,
 "euribor3m" float8,
 "nr_employed" float8,
 "y" int8,
 "ds" text
)
SERVER odps_server
OPTIONS (project_name '<maxcompute-project-name>', table_name 'mc_par_bank');
COMMIT;

For MaxCompute second-level partitioning:

BEGIN;
CREATE FOREIGN TABLE "public"."mapping_par_bank_2" (
 "age" int8,
 "job" text,
 "marital" text,
 "education" text,
 "default" text,
 "housing" text,
 "loan" text,
 "contact" text,
 "month" text,
 "day_of_week" text,
 "duration" text,
 "campaign" int8,
 "pdays" float8,
 "previous" float8,
 "poutcome" text,
 "emp_var_rate" float8,
 "cons_price_idx" float8,
 "cons_conf_idx" float8,
 "euribor3m" float8,
 "nr_employed" float8,
 "y" int8,
 "ds" text
)
SERVER odps_server
OPTIONS (project_name '<maxcompute-project-name>', table_name 'mc_par_bank_2');
COMMIT;

Parameter

Description

project_name

Name of the destination MaxCompute project

table_name

Name of the destination MaxCompute table

Step 4: Export the data

Export to MaxCompute single-level partitions:

-- Optional: Use Serverless Computing for bulk load and ETL tasks.
SET hg_computing_resource = 'serverless';

-- Option 1: Insert into the parent table with a WHERE clause to filter by partition.
INSERT INTO mapping_par_bank SELECT * FROM "public"."par_bank" WHERE ds='20190830';

-- Option 2: Insert into the child table directly.
INSERT INTO mapping_par_bank SELECT * FROM "public"."par_bank_20190901";

-- Reset to stop using serverless computing resources for subsequent statements.
RESET hg_computing_resource;

Export to MaxCompute second-level partitions:

-- Optional: use Serverless Computing for bulk load and ETL tasks.
SET hg_computing_resource = 'serverless';

-- Option 1: Query the parent table with a WHERE clause to filter by both partition levels.
INSERT INTO mapping_par_bank_2 SELECT * FROM "public"."par_bank" WHERE y='1' AND ds='20190830';

-- Option 2: Query the child partition table and filter by the second-level partition.
INSERT INTO mapping_par_bank_2 SELECT * FROM "public"."par_bank_20190901" WHERE y='1';

-- Reset to stop using serverless computing resources for subsequent statements.
RESET hg_computing_resource;

FAQ

Why do I get a FlowExceeded error when exporting to MaxCompute?

Problem description: During peak hours of the MaxCompute Tunnel SDK service (such as early mornings), if your write task exceeds the concurrent write or data volume limits.

Cause: Hologres uses the MaxCompute Tunnel SDK to perform high-performance data exports. The Tunnel SDK enforces limits on concurrent write requests and data size to protect shared infrastructure. See Tunnel command and Limits for details.

Resolution:

  1. Retry the export.

  2. If the error persists, throttle the write concurrency:

-- Set to a value between 0 and your table's shard count.
SET hg_experimental_write_maxcompute_dop = <count>;