You can use a Data Synchronization from MaxCompute node in Data Studio to synchronize data from MaxCompute to Hologres. This process allows for faster queries on MaxCompute tables. This topic describes how to create and configure a Data Synchronization from MaxCompute node.
Background
When you use a Data Synchronization from MaxCompute node to query MaxCompute data in Hologres, you must first create an external table in Hologres. This external table links the MaxCompute source table (the table that contains the MaxCompute data to be synchronized) to a Hologres internal table. Data is then synchronized from the source table to the internal table through this link. The external table's schema must match that of the MaxCompute source table. You can also use SQL statements to import data from MaxCompute to Hologres. For more information, see Import data from MaxCompute by using SQL.
Importing MaxCompute data to Hologres provides better performance than directly querying data by using an external table. For more information about creating an external table to query MaxCompute data, see One-click MaxCompute table schema synchronization node.
The operations described in this topic use the China (Shanghai) region as an example. The user interface may vary in other regions.
Prerequisites
Create a table data synchronization node
Log on to the DataWorks console. In the target region, click in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.
-
Create a Workflow.
If you have an existing Workflow, skip this step.
-
Hover over the
icon and select Create Workflow. -
In the Create Workflow dialog box, enter a Workflow Name.
-
Click Create.
-
Create a Data Synchronization from MaxCompute node.
Hover over the
icon and select .You can also find and right-click the target workflow, and then select .
In the Create Node dialog, enter a Name, and select an Engine Instance, a Node Type, and a Path.
Click OK to go to the node editing page.
Configure the node.
On the node editing page, configure the MaxCompute source table, the destination table for storing data, the synchronization policy, and SQL code information.
The configuration page contains the following sections: Hologres Target Data Source Settings (select a Data Source and a Database), MaxCompute Source Table Selection (select External Table Source as a new external table or an existing external table, and specify the External Server, MaxCompute Project, MaxCompute Schema, MaxCompute Table Name, and Filter Condition), Hologres Target Table Settings (configure the Schema, Table Name, and Table Description), Import GUC Parameter Settings (in the format of
set <GUC>=<values>;, with multiple parameters on separate lines), and Synchronization Settings (which includes the Synchronization Field, Partition Configurations, and Index Configuration tabs).Configure the source table information.
This section is used to configure the MaxCompute table from which data is to be synchronized. You configure a Hologres external table to map the MaxCompute source table data. The following table describes the key parameters.
Parameter
Description
Data Source
The Hologres instance where the Hologres external table resides.
Database
The database in the Hologres instance where the Hologres external table is stored.
MaxCompute source table selection.
Parameter
Description
Source of External Table
The source of the Hologres external table. The external table is used as a link between the MaxCompute source table and the Hologres internal table to synchronize MaxCompute source table data.
Existing External Table: If the external table for synchronizing data to the internal table already exists, select this option. You need to select the schema and name of the existing external table.
Create External Table: The Data Synchronization from MaxCompute node requires a Hologres external table to accelerate queries on MaxCompute table data. If you have not created a Hologres external table, select this option.
This option requires you to configure the server for the new external table, as well as the MaxCompute project name and table name.
NoteYou can directly use the
odps_serverforeign server that is already created in Hologres. For more information about the underlying mechanism, see postgres_fdw.
Configure the Hologres target table settings.
This section is used to create a Hologres internal table for storing MaxCompute data.
Parameter
Description
Schema
The
Schemato which the Hologres internal table belongs.Table Name
The name of the Hologres internal table. When you create an internal table, if a table with the same name already exists, the table is handled as follows based on its type:
Non-partitioned table: The existing internal table and its data are deleted, and Hologres creates a new table.
Partitioned table: The existing table and its data are not deleted. Hologres creates new child partition tables based on partition values and imports data.
NoteIf the schema of the new table is different from that of the existing table, an error is returned.
Table Description
The description of the Hologres internal table.
Configure the synchronization policy.
This section is used to configure the policy for synchronizing MaxCompute table data.
Tab
Description
Synchronization Field
Select the MaxCompute table columns to synchronize.
Partition Configurations
Select the MaxCompute table partitions to synchronize.
NoteHologres currently supports synchronizing only first-level partitions. Multi-level partitions in MaxCompute tables are set as first-level partitions in Hologres, and the remaining partitions are automatically mapped to regular columns in Hologres.
Index Configuration
Build indexes for the Hologres internal table that stores MaxCompute data. You can then use the indexes to quickly query data. For more information about creating indexes, see Set up table indexes.
Generate SQL Script.
DataWorks automatically parses the synchronization configuration and generates SQL statements for running the current synchronization task. You can use the statements to go to the Hologres code editing page and run the synchronization task by using SQL.
NoteThe generated SQL Script cannot be edited. When the synchronization task configuration is updated, refresh the SQL Script to generate new SQL statements.
For more information about running a synchronization task by using SQL, see Import data from MaxCompute by using SQL.
-
Configure scheduling properties for the task.
To run the node periodically, click Scheduling in the right-side pane and configure its scheduling properties:
-
Configure the basic scheduling properties for the task. For more information, see Configure basic properties.
-
Configure the scheduling cycle, rerun properties, and dependencies. For more information, see Configure time properties and Configure same-cycle scheduling dependencies.
NoteYou must set the Rerun attribute and Parent Nodes for the node before you can commit it.
-
Configure resource properties. For more information, see Configure resource properties. To access a MySQL data source over the public internet or a VPC, select a resource group for scheduling that has network connectivity to the MySQL data source. For more information, see Network connectivity solutions.
-
Save and run the node.
On the toolbar of the node editing page, click the
icon to save the node configuration.On the toolbar of the node editing page, click the
icon to synchronize MaxCompute data.
If you use a workspace in standard mode, after the task is submitted, click Deploy Task on the left side of the top toolbar to deploy the task to the production environment. For more information, see Deploy nodes.
-
View scheduled tasks.
-
In the upper-right corner of the editor, click O&M Personnel to open the production environment's Operation Center.
-
View the scheduled tasks that are running. For more information, see Manage scheduled tasks.
To view more details about scheduled tasks, click Operation Center in the top menu bar. For more information, see Operation Center overview.
-
Next step
After the MaxCompute data synchronization is complete, you can go to the DataWorks Table Management page to view data details. For more information, see Manage tables. You can also go to Hologres and use HoloWeb to query MaxCompute data. For more information, see Query data by using HoloWeb.