Use a DataWorks synchronization node to migrate data from a single Hologres table to MaxCompute for efficient big data storage. This topic describes how to configure the node to migrate your data and leverage the high-performance processing capabilities of MaxCompute.
Prerequisites
- You have created a MaxCompute project and a Hologres instance.
- The MaxCompute project and Hologres instance have been bound to DataWorks as computing resources, and the connectivity test has passed.
Limitations
- Only data from Hologres internal databases can be synchronized to MaxCompute.
- For limits on using Hologres external tables in MaxCompute, see Hologres external tables.
- The data types of MaxCompute and Hologres do not map one to one, so some Hologres data types cannot be synchronized to MaxCompute.
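One way to catch unmapped types before you run the node is to compare the source column types against a type mapping. The following is a minimal sketch only: the dictionary `HOLOGRES_TO_MAXCOMPUTE`, the function `find_unmapped_columns`, and the sample columns are hypothetical, and the listed type pairs are assumptions for illustration, not the official mapping. Refer to the Hologres and MaxCompute documentation for the authoritative list.

```python
# Hypothetical, partial mapping used only for illustration.
# It is NOT the official Hologres-to-MaxCompute mapping.
HOLOGRES_TO_MAXCOMPUTE = {
    "integer": "INT",
    "bigint": "BIGINT",
    "boolean": "BOOLEAN",
    "text": "STRING",
    "double precision": "DOUBLE",
    "date": "DATE",
}


def find_unmapped_columns(columns):
    """Return the (name, type) pairs whose Hologres type is not in the mapping."""
    return [
        (name, hg_type)
        for name, hg_type in columns
        if hg_type.lower() not in HOLOGRES_TO_MAXCOMPUTE
    ]


# Hypothetical source columns; "some_custom_type" stands in for any unmapped type.
source_columns = [
    ("order_id", "bigint"),
    ("note", "text"),
    ("payload", "some_custom_type"),
]
print(find_unmapped_columns(source_columns))
```

Columns that the check reports would need a different destination type or would have to be excluded from the synchronized fields.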
Configure synchronization node
On the configuration tab of the synchronization node, follow these steps.
Select Hologres source
Use the following parameter descriptions to select and configure the Hologres source table.
| Parameter | Description |
| --- | --- |
| Source Object Type | The default value is |
| Data Sources | Select the Hologres computing resource from which you want to synchronize data. |
| Instance | The system automatically populates this field with the ID of the Hologres instance. |
| Database | Select the Hologres database that you want to synchronize. |
| Schema | Select the schema that you want to synchronize. |
| Table | Select the name of the table that you want to synchronize. |
| Filtration conditions | The system automatically generates a filter condition based on the partitioned table that you use. You can modify the condition based on your business requirements. Data that meets the filter condition is retained. Note: A filter condition is the part of an SQL statement that follows the |
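To illustrate where a filter condition fits, the following sketch assembles a query from a table name and an optional condition. It assumes the usual SQL layout in which the condition follows the `WHERE` keyword. The helper `build_query`, the table `public.orders`, and the partition column `ds` are hypothetical, and the statement that DataWorks actually generates is not shown in this topic.

```python
def build_query(table, filter_condition=None):
    """Build a SELECT statement; the filter condition, if any, goes after WHERE."""
    query = f"SELECT * FROM {table}"
    if filter_condition:
        query += f" WHERE {filter_condition}"
    return query


# Hypothetical table and partition column.
print(build_query("public.orders", "ds = '20240101'"))
```

Only the text passed as `filter_condition` corresponds to what you enter in the parameter; the rest of the statement is shown for context.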
Set MaxCompute destination
Use the following parameter descriptions to configure the MaxCompute destination table.
| Parameter | Description |
| --- | --- |
| Data Sources | Select the destination MaxCompute computing resource. |
| Project | The MaxCompute project that corresponds to the data source. The system populates this value automatically. |
| Schema | Select the schema where you want to store the data. This parameter is required for MaxCompute projects that have the three-layer model enabled. For projects without this feature, this parameter is not available. |
| Table | Specify a name for the MaxCompute internal table. |
| Lifecycle | Set the table's lifecycle. MaxCompute reclaims the table if its data is not modified within the specified period. |
| Field: Synchronization Field | Select the fields that you want to synchronize and set the data types for the fields in the destination MaxCompute table. |
| Field: Partition Configurations | Specify the partition columns for the internal table in MaxCompute. You can obtain partition data from one of the following sources: |
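The Lifecycle rule described above can be pictured as a date comparison. The following sketch assumes that the lifecycle is expressed in days and counted from the last data modification time. The function `is_reclaimable` and the timestamps are hypothetical, and the exact moment at which MaxCompute reclaims a table may differ.

```python
from datetime import datetime, timedelta


def is_reclaimable(last_modified, lifecycle_days, now):
    """Return True if the data was not modified within the lifecycle window."""
    return now - last_modified > timedelta(days=lifecycle_days)


# Hypothetical last modification time and a 30-day lifecycle.
last_modified = datetime(2024, 1, 1)
print(is_reclaimable(last_modified, 30, datetime(2024, 1, 15)))  # still inside the window
print(is_reclaimable(last_modified, 30, datetime(2024, 3, 1)))   # past the window
```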
Configure data synchronization
In the Data Synchronization Settings section, configure the import method and access permissions for the Hologres instance.
| Parameter | Description |
| --- | --- |
| Import Method | Select one of the following import methods: |
| Access Hologres permissions | Select one of the following methods to access the Hologres instance: |
| Location | During synchronization, the system automatically creates a MaxCompute table based on the Hologres external storage path. You can use the default storage path generated by the system or specify a custom path. |
Debug synchronization node
To debug and run the synchronization node, configure the debug information.
- Configure the properties for the debug run.
  In the Run Configuration pane on the right side of the node configuration tab, configure the Compute Resource and Resource Group. The following table describes the parameters.

  | Parameter | Description |
  | --- | --- |
  | Compute Resource | Select the bound MaxCompute computing resource. |
  | Computing Quota | Select the computing quota that was generated when you created the MaxCompute project, or click Create Computing Quota at the bottom of the drop-down list to create a new one. For more information, see Computing resources - Quota management. |
  | Resource Group | Select the resource group that passed the connectivity test when you bound the MaxCompute computing resource. |
  | CUs for Scheduling | This node uses the default scheduling CU value. You do not need to modify it. |
  | Parameters | If you use variables in the format ${Parameter name} in the filter condition, you must define their values under Parameter name and Parameter Value in the Script Parameters section. At runtime, the system dynamically replaces these variables with their configured values. For more information, see Node scheduling configuration. |

- To debug and run the node task, click Save and then click Run.
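The runtime replacement of `${...}` variables in the filter condition can be pictured with Python's `string.Template`, which uses the same `${name}` placeholder syntax. This is only a local sketch of the substitution idea, not a description of how DataWorks implements it. The variable name `bizdate`, the column `ds`, and the value are hypothetical.

```python
from string import Template

# Hypothetical filter condition that uses a ${...} variable.
filter_condition = Template("ds = '${bizdate}'")

# Hypothetical script parameters: parameter name -> parameter value.
script_parameters = {"bizdate": "20240101"}

# substitute() raises KeyError if a variable has no configured value,
# which mirrors the requirement to define every variable you use.
resolved = filter_condition.substitute(script_parameters)
print(resolved)  # ds = '20240101'
```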
Next steps
- To run the node periodically, configure its scheduling properties in the Scheduling Settings pane on the right. Set the Scheduling Policy and related scheduling properties. For more information, see Node scheduling configuration.
- To run the task in the production environment, click the icon to start the deployment process. Nodes in a project run on a schedule only after they are deployed to the production environment. For more information, see Node deployment.
FAQ
- Field type mismatch: If the synchronization task fails due to a data type mismatch, check whether the field data types in the MaxCompute table are configured correctly.
- Inconsistent data in a single synchronized partition: Check whether the source filter condition is configured correctly.