Synchronize data from Hologres to MaxCompute

Prerequisites

You have created a MaxCompute project and a Hologres instance.
The MaxCompute project and Hologres instance have been bound to DataWorks as computing resources, and the connectivity test has passed.
You have created a node to synchronize data to MaxCompute.

Limitations

Only data from Hologres internal databases can be synchronized to MaxCompute.
For limits on using Hologres external tables in MaxCompute, see Hologres external tables.
MaxCompute and Hologres data types do not map one-to-one, so some Hologres data types cannot be synchronized to MaxCompute.

Configure synchronization node

On the configuration tab of the synchronization node, configure the following settings.

Select Hologres source

Select and configure the Hologres source table.

Parameter	Description
Source Object Type	The default value is `Hologres Table`.
Data Sources	Select the Hologres computing resource from which you want to synchronize data.
Instance	The system automatically populates this field with the ID of the Hologres instance.
Database	Select the Hologres database that you want to synchronize.
Schema	Select the schema that you want to synchronize.
Table	Select the name of the table that you want to synchronize.
Filtration conditions	The system automatically generates a filter condition based on the partitioned table. You can modify the condition as needed. Only data that meets the filter condition is synchronized. Note: A filter condition is the part of an SQL statement that follows the `WHERE` keyword.
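
To illustrate the note above, the following minimal Python sketch shows where a filter condition sits in a query. The function `build_source_query`, the table `public.orders`, and the column `ds` are hypothetical names chosen for this example. The SQL that DataWorks actually generates is not described on this page, so treat this only as a picture of the concept.

```python
def build_source_query(schema: str, table: str, filter_condition: str = "") -> str:
    """Compose a SELECT statement; the filter condition is the text after WHERE."""
    query = f"SELECT * FROM {schema}.{table}"
    condition = filter_condition.strip()
    if condition:
        # The filter condition is written without the WHERE keyword itself.
        query += f" WHERE {condition}"
    return query


# Hypothetical table and partition column, for illustration only.
print(build_source_query("public", "orders", "ds = '20240101'"))
```

An empty filter condition leaves the query without a `WHERE` part, which corresponds to synchronizing every row of the source table.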

Set MaxCompute destination

Configure the MaxCompute destination table.

Parameter		Description
Data Sources		Select the destination MaxCompute computing resource.
Project		The MaxCompute project that corresponds to the data source. The system populates this value automatically.
Schema		Select the schema where you want to store the data. This parameter is available only for MaxCompute projects with the three-layer model enabled.
Table		Specify a name for the MaxCompute internal table.
Lifecycle		Set the table's lifecycle. MaxCompute reclaims the table if its data is not modified within the specified period.
Field	Synchronization Field	Select the fields that you want to synchronize and set the data types for the fields in the destination MaxCompute table.
	Partition Configurations	Specify the partition columns for the internal table in MaxCompute. You can obtain partition data from one of the following sources: Specified Field in Hologres: Select a specific field from the Hologres table as the source of partition column data. Configured Scheduling Variable: Use a scheduling variable from the task to define the partition column.
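
The difference between the two partition sources can be sketched in Python as follows. This is only a conceptual model: the function `group_by_partition`, the source labels `"field"` and `"variable"`, and the sample rows are assumptions made for the example, not part of DataWorks.

```python
from collections import defaultdict


def group_by_partition(rows, source, field=None, scheduling_value=None):
    """Group rows by the partition value each one would be written to."""
    partitions = defaultdict(list)
    for row in rows:
        if source == "field":
            # Specified Field in Hologres: the value comes from the row itself.
            key = str(row[field])
        elif source == "variable":
            # Configured Scheduling Variable: one value for the whole run.
            key = str(scheduling_value)
        else:
            raise ValueError(f"unknown partition source: {source!r}")
        partitions[key].append(row)
    return dict(partitions)


# Hypothetical sample rows, for illustration only.
rows = [{"id": 1, "region": "cn"}, {"id": 2, "region": "us"}]
by_field = group_by_partition(rows, "field", field="region")
by_variable = group_by_partition(rows, "variable", scheduling_value="20240101")
```

In this model, a field-based partition can spread one run across several partitions, whereas a scheduling variable places every row of the run in a single partition.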

Configure data synchronization

In the Data Synchronization Settings section, configure the import method and access permissions for the Hologres instance.

Parameter	Description
Import Method	Select one of the following import methods: Overwrite: Deletes existing data and writes new data to the destination table. Append: Retains existing data and adds new data to the destination table.
Access Hologres permissions	Select one of the following methods to access the Hologres instance: Dual-signature access: Uses the current identity to pass Hologres permission checks. Within the MaxCompute project, you must have read permissions on the MaxCompute table and permissions on the corresponding Hologres source table. For information about MaxCompute permissions, see Data Lakehouse 2.0 user guide. For information about Hologres permissions, see Permission management overview. RAM role access: Uses a specified RAM role for identity verification. Grant the AliyunSTSAssumeRoleAccess policy to the RAM role. For more information, see RAM role authorization mode. After authorization is complete, configure your specified RAM role in the RAM Role field.
Location	During synchronization, the system automatically creates a MaxCompute table based on the Hologres external storage path. You can use the default storage path generated by the system or specify a custom path.
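
The two import methods can be modeled with a short Python sketch in which the destination is a plain list of rows. The function `apply_import` is hypothetical, and the sketch assumes the destination is treated as a single unit; this page does not state whether Overwrite applies to the whole table or only to the target partition.

```python
def apply_import(existing_rows: list, new_rows: list, method: str) -> list:
    """Return the destination contents after one synchronization run."""
    if method == "Overwrite":
        # Existing data is deleted; only the new data remains.
        return list(new_rows)
    if method == "Append":
        # Existing data is retained and the new data is added after it.
        return list(existing_rows) + list(new_rows)
    raise ValueError(f"unknown import method: {method!r}")


existing = [{"id": 1}]
incoming = [{"id": 2}, {"id": 3}]
after_overwrite = apply_import(existing, incoming, "Overwrite")
after_append = apply_import(existing, incoming, "Append")
```

Under this model, rerunning the same node with Append duplicates rows, while rerunning it with Overwrite produces the same result each time.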

Debug synchronization node

Before running the node, configure the debug settings.

Configure the properties for the debug run.

In the Run Configuration pane on the right side of the node configuration tab, configure the Compute Resource and Resource Group. The following table describes the parameters.

Parameter	Description
Compute Resource	Select the bound MaxCompute computing resource.
Computing Quota	Select the computing quota that was generated when you created the MaxCompute project, or click Create Computing Quota at the bottom of the drop-down list to create a new one. For more information, see Computing resources - Quota management.
Resource Group	Select the resource group that passed the connectivity test when you bound the MaxCompute computing resource.
CUs for Scheduling	This node uses the default scheduling CU value. You do not need to modify it.
Parameters	If the filter condition uses variables in the format `${Parameter name}`, define their values under Parameter Name and Parameter Value in the Script Parameters section. At runtime, the system replaces each variable with its configured value. For more information, see Node scheduling configuration.
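
The runtime replacement of `${Parameter name}` variables can be pictured with the following Python sketch. The function `substitute_parameters` and the parameter name `bizdate` are hypothetical and used only for illustration; the real replacement is performed by DataWorks, and its exact rules are not described on this page.

```python
import re

# Matches ${name}, where name consists of letters, digits, or underscores.
_VARIABLE = re.compile(r"\$\{(\w+)\}")


def substitute_parameters(text: str, parameters: dict) -> str:
    """Replace every ${name} in text with its configured value."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"no value configured for parameter {name!r}")
        return str(parameters[name])

    return _VARIABLE.sub(replace, text)


# Hypothetical filter condition and parameter value.
print(substitute_parameters("ds = '${bizdate}'", {"bizdate": "20240101"}))
```

The sketch raises an error for a variable that has no configured value, which makes a missing Script Parameters entry visible instead of leaving the placeholder in the filter condition.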

To debug and run the node task, click Save and then click Run.

Next steps

To run the node periodically, set the Scheduling Policy and related scheduling properties in the Scheduling Settings pane on the right. For more information, see Node scheduling configuration.
To run the task in the production environment, click the deploy icon to start the deployment process. Nodes in a project run on a schedule only after they are deployed to the production environment. For more information, see Node deployment.

FAQ

Field type mismatch: If the synchronization task fails due to a data type mismatch, check whether the field data types in the MaxCompute table are configured correctly.
Inconsistent data in a single synchronized partition: If a synchronized partition contains unexpected data, check whether the source filter condition is configured correctly.