Synchronize data to MaxCompute


Use a DataWorks synchronization node to migrate data from a single Hologres table to MaxCompute for large-scale storage and processing. This topic describes how to configure, debug, and run the node.

Prerequisites

Limitations

Configure synchronization node

On the configuration tab of the synchronization node, follow these steps.

Select Hologres source

Use the following parameter descriptions to select and configure the Hologres source table.

Parameter

Description

Source Object Type

The default value is Hologres Table.

Data Sources

Select the Hologres computing resource from which you want to synchronize data.

Instance

The system automatically populates this field with the ID of the Hologres instance.

Database

Select the Hologres database that you want to synchronize.

Schema

Select the schema that you want to synchronize.

Table

Select the name of the table that you want to synchronize.

Filtration conditions

If the source table is partitioned, the system automatically generates a filter condition based on its partitions. You can modify the condition to meet your business requirements. Only data that meets the filter condition is synchronized.

Note

A filter condition is the expression that follows the WHERE keyword in an SQL statement. Do not include the WHERE keyword itself.
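To make the note concrete, the following Python sketch shows where a filter condition sits in a query. The table name `public.orders`, the column names, the partition column `ds`, and the helper `build_select` are all hypothetical. The sketch only illustrates the idea and is not how DataWorks builds its internal query.

```python
def build_select(table: str, columns: list[str], filter_condition: str = "") -> str:
    """Return a SELECT statement; the filter condition is placed after WHERE."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    condition = filter_condition.strip()
    if condition:
        # The filter condition is only the predicate, without the WHERE keyword.
        sql += f" WHERE {condition}"
    return sql


# Hypothetical table, columns, and partition column "ds".
query = build_select("public.orders", ["order_id", "amount"], "ds = '20240101'")
print(query)
```

With an empty filter condition, the helper returns the statement without a WHERE clause, which corresponds to synchronizing every row.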

Set MaxCompute destination

Use the following parameter descriptions to configure the MaxCompute destination table.

Parameter

Description

Data Sources

Select the destination MaxCompute computing resource.

Project

The MaxCompute project that corresponds to the data source. The system populates this value automatically.

Schema

Select the schema where you want to store the data. This parameter is required only for MaxCompute projects that have the three-layer model (project.schema.table) enabled. It is not available for other projects.

Table

Specify a name for the MaxCompute internal table.

Lifecycle

Set the table's lifecycle. MaxCompute reclaims the table if its data is not modified within the specified period.

Field

Synchronization Field

Select the fields that you want to synchronize and set the data types for the fields in the destination MaxCompute table.

Partition Configurations

Specify the partition columns for the internal table in MaxCompute. You can obtain partition data from one of the following sources:

  • Specified Field in Hologres: Select a specific field from the Hologres table as the source of partition column data.

  • Configured Scheduling Variable: Use a scheduling variable from the task to define the partition column.
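One way to picture the two partition sources is the conceptual Python sketch below. It assumes that a field-based partition produces one partition per distinct field value, and that a scheduling variable produces a single partition named by the variable's resolved value. The sample rows, the `region` field, and both helper functions are hypothetical and do not reflect the internal implementation of the node.

```python
def partition_from_field(rows: list[dict], field: str) -> dict[str, list[dict]]:
    """Group rows by a source field; each distinct value becomes one partition."""
    partitions: dict[str, list[dict]] = {}
    for row in rows:
        partitions.setdefault(str(row[field]), []).append(row)
    return partitions


def partition_from_variable(rows: list[dict], value: str) -> dict[str, list[dict]]:
    """Place every row in the single partition named by a scheduling variable value."""
    return {value: list(rows)}


# Hypothetical rows read from the Hologres source table.
rows = [
    {"order_id": 1, "region": "east"},
    {"order_id": 2, "region": "west"},
    {"order_id": 3, "region": "east"},
]

by_field = partition_from_field(rows, "region")         # partitions "east" and "west"
by_variable = partition_from_variable(rows, "20240101")  # one partition for the whole run
```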

Configure data synchronization

In the Data Synchronization Settings section, configure the import method and access permissions for the Hologres instance.

Parameter

Description

Import Method

Select one of the following import methods:

  • Overwrite: Deletes existing data and writes new data to the destination table.

  • Append: Retains existing data and adds new data to the destination table.

Access Hologres permissions

Select one of the following methods to access the Hologres instance:

  • Dual-signature access: Uses the current identity to pass Hologres permission checks.

    Within the MaxCompute project, you must have read permissions on the MaxCompute table and permissions on the corresponding Hologres source table. For information about MaxCompute permissions, see Data Lakehouse 2.0 user guide. For information about Hologres permissions, see Permission management overview.

  • RAM role access: Uses a specified RAM role for identity verification.

    Grant the AliyunSTSAssumeRoleAccess policy to the RAM role. For more information, see RAM role authorization mode. After the authorization is complete, specify the role in the RAM Role field.

Location

During synchronization, the system automatically creates a MaxCompute table based on the Hologres external storage path. You can use the default storage path generated by the system or specify a custom path.
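The difference between the two Import Method options described above can be sketched in Python as follows. The function `apply_import` and the sample rows are hypothetical. The sketch models the destination table as an in-memory list purely to show the semantics.

```python
def apply_import(destination: list[dict], incoming: list[dict], method: str) -> list[dict]:
    """Return the destination contents after an import with the given method."""
    if method == "overwrite":
        # Existing data is discarded; only the new data remains.
        return list(incoming)
    if method == "append":
        # Existing data is kept; new data is added after it.
        return destination + incoming
    raise ValueError(f"unknown import method: {method}")


existing = [{"order_id": 1}]
new_rows = [{"order_id": 2}, {"order_id": 3}]

after_overwrite = apply_import(existing, new_rows, "overwrite")
after_append = apply_import(existing, new_rows, "append")
```

Under this model, running the same Append import twice duplicates the new rows, while repeating an Overwrite import leaves the result unchanged. This is worth considering when a task might be rerun.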

Debug synchronization node

To debug and run the synchronization node, configure the debug information.

  1. Configure the properties for the debug run.

    In the Run Configuration pane on the right side of the node configuration tab, configure the parameters described in the following table.

    Parameter

    Description

    Compute Resource

    Select the bound MaxCompute computing resource.

    Computing Quota

    Select the computing quota that was generated when you created the MaxCompute project, or click Create Computing Quota at the bottom of the drop-down list to create a new one. For more information, see Computing resources - Quota management.

    Resource Group

    Select the resource group that passed the connectivity test when you bound the MaxCompute computing resource.

    CUs for Scheduling

    This node uses the default scheduling CU value. You do not need to modify it.

    Parameters

    If you use variables in the format ${Parameter name} in the filter condition, you must define their values under Parameter name and Parameter Value in the Script Parameters section. At runtime, the system dynamically replaces these variables with their configured values. For more information, see Node scheduling configuration.

  2. To debug and run the node task, click Save and then click Run.
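The variable replacement described for the Parameters setting can be approximated locally with Python's `string.Template`, which uses the same `${name}` placeholder syntax. This is only an illustration of the substitution idea, not the mechanism DataWorks uses. The parameter names `bizdate` and `region`, their values, and the helper `render_filter` are hypothetical.

```python
from string import Template


def render_filter(filter_condition: str, script_parameters: dict[str, str]) -> str:
    """Replace each ${name} placeholder; raises KeyError if a parameter is undefined."""
    return Template(filter_condition).substitute(script_parameters)


# Hypothetical filter condition and script parameter values.
rendered = render_filter(
    "ds = '${bizdate}' AND region = '${region}'",
    {"bizdate": "20240101", "region": "east"},
)
print(rendered)
```

Because `substitute` raises an error for an undefined placeholder, a check like this can reveal a missing parameter definition before a debug run.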

Next steps

  • To run the node periodically, configure its scheduling properties in the Scheduling Settings pane on the right. Set the Scheduling Policy and related scheduling properties. See Node scheduling configuration.

  • To run the task in the production environment, click the deploy icon to start the deployment process. Nodes in a project run on a schedule only after they are deployed to the production environment. See Node deployment.

FAQ

  • Field type mismatch: If the synchronization task fails due to a data type mismatch, check whether the field data types in the MaxCompute table are configured correctly.

  • Inconsistent data in a single synchronized partition: Check if the source filter condition is configured correctly.