Data synchronization from MaxCompute node

更新时间:
复制 MD 格式

You can use a Data Synchronization from MaxCompute node in Data Studio to synchronize data from MaxCompute to Hologres. This process allows for faster queries on MaxCompute tables. This topic describes how to create and configure a Data Synchronization from MaxCompute node.

Background

When you use a Data Synchronization from MaxCompute node to query MaxCompute data in Hologres, you must first create an external table in Hologres. This external table links the MaxCompute source table (the table that contains the MaxCompute data to be synchronized) to a Hologres internal table. Data is then synchronized from the source table to the internal table through this link. The external table's schema must match that of the MaxCompute source table. You can also use SQL statements to import data from MaxCompute to Hologres. For more information, see Import data from MaxCompute by using SQL.

Importing MaxCompute data to Hologres provides better performance than directly querying data by using an external table. For more information about creating an external table to query MaxCompute data, see One-click MaxCompute table schema synchronization node.

Note

The operations described in this topic use the China (Shanghai) region as an example. The user interface may vary in other regions.

Prerequisites

Create a table data synchronization node

  1. Log on to the DataWorks console. In the target region, click Data Development and O&M > Data Development in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.

  2. Create a Workflow.

    If you have an existing Workflow, skip this step.

    1. Hover over the 新建 icon and select Create Workflow.

    2. In the Create Workflow dialog box, enter a Workflow Name.

    3. Click Create.

  3. Create a Data Synchronization from MaxCompute node.

    1. Hover over the Create icon icon and select Create Node > Hologres > Data Synchronization from MaxCompute.

      You can also find and right-click the target workflow, and then select Create Node > Hologres > Data Synchronization from MaxCompute.

    2. In the Create Node dialog, enter a Name, and select an Engine Instance, a Node Type, and a Path.

    3. Click OK to go to the node editing page.

  4. Configure the node.

    On the node editing page, configure the MaxCompute source table, the destination table for storing data, the synchronization policy, and SQL code information.

    The configuration page contains the following sections: Hologres Target Data Source Settings (select a Data Source and a Database), MaxCompute Source Table Selection (select External Table Source as a new external table or an existing external table, and specify the External Server, MaxCompute Project, MaxCompute Schema, MaxCompute Table Name, and Filter Condition), Hologres Target Table Settings (configure the Schema, Table Name, and Table Description), Import GUC Parameter Settings (in the format of set <GUC>=<values>;, with multiple parameters on separate lines), and Synchronization Settings (which includes the Synchronization Field, Partition Configurations, and Index Configuration tabs).

    1. Configure the source table information.

      This section is used to configure the MaxCompute table from which data is to be synchronized. You configure a Hologres external table to map the MaxCompute source table data. The following table describes the key parameters.

      Parameter

      Description

      Data Source

      The Hologres instance where the Hologres external table resides.

      Database

      The database in the Hologres instance where the Hologres external table is stored.

    2. MaxCompute source table selection.

      Parameter

      Description

      Source of External Table

      The source of the Hologres external table. The external table is used as a link between the MaxCompute source table and the Hologres internal table to synchronize MaxCompute source table data.

      • Existing External Table: If the external table for synchronizing data to the internal table already exists, select this option. You need to select the schema and name of the existing external table.

      • Create External Table: The Data Synchronization from MaxCompute node requires a Hologres external table to accelerate queries on MaxCompute table data. If you have not created a Hologres external table, select this option.

        This option requires you to configure the server for the new external table, as well as the MaxCompute project name and table name.

        Note

        You can directly use the odps_server foreign server that is already created in Hologres. For more information about the underlying mechanism, see postgres_fdw.

    3. Configure the Hologres target table settings.

      This section is used to create a Hologres internal table for storing MaxCompute data.

      Parameter

      Description

      Schema

      The Schema to which the Hologres internal table belongs.

      Table Name

      The name of the Hologres internal table. When you create an internal table, if a table with the same name already exists, the table is handled as follows based on its type:

      • Non-partitioned table: The existing internal table and its data are deleted, and Hologres creates a new table.

      • Partitioned table: The existing table and its data are not deleted. Hologres creates new child partition tables based on partition values and imports data.

        Note

        If the schema of the new table is different from that of the existing table, an error is returned.

      Table Description

      The description of the Hologres internal table.

    4. Configure the synchronization policy.

      This section is used to configure the policy for synchronizing MaxCompute table data.

      Tab

      Description

      Synchronization Field

      Select the MaxCompute table columns to synchronize.

      Partition Configurations

      Select the MaxCompute table partitions to synchronize.

      Note

      Hologres currently supports synchronizing only first-level partitions. Multi-level partitions in MaxCompute tables are set as first-level partitions in Hologres, and the remaining partitions are automatically mapped to regular columns in Hologres.

      Index Configuration

      Build indexes for the Hologres internal table that stores MaxCompute data. You can then use the indexes to quickly query data. For more information about creating indexes, see Set up table indexes.

    5. Generate SQL Script.

      DataWorks automatically parses the synchronization configuration and generates SQL statements for running the current synchronization task. You can use the statements to go to the Hologres code editing page and run the synchronization task by using SQL.

      Note
      • The generated SQL Script cannot be edited. When the synchronization task configuration is updated, refresh the SQL Script to generate new SQL statements.

      • For more information about running a synchronization task by using SQL, see Import data from MaxCompute by using SQL.

  5. Configure scheduling properties for the task.

    To run the node periodically, click Scheduling in the right-side pane and configure its scheduling properties:

  6. Save and run the node.

    1. On the toolbar of the node editing page, click the Save icon icon to save the node configuration.

    2. On the toolbar of the node editing page, click the Run icon icon to synchronize MaxCompute data.

    If you use a workspace in standard mode, after the task is submitted, click Deploy Task on the left side of the top toolbar to deploy the task to the production environment. For more information, see Deploy nodes.

  7. View scheduled tasks.

    1. In the upper-right corner of the editor, click O&M Personnel to open the production environment's Operation Center.

    2. View the scheduled tasks that are running. For more information, see Manage scheduled tasks.

    To view more details about scheduled tasks, click Operation Center in the top menu bar. For more information, see Operation Center overview.

Next step

After the MaxCompute data synchronization is complete, you can go to the DataWorks Table Management page to view data details. For more information, see Manage tables. You can also go to Hologres and use HoloWeb to query MaxCompute data. For more information, see Query data by using HoloWeb.