DataWorks data synchronization nodes allow you to synchronize data from a single MaxCompute table to Hologres for efficient big data analytics and real-time queries. This topic describes the configuration process to help you easily migrate data and leverage Hologres for high-performance queries.
Background information
This feature first imports data from a MaxCompute internal table into a Hologres external table, and then synchronizes the data to a Hologres internal table. The data synchronization from MaxCompute to the Hologres external table is implemented by running the IMPORT FOREIGN SCHEMA command.
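The two-stage flow described above can be sketched as a pair of SQL statements. The following Python snippet only builds the statement text, assuming the PostgreSQL-style `IMPORT FOREIGN SCHEMA` and `INSERT INTO ... SELECT` syntax that Hologres follows. The function name `build_sync_sql` and all of its arguments are hypothetical, and the statements that the node actually generates are not shown in this topic.

```python
from typing import List, Optional


def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_sync_sql(
    mc_project: str,          # MaxCompute project (remote schema), hypothetical value
    source_table: str,        # MaxCompute source table
    server: str,              # external server name, supplied by the caller
    foreign_schema: str,      # Hologres schema that receives the external table
    target_schema: str,       # Hologres schema of the internal table
    target_table: str,        # Hologres internal table (assumed to exist already)
    columns: List[str],       # fields selected for synchronization
    where: Optional[str] = None,  # optional filter condition
) -> List[str]:
    """Return [import statement, insert statement] for the two-stage flow."""
    # Stage 1: map the MaxCompute table to a Hologres external table.
    import_stmt = (
        f"IMPORT FOREIGN SCHEMA {quote_ident(mc_project)} "
        f"LIMIT TO ({quote_ident(source_table)}) "
        f"FROM SERVER {quote_ident(server)} "
        f"INTO {quote_ident(foreign_schema)};"
    )
    # Stage 2: copy rows from the external table into the internal table.
    col_list = ", ".join(quote_ident(c) for c in columns)
    insert_stmt = (
        f"INSERT INTO {quote_ident(target_schema)}.{quote_ident(target_table)} "
        f"({col_list}) SELECT {col_list} "
        f"FROM {quote_ident(foreign_schema)}.{quote_ident(source_table)}"
    )
    if where:
        insert_stmt += f" WHERE {where}"
    return [import_stmt, insert_stmt + ";"]
```

The external table keeps the name of the MaxCompute table, so this sketch places it in a separate schema from the internal table to avoid a name clash.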
Prerequisites
- You have created a MaxCompute project and a Hologres instance.
- You have bound the MaxCompute project and Hologres instance to DataWorks as compute resources and passed the Connectivity Test.
Limits
The MaxCompute source table must exist before you can create an external table to read its data.
Create a data synchronization node
Before you can configure synchronization, create a data synchronization node for Hologres and open its configuration page.
Manage the destination data source
On the node configuration page, you can manage the destination data source by following these steps.
- On the node configuration page, select the bound destination Hologres data source from the drop-down list next to Data Sources.
- Click Pages for Managing Destination. In the dialog box that appears, select the required operation:
  - HoloWeb (Instance Monitoring): Manage the destination Hologres instance in HoloWeb.
  - Slow Query: View and analyze historical slow queries for the destination Hologres instance.
  - Active Connection Management: Diagnose and manage connections to the destination Hologres instance.
  - Database Authorization: Add a database to the destination Hologres instance or grant database permissions.
  - User management: In the User management module of HoloWeb, add or remove users for the destination Hologres instance and grant them permissions.
Configure the data synchronization node
After you select the destination data source, follow these steps on the node configuration page to configure the synchronization task.
Select the MaxCompute source table
Use the following parameters to select the MaxCompute source table.
| Parameter | Description |
| --- | --- |
| Source Object Type | The default value is |
| Project | Select the MaxCompute project that you want to synchronize. |
| Schema | Select the schema that you want to use. |
| Table Name | Select the table that you want to synchronize. |
| Filtration conditions | The system generates a filter condition for partitioned tables. You can modify this condition as needed. Only data that meets the condition is synchronized. Note: The filter condition is the content that follows the |
Configure the Hologres destination table
Use the following parameters to configure the Hologres destination table.
| Parameter | Description |
| --- | --- |
| Instance | The system automatically populates this field based on the selected data source. |
| Database | The system automatically populates this field based on the selected data source. |
| Schema | Specify the schema for the Hologres internal table. |
| Table Name | Specify a name for the Hologres internal table. If a table with the specified name already exists, it is handled as follows: Note: An error occurs if the new and existing tables have different schemas. |
| Field: Synchronization Field | Select the fields to synchronize and set the data types for the fields in the Hologres destination table. |
| Field: Partition Configurations | Select the partition key fields for the new table. |
| Index Configuration | Create indexes on the Hologres internal table to accelerate data queries. For more information about how to create an index, see CREATE TABLE. |
Configure advanced settings
When you synchronize data, you can configure GUC parameters and the external server in the Advanced Settings section.
| Parameter | Description |
| --- | --- |
| GUC Parameters | Before you import data from MaxCompute, you must set some GUC parameters. For more information about the supported GUC parameters, see GUC parameters. Other SQL statements are not supported. |
| External Server | The default value is |
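Because the GUC Parameters field accepts only GUC settings, it can help to check the text locally before pasting it in. The following is a minimal sketch, assuming that each setting is written as a `SET name = value` or `SET name TO value` statement separated by semicolons. The helper `parse_guc_settings` is hypothetical, and the GUC names in the usage note are examples only; confirm them against the GUC parameters reference.

```python
import re
from typing import List

# Matches one "SET name = value" or "SET name TO value" statement.
_SET_RE = re.compile(r"^\s*set\s+[A-Za-z_][\w.]*\s*(?:=|\bto\b)\s*\S", re.IGNORECASE)


def parse_guc_settings(text: str) -> List[str]:
    """Split the field text on semicolons and reject anything that is not SET.

    Assumption: values do not themselves contain semicolons.
    """
    statements = [s.strip() for s in text.split(";") if s.strip()]
    for stmt in statements:
        if not _SET_RE.match(stmt):
            raise ValueError(f"not a SET statement: {stmt!r}")
    return statements
```

For example, `parse_guc_settings("SET hg_foreign_table_max_partition_limit = 128;")` returns the single statement, while text that contains a statement such as `DROP TABLE t` raises `ValueError`.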
Debug the data synchronization node
To debug a synchronization node task, configure the appropriate debugging information according to your business requirements.
- Configure the node properties for debugging.
  In the Run Configuration pane on the right side of the node configuration page, configure the Compute Resource and Resource Group information. The following table describes the parameters.

  | Parameter | Description |
  | --- | --- |
  | Compute Resource | Select the Hologres compute resource that you have bound. |
  | Resource Group | Select the Resource Group that passed the Connectivity Test when you bound the Hologres compute resource. |
  | CUs for Scheduling | Set the number of compute CUs required for the task. The default value is 0.25. |
  | Parameters | If you define variables by using the ${ParameterName} format in the filter condition, you must specify the Script Parameters and Parameter name in the Parameter Value section. At runtime, the variables are dynamically replaced with their actual values. For more information, see node scheduling configuration. |
- To debug and run the node, click Save and then click Run.
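To preview how a filter condition looks after its `${ParameterName}` variables are replaced, a local substitution can be sketched with Python's `string.Template`, which uses the same `${name}` placeholder form. This is only an illustration and not how DataWorks resolves parameters internally; the parameter name `bizdate` and the partition column `pt` are hypothetical.

```python
from string import Template
from typing import Dict


def render_filter(condition: str, params: Dict[str, str]) -> str:
    """Replace ${name} placeholders in a filter condition with debug values.

    Raises KeyError if a placeholder has no value, which mirrors forgetting
    to assign a parameter before running the node.
    """
    return Template(condition).substitute(params)


if __name__ == "__main__":
    # Hypothetical filter on a partition column named pt.
    print(render_filter("pt = '${bizdate}'", {"bizdate": "20240101"}))
```

A literal `$` in the condition must be written as `$$` for `string.Template`; that is a property of this sketch, not of the node.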
Next steps
- Node scheduling configuration: If a node requires periodic scheduling, set the Scheduling Policy in the Scheduling Settings pane on the right and configure scheduling properties.
- Node deployment: If the task needs to be deployed to the production environment, click the icon to start the deployment process. Nodes are run periodically only after they have been deployed to the production environment.
FAQ
- Field type mismatch: A field data type mismatch during configuration will cause the synchronization task to fail. Ensure that the field type configuration of the Hologres table is accurate.
- Data in a single synchronized partition is inconsistent with the actual data: Check whether the filter conditions that you configured at the source are correct.
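For the field type mismatch case, a quick pre-check of the configured destination types can catch problems before the task runs. The sketch below compares MaxCompute column types with the Hologres types chosen for the same fields. The mapping table is an assumption based on the commonly documented MaxCompute-to-Hologres type correspondence and should be verified against the official data type mapping; the function names are hypothetical, and Hologres type aliases such as `INT8` are not handled.

```python
from typing import Dict, List, Optional, Tuple

# Assumed MaxCompute -> Hologres type correspondence (verify before relying on it).
EXPECTED_HOLO_TYPE = {
    "TINYINT": "SMALLINT",
    "SMALLINT": "SMALLINT",
    "INT": "INTEGER",
    "BIGINT": "BIGINT",
    "FLOAT": "REAL",
    "DOUBLE": "DOUBLE PRECISION",
    "BOOLEAN": "BOOLEAN",
    "STRING": "TEXT",
    "VARCHAR": "VARCHAR",
    "DECIMAL": "NUMERIC",
    "DATETIME": "TIMESTAMPTZ",
    "TIMESTAMP": "TIMESTAMPTZ",
    "DATE": "DATE",
    "BINARY": "BYTEA",
}


def base_type(type_name: str) -> str:
    """Drop precision or length, e.g. 'decimal(18,2)' becomes 'DECIMAL'."""
    return type_name.split("(", 1)[0].strip().upper()


def find_type_mismatches(
    source: Dict[str, str], dest: Dict[str, str]
) -> List[Tuple[str, str, Optional[str], str]]:
    """Return (column, source type, expected type, configured type) per mismatch."""
    problems = []
    for col, src_type in source.items():
        if col not in dest:
            continue  # field not selected for synchronization
        expected = EXPECTED_HOLO_TYPE.get(base_type(src_type))
        if expected is None or base_type(dest[col]) != expected:
            problems.append((col, src_type, expected, dest[col]))
    return problems
```

An empty result means every selected field uses the type this table expects; an `expected` value of `None` marks a source type that the assumed mapping does not cover.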