DataWorks Data Integration
Data Integration is a reliable, secure, cost-effective, and elastically scalable data synchronization platform from Alibaba Cloud. It supports data transfer across heterogeneous data storage systems and provides offline (full and incremental) data ingestion and egress for over 20 data sources in various network environments.
For more information, see Data Integration and Supported data sources, readers, and writers.
Use cases
- Use a synchronization task in Data Integration to export data from AnalyticDB for PostgreSQL to other data sources for further processing. This process is known as a data export.
- Use a synchronization task in Data Integration to import processed data from other data sources into AnalyticDB for PostgreSQL. This process is known as a data import.
For detailed steps, including how to create a synchronization task, configure data sources and jobs, and set up a whitelist, see the Data Integration section in the DataWorks documentation. This topic describes the procedures for data import and data export with AnalyticDB for PostgreSQL.
Prerequisites
Prepare for the Data Integration task:
- Activate the DataWorks service.
- Activate MaxCompute. A default MaxCompute data source is automatically created. Then, use your Alibaba Cloud account to log on to DataWorks.
- Create a workspace. A workspace is required to collaborate on workflows and manage data and tasks in DataWorks.
Prepare for AnalyticDB for PostgreSQL:
- Before you import data, use a PostgreSQL client to create the destination database and tables in AnalyticDB for PostgreSQL.
- Before you export data, sign in to the AnalyticDB for PostgreSQL console and configure an IP address whitelist. For more information, see Add a whitelist.
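As a rough illustration of the table-creation prerequisite, the following Python sketch only assembles the DDL text that you might then run from a PostgreSQL client such as psql. The table name, column names, and distribution key are hypothetical. The DISTRIBUTED BY clause is the Greenplum-style syntax that AnalyticDB for PostgreSQL accepts for choosing a distribution key.

```python
def build_create_table(table, columns, distribution_key):
    """Return a CREATE TABLE statement with a DISTRIBUTED BY clause.

    columns is a list of (name, sql_type) pairs. All names passed in
    are illustrative and are not taken from the DataWorks documentation.
    """
    column_names = [name for name, _ in columns]
    if distribution_key not in column_names:
        raise ValueError("distribution key must be one of the table columns")

    # One column definition per line, in the order given.
    column_lines = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"DISTRIBUTED BY ({distribution_key});"
    )


# Hypothetical destination table for an import task.
ddl = build_create_table(
    "public.orders_import",
    [("order_id", "bigint"), ("customer", "varchar(64)"), ("amount", "numeric(12,2)")],
    "order_id",
)
print(ddl)
```

The synchronization task writes into a table that already exists, so the column order and types chosen here are what the field mapping in the task has to match.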
Data import
Add AnalyticDB for PostgreSQL as a data source in the DataWorks console. For detailed steps, see Configure an AnalyticDB for PostgreSQL data source.
Configure a synchronization task:
After you add the data source, configure a synchronization task to import data into AnalyticDB for PostgreSQL. You can configure the task in either of two modes: wizard mode or script mode (code editor).
- Wizard mode. Follow these steps to configure a synchronization task:
  - Create a data synchronization node.
  - Select a data source.
  - Select a data destination. The destination must be AnalyticDB for PostgreSQL.
  - Map the source fields to the destination fields.
  - Configure the maximum transmission rate and the rules for handling dirty data.
  - Configure scheduling attributes.
  Note: For detailed steps, see Configure a synchronization task by using the wizard in the DataWorks documentation.
- Script mode. Follow these steps to configure a synchronization task:
  - Create a data synchronization node.
  - Import a template.
  - Configure a reader for the synchronization task.
  - Configure a writer for the synchronization task. The writer must be AnalyticDB for PostgreSQL.
  - Map the source fields to the destination fields.
  - Configure the maximum transmission rate and the rules for handling dirty data.
  - Configure scheduling attributes.
  Note: For detailed steps, see Configure a synchronization task by using the script editor in the DataWorks documentation.
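The script-mode steps above can be sketched as a small Python helper that assembles a job definition: a reader, a writer, a positional field mapping, a rate limit, and a dirty-data threshold. This is a minimal sketch, not the authoritative format. The key names (`steps`, `stepType`, `setting`, `errorLimit`, `speed`, `order`) are assumptions modeled on the general shape of script-mode jobs, and the reader and writer type strings are deliberately left as placeholders. Take the real values from the template that DataWorks imports for your node.

```python
import json


def build_import_job(reader_type, writer_type, source, destination,
                     column_pairs, max_mbps=None, max_dirty_records=0):
    """Assemble a script-mode style job definition as a plain dict.

    Every key name below is an assumption; verify it against the
    template imported in the DataWorks code editor.
    """
    if not column_pairs:
        raise ValueError("at least one column mapping is required")

    # Fields are mapped by position, so both lists keep the same order.
    source_columns = [src for src, _ in column_pairs]
    destination_columns = [dst for _, dst in column_pairs]

    # Throttle only when a maximum transmission rate is given.
    speed = {"concurrent": 1, "throttle": max_mbps is not None}
    if max_mbps is not None:
        speed["mbps"] = str(max_mbps)

    return {
        "type": "job",
        "version": "2.0",
        "steps": [
            {
                "stepType": reader_type,
                "name": "Reader",
                "category": "reader",
                "parameter": {
                    "datasource": source["datasource"],
                    "table": source["table"],
                    "column": source_columns,
                },
            },
            {
                "stepType": writer_type,
                "name": "Writer",
                "category": "writer",
                "parameter": {
                    "datasource": destination["datasource"],
                    "table": destination["table"],
                    "column": destination_columns,
                },
            },
        ],
        "setting": {
            # Number of dirty records tolerated (assumed key name).
            "errorLimit": {"record": str(max_dirty_records)},
            "speed": speed,
        },
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
    }


# Hypothetical data source and table names; the type strings are placeholders.
job = build_import_job(
    reader_type="<reader type from template>",
    writer_type="<AnalyticDB for PostgreSQL writer type from template>",
    source={"datasource": "my_source", "table": "orders"},
    destination={"datasource": "my_adb_pg", "table": "public.orders_import"},
    column_pairs=[("id", "order_id"), ("name", "customer")],
    max_mbps=5,
    max_dirty_records=0,
)
print(json.dumps(job, indent=2))
```

Keeping the mapping as explicit pairs makes it harder to let the reader and writer column lists drift out of order, which matters when fields are matched by position rather than by name.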
Data export
The procedure for exporting data is similar to that for importing data. For a data export, you configure AnalyticDB for PostgreSQL as the data source (see Configure an AnalyticDB for PostgreSQL data source), and another data source type as the destination.
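For an export, the reader side is the part that changes. The sketch below builds only a reader step that points at an AnalyticDB for PostgreSQL data source, with an optional filter for exporting a subset of rows (for example, an incremental slice). The key names, the `where` parameter, and all data source, table, and column names are assumptions for illustration, and the reader type string is a placeholder to be replaced with the value from the imported template.

```python
def build_export_reader_step(reader_type, datasource, table, columns, where=None):
    """Build a reader step that reads from AnalyticDB for PostgreSQL.

    Key names are assumed, not confirmed; compare them with the
    template that DataWorks generates for the reader you select.
    """
    if not columns:
        raise ValueError("at least one column is required")

    parameter = {
        "datasource": datasource,
        "table": table,
        "column": list(columns),
    }
    # Add the row filter only when one is supplied (full export otherwise).
    if where:
        parameter["where"] = where

    return {
        "stepType": reader_type,
        "name": "Reader",
        "category": "reader",
        "parameter": parameter,
    }


# Hypothetical names; the filter limits the export to recent rows.
step = build_export_reader_step(
    reader_type="<AnalyticDB for PostgreSQL reader type from template>",
    datasource="my_adb_pg",
    table="public.orders_import",
    columns=["order_id", "customer"],
    where="order_date >= '2024-01-01'",
)
print(step)
```

The writer step for the destination data source then follows the same pattern as in an import, with the destination's own type and parameters.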
References
For more information about Data Integration, see the DataWorks documentation.