Data transfer: Data Integration

You can use Data Integration in DataWorks to synchronize data from various data sources to MaxCompute. Data Integration supports three types of tasks: batch synchronization, real-time synchronization, and synchronization solutions. This topic describes how to transfer data to MaxCompute by using Data Integration.

Batch synchronization

Batch synchronization provides an abstract framework of readers and writers. After you define the source and destination, the framework uses a simplified intermediate data format to transfer structured and semi-structured data from any supported data source to MaxCompute.
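The reader and writer abstraction can be pictured with a small sketch. The following Python code is illustrative only and is not the Data Integration API: the names `Record`, `Reader`, `Writer`, `ListReader`, `InMemoryWriter`, and `run_batch_sync` are hypothetical, and an in-memory list stands in for both the source and the MaxCompute table.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Protocol


@dataclass
class Record:
    """Hypothetical intermediate format: an ordered list of column values."""
    columns: list[Any]


class Reader(Protocol):
    """Any source plug-in only has to yield records."""
    def read(self) -> Iterator[Record]: ...


class Writer(Protocol):
    """Any destination plug-in only has to consume records."""
    def write(self, records: Iterable[Record]) -> int: ...


class ListReader:
    """Stand-in source: reads rows from an in-memory list."""
    def __init__(self, rows: list[tuple]) -> None:
        self.rows = rows

    def read(self) -> Iterator[Record]:
        for row in self.rows:
            yield Record(columns=list(row))


class InMemoryWriter:
    """Stand-in destination: appends rows to an in-memory table."""
    def __init__(self) -> None:
        self.table: list[tuple] = []

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.table.append(tuple(record.columns))
            count += 1
        return count


def run_batch_sync(reader: Reader, writer: Writer) -> int:
    """Stream records from the reader to the writer; return the row count."""
    return writer.write(reader.read())
```

Because the reader and writer only share the intermediate record format, supporting a new source or destination in this sketch means adding one class, without touching `run_batch_sync`.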

Real-time synchronization

Real-time synchronization in DataWorks keeps your MaxCompute tables updated with the latest changes from a source database. A single real-time synchronization task can use multiple transformation plug-ins for data cleansing and multiple writers for multi-path output. Real-time synchronization supports synchronizing incremental data from a single table to a single MaxCompute table, from sharded tables to a single MaxCompute table, and from an entire database (multiple tables) to multiple MaxCompute tables.
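As a rough mental model of that flow, the sketch below passes change events through a chain of transformations and then fans each surviving event out to several writers. It is a simplified illustration, not how DataWorks implements real-time synchronization: the event layout, the shard naming convention (`orders_00`, `orders_01`), and the names `drop_null_keys`, `route_shard_to_target`, `TableWriter`, and `run_realtime_sync` are all assumptions made for the example.

```python
from typing import Callable, Iterable, Optional

# Assumed event shape: {"table": str, "op": "INSERT" | "UPDATE" | "DELETE",
#                       "key": primary key value, "row": dict or None}
ChangeEvent = dict
Transform = Callable[[ChangeEvent], Optional[ChangeEvent]]


def drop_null_keys(event: ChangeEvent) -> Optional[ChangeEvent]:
    """Cleansing step: discard events that have no primary key."""
    return event if event.get("key") is not None else None


def route_shard_to_target(event: ChangeEvent) -> ChangeEvent:
    """Map sharded source tables such as orders_00 to one target table."""
    base, _, suffix = event["table"].rpartition("_")
    target = base if base and suffix.isdigit() else event["table"]
    return {**event, "table": target}


class TableWriter:
    """Stand-in destination that keeps the latest row per key, per table."""
    def __init__(self) -> None:
        self.tables: dict[str, dict] = {}

    def apply(self, event: ChangeEvent) -> None:
        table = self.tables.setdefault(event["table"], {})
        if event["op"] == "DELETE":
            table.pop(event["key"], None)
        else:  # INSERT and UPDATE both upsert the row
            table[event["key"]] = event["row"]


def run_realtime_sync(
    events: Iterable[ChangeEvent],
    transforms: list[Transform],
    writers: list[TableWriter],
) -> int:
    """Apply each transform in order, then send the event to every writer."""
    applied = 0
    for event in events:
        current: Optional[ChangeEvent] = event
        for transform in transforms:
            current = transform(current)
            if current is None:  # a transform filtered the event out
                break
        if current is None:
            continue
        for writer in writers:
            writer.apply(current)
        applied += 1
    return applied
```

In this sketch, events from `orders_00` and `orders_01` both land in a single `orders` table, which mirrors the sharded-tables-to-single-table case, and passing two `TableWriter` instances mirrors multi-path output.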

Synchronization solution

In many business scenarios, a single task cannot complete data synchronization. The work often requires a combination of batch synchronization, real-time synchronization, and data processing tasks, which makes the configuration complex.