Synchronize Tablestore data to MaxCompute

DataWorks Data Integration can synchronize both full and incremental data from Tablestore to MaxCompute.

How it works

DataWorks Data Integration performs offline (batch) data synchronization. An offline sync channel defines the source and destination data sources and datasets. Data extraction plugins (Readers) pull data from the source, and data writing plugins (Writers) push it to the destination. Both sides exchange records in a simplified intermediate data format, which allows data to move between structured and semi-structured data sources.

To synchronize Tablestore data to MaxCompute, configure Tablestore Reader plugins and MaxCompute Writer plugins in an offline sync task.
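The Reader/Writer split described above can be illustrated with a minimal sketch. This is not the DataWorks plugin API: the class names `InMemoryReader` and `InMemoryWriter`, the function `run_sync`, and the tuple-based record format are all hypothetical, chosen only to show how a Reader and a Writer can stay independent by sharing one intermediate format.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical intermediate format: one tuple of column values per row.
Record = Tuple[Optional[object], ...]


@dataclass
class InMemoryReader:
    """Stands in for a source-side plugin (for example, a Tablestore Reader)."""
    rows: List[dict]
    columns: List[str]

    def read(self) -> Iterator[Record]:
        for row in self.rows:
            # A semi-structured row may lack some attributes; emit None for them.
            yield tuple(row.get(name) for name in self.columns)


@dataclass
class InMemoryWriter:
    """Stands in for a destination-side plugin (for example, a MaxCompute Writer)."""
    table: List[Record]

    def write(self, records: Iterable[Record]) -> int:
        written = 0
        for record in records:
            self.table.append(record)
            written += 1
        return written


def run_sync(reader: InMemoryReader, writer: InMemoryWriter) -> int:
    """Move every record from the reader to the writer; return the row count."""
    return writer.write(reader.read())


if __name__ == "__main__":
    source = InMemoryReader(
        rows=[{"pk": 1, "name": "a"}, {"pk": 2}],
        columns=["pk", "name"],
    )
    destination = InMemoryWriter(table=[])
    print(run_sync(source, destination), destination.table)
```

Because the Writer only sees tuples, swapping the source requires changing the Reader alone, which is the property the plugin model relies on.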

Tablestore Reader plugins

The Tablestore Reader plugin that you use depends on the synchronization method.

Synchronization method	Plugin	Description
Full export	Tablestore (OTS) Reader	Reads data from Tablestore tables. You can specify a timestamp range to extract incremental data. For more information, see Tablestore data source.
Incremental synchronization	OTSStream Reader	Exports incremental data from Tablestore tables. For more information, see Tablestore Stream data source.
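Both Reader plugins can be limited to a time range, and for a daily scheduled task that range is typically the previous calendar day. The following sketch shows one way to derive such a window in epoch milliseconds. The function name `previous_day_window_ms` and the half-open `[start, end)` convention are assumptions made for illustration; see the data source documentation for the parameters that the plugins actually accept.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def previous_day_window_ms(scheduled: datetime) -> Tuple[int, int]:
    """Return (start_ms, end_ms) covering the UTC day before `scheduled`.

    The window is half-open: start_ms is included, end_ms is excluded.
    """
    if scheduled.tzinfo is None:
        raise ValueError("scheduled must be timezone-aware")
    # Truncate the scheduled time to midnight UTC of the same day.
    day_start = scheduled.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    window_start = day_start - timedelta(days=1)
    return (
        int(window_start.timestamp() * 1000),
        int(day_start.timestamp() * 1000),
    )


if __name__ == "__main__":
    run_time = datetime(2024, 1, 2, 0, 30, tzinfo=timezone.utc)
    print(previous_day_window_ms(run_time))
```

Using a half-open window means that consecutive daily runs neither overlap nor leave a gap between them.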

MaxCompute Writer plugin

For both full export and incremental synchronization, DataWorks uses the MaxCompute Writer plugin to write data to MaxCompute. For more information, see MaxCompute data source.

Synchronization methods

Offline sync tasks use data filtering and scheduling parameters to determine whether to synchronize full or incremental data.

Synchronization method	Description	References
Full export	Export the full data from Tablestore to MaxCompute in a single operation for backup or use. With this method, you run the offline sync task only once. Scheduling properties are not required for the task.	Export full data to MaxCompute
Incremental synchronization	Periodically synchronize new and changed data from Tablestore to MaxCompute for backup or use. For this method, configure the scheduling properties of the offline sync task to periodically synchronize incremental data. After incremental data is synchronized to MaxCompute, use the merge_udf.jar package in MaxCompute to transform the incremental data from Tablestore into a full data format.	Synchronize incremental data to MaxCompute
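To make the idea of transforming incremental data into a full data format concrete, the sketch below replays a list of change records on top of an existing snapshot. It does not reproduce the logic of merge_udf.jar or the actual OTSStream export schema. The change record layout `(primary_key, sequence, op, columns)` and the operation names `PUT` and `DELETE` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
# Hypothetical change record: (primary_key, sequence, op, columns)
Change = Tuple[str, int, str, Optional[Row]]


def merge_changes(snapshot: Dict[str, Row], changes: Iterable[Change]) -> Dict[str, Row]:
    """Apply change records to a snapshot in sequence order; return a new snapshot."""
    # Copy each row so the caller's snapshot is left untouched.
    merged = {pk: dict(cols) for pk, cols in snapshot.items()}
    for pk, _sequence, op, cols in sorted(changes, key=lambda change: change[1]):
        if op == "DELETE":
            merged.pop(pk, None)
        elif op == "PUT":
            # Insert the row if it is new; otherwise overwrite only the given columns.
            merged.setdefault(pk, {}).update(cols or {})
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


if __name__ == "__main__":
    full = {"u1": {"name": "a", "age": 1}, "u2": {"name": "b"}}
    delta = [
        ("u1", 2, "PUT", {"age": 2}),
        ("u2", 1, "DELETE", None),
        ("u3", 3, "PUT", {"name": "c"}),
    ]
    print(merge_changes(full, delta))
```

Sorting by sequence before applying the changes matters: if a row is updated and later deleted within the same batch, replaying the records out of order would leave a stale row in the result.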