Use the DataWorks data integration feature to export full data from Tablestore to MaxCompute for offline analysis and processing.
Prerequisites
Before you export data, complete the following tasks:
-
Obtain the instance name, instance endpoint, and region ID of the source Tablestore table.
-
Create a MaxCompute project to use as the destination.
-
Create an AccessKey for your Alibaba Cloud account or a RAM user. Ensure that the account or user has permissions to access Tablestore and MaxCompute.
-
Activate DataWorks and create a workspace in the region that contains your MaxCompute or Tablestore instance.
-
Create a serverless resource group and attach it to the workspace. For more information about billing, see Billing for serverless resource groups.
If your DataWorks workspace and Tablestore instance are in different regions, you must create a VPC peering connection to enable cross-region network connectivity.
Procedure
Follow these steps to configure a full data export from Tablestore to MaxCompute.
Step 1: Add Tablestore data source
Add a Tablestore data source in DataWorks to establish a connection with the source table.
-
Log on to the DataWorks console. Switch to the destination region. In the navigation pane on the left, choose . From the drop-down list, select the workspace and click Go to Data Integration.
-
In the navigation pane on the left, click Data source.
-
On the Data Sources page, click Add Data Source.
-
In the Add Data Source dialog box, search for and select Tablestore as the data source type.
-
In the Add OTS Data Source dialog box, configure the data source parameters as described in the following table.
Parameter
Description
Data Source Name
The data source name must be a combination of letters, digits, and underscores (_). It cannot start with a digit or an underscore (_).
Data Source Description
A brief description of the data source. The description cannot exceed 80 characters in length.
Region
Select the region where the Tablestore instance resides.
Tablestore Instance Name
The name of the Tablestore instance.
Endpoint
The endpoint of the Tablestore instance. Use the VPC address.
AccessKey ID
The AccessKey ID and AccessKey secret of the Alibaba Cloud account or RAM user.
AccessKey Secret
-
Test the resource group connectivity.
When you create a data source, you must test the connectivity of the resource group to ensure that the resource group for the sync task can connect to the data source. Otherwise, the data sync task cannot run.
-
In the Connection Configuration section, click Test Network Connectivity in the Connection Status column for the resource group.
-
After the connectivity test passes, click Complete. The new data source appears in the data source list.
If the connectivity test fails, use the Network Connectivity Diagnostic Tool to troubleshoot the issue.
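The naming rules in the parameter table can be checked before you type values into the dialog box. The following is a minimal sketch, not part of DataWorks: the function name `validate_data_source` is hypothetical, and it assumes that "letters" means ASCII letters.

```python
import re

# Rule from the parameter table: letters, digits, and underscores only,
# and the name cannot start with a digit or an underscore.
# Assumption: "letters" is interpreted as ASCII letters.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
MAX_DESCRIPTION_LENGTH = 80


def validate_data_source(name: str, description: str = "") -> list[str]:
    """Return a list of problems; an empty list means both fields look valid."""
    problems = []
    if not _NAME_RE.fullmatch(name):
        problems.append(
            "name must contain only letters, digits, and underscores "
            "and must start with a letter"
        )
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description exceeds 80 characters")
    return problems


if __name__ == "__main__":
    print(validate_data_source("source_data"))
    print(validate_data_source("_source_data"))
```

The same check applies to the MaxCompute data source name, which follows the same rules.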
Step 2: Add MaxCompute data source
Add and configure a MaxCompute data source as the destination for the data export.
-
Click Add data source again. Select MaxCompute as the data source type and configure the parameters.
Parameter
Description
Data source name
The name must consist of letters, digits, and underscores (_). It cannot start with a digit or an underscore (_).
Data source description
A brief description of the data source, not exceeding 80 characters.
Authentication method
This value is set to Alibaba Cloud account and Alibaba Cloud RAM role by default and cannot be changed.
Alibaba Cloud Account
-
Current Alibaba Cloud Account: Select the MaxCompute project name and Default Access Identity for the current account in the specified region.
-
Another Alibaba Cloud Account: Enter the UID of the other Alibaba Cloud account, the MaxCompute project, and the RAM role for that account in the specified region.
Region
The region where the MaxCompute project is located.
Endpoint
The default value is Auto adapt. You can also select Custom configuration as needed.
-
After you configure the parameters and the connectivity test succeeds, click Complete to add the data source.
Step 3: Configure batch synchronization task
Create a data synchronization task to define the transfer rules and field mappings for exporting data from Tablestore to MaxCompute.
Create task node
-
Go to the Data development page.
-
Log on to the DataWorks console.
-
In the top navigation bar, select a resource group and a region.
-
In the left navigation pane, click .
-
Select the target workspace and click Go to Data Studio.
-
On the Data development page of the Data Studio console, click the icon to the right of Workspace Directory, and then choose .
-
In the Create node dialog box, select a Path. Set the data source to Tablestore and the destination to MaxCompute (ODPS). Enter a Name and click OK.
Configure synchronization task
Under Workspace Directory, click to open the newly created batch synchronization task node. You can configure the task by using either the codeless UI or the code editor.
Codeless UI (default)
Configure the following items in the codeless UI:
-
Data source: Select the source and destination data sources.
-
Runtime Resource: Select a resource group. The system then automatically tests the data source connectivity.
-
Data Source:
-
Table: Select the source data table from the drop-down list.
-
Primary Key Range (Start): The start of the primary key range from which to read data. The value must be in JSON array format. inf_min represents negative infinity.
For example, if the primary key consists of an int column named id and a string column named name, the sample configuration is as follows:
Specific primary key range
[ { "type": "int", "value": "000" }, { "type": "string", "value": "aaa" } ]
Full data
[ { "type": "inf_min" }, { "type": "inf_min" } ]
-
Primary Key Range (End): The end of the primary key range from which to read data. The value must be in JSON array format. inf_max represents positive infinity.
For example, if the primary key consists of an int column named id and a string column named name, the sample configuration is as follows:
Specific primary key range
[ { "type": "int", "value": "999" }, { "type": "string", "value": "zzz" } ]
Full data
[ { "type": "inf_max" }, { "type": "inf_max" } ]
-
Splitting Configuration: Custom splitting configuration in JSON array format. In most cases, you do not need to configure this parameter (set it to []).
If data hotspots occur in your Tablestore instance and the automatic splitting policy of Tablestore Reader is ineffective, we recommend that you use custom splitting rules. A split specifies the split points within the primary key range. You only need to configure the shard keys, not all primary keys.
-
Destination: Configure the following items. You can keep the default values for other parameters or modify them as needed.
-
Project Name in Production Environment: Displays the name of the MaxCompute project associated with the destination data source.
-
Tunnel Resource Group: By default, Common transmission resources is selected, which is the free quota of MaxCompute. You can select a dedicated Tunnel resource group as needed.
-
Table: Select the destination table. You can click Generate Target Table Schema to automatically generate the destination table.
-
Partition: The synchronized data is saved in a partition for a specified date. This can be used for daily incremental synchronization.
-
Write Mode: Select whether to clear existing data or append new data.
-
Destination Field Mapping: Maps the fields from the source to the destination table. The system provides a default mapping based on the source table fields, which you can modify as needed.
After you complete the configuration, click Save at the top of the page.
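For a full export, the start and end of the primary key range each need one entry per primary key column. The following sketch generates both JSON arrays for a given number of primary key columns so that they can be pasted into the Primary Key Range (Start) and Primary Key Range (End) fields. The helper name `full_range` is hypothetical and is not a DataWorks or Tablestore API.

```python
import json


def full_range(primary_key_count: int) -> tuple[list[dict], list[dict]]:
    """Build begin/end arrays that cover every row of the table.

    Each primary key column gets one entry: inf_min in the start array
    and inf_max in the end array.
    """
    if primary_key_count < 1:
        raise ValueError("the table must have at least one primary key column")
    begin = [{"type": "inf_min"} for _ in range(primary_key_count)]
    end = [{"type": "inf_max"} for _ in range(primary_key_count)]
    return begin, end


if __name__ == "__main__":
    # Two primary key columns, such as id and name in the example above.
    begin, end = full_range(2)
    print(json.dumps(begin))  # [{"type": "inf_min"}, {"type": "inf_min"}]
    print(json.dumps(end))    # [{"type": "inf_max"}, {"type": "inf_max"}]
```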
Code editor
To edit the script, click Code Editor at the top of the page.
The following sample script is for a source data table where the primary key consists of an int column named id and a string column named name. The attribute column is an int field named age. In your script, replace the values for the datasource and table parameters with your actual values.
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "ots",
      "parameter": {
        "datasource": "source_data",
        "column": [
          {
            "name": "id",
            "type": "INTEGER"
          },
          {
            "name": "name",
            "type": "STRING"
          },
          {
            "name": "age",
            "type": "INTEGER"
          }
        ],
        "range": {
          "begin": [
            {
              "type": "inf_min"
            },
            {
              "type": "inf_min"
            }
          ],
          "end": [
            {
              "type": "inf_max"
            },
            {
              "type": "inf_max"
            }
          ],
          "split": []
        },
        "table": "source_table",
        "newVersion": "true"
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "odps",
      "parameter": {
        "partition": "pt=${bizdate}",
        "truncate": true,
        "datasource": "target_data",
        "tunnelQuota": "default",
        "column": [
          "id",
          "name",
          "age"
        ],
        "emptyAsNull": false,
        "guid": null,
        "table": "source_table",
        "consistencyCommit": false
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "concurrent": 2,
      "throttle": false
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
After you finish editing the script, click Save at the top of the page.
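Before you paste a hand-edited script into the code editor, a few local sanity checks can catch common mistakes. The following is a minimal sketch under stated assumptions: `check_job` is a hypothetical helper, and the three checks (one reader and one writer, equal column counts, hops that reference existing step names) are inferred from the structure of the sample script. They are not the validation that DataWorks itself performs.

```python
import json


def check_job(script_text: str) -> list[str]:
    """Return a list of problems found in a sync script; empty means none found."""
    job = json.loads(script_text)
    problems = []

    steps = {step.get("name"): step for step in job.get("steps", [])}
    readers = [s for s in steps.values() if s.get("category") == "reader"]
    writers = [s for s in steps.values() if s.get("category") == "writer"]
    if len(readers) != 1 or len(writers) != 1:
        return ["expected exactly one reader step and one writer step"]

    reader_param = readers[0].get("parameter", {})
    writer_param = writers[0].get("parameter", {})

    # Assumption: source and destination columns are paired by position,
    # so both lists should have the same length.
    reader_cols = reader_param.get("column", [])
    writer_cols = writer_param.get("column", [])
    if len(reader_cols) != len(writer_cols):
        problems.append(
            f"reader has {len(reader_cols)} columns, writer has {len(writer_cols)}"
        )

    # The begin and end arrays each need one entry per primary key column.
    key_range = reader_param.get("range", {})
    if len(key_range.get("begin", [])) != len(key_range.get("end", [])):
        problems.append("range.begin and range.end have different lengths")

    # Every hop must point at a step that exists in the script.
    for hop in job.get("order", {}).get("hops", []):
        for side in ("from", "to"):
            if hop.get(side) not in steps:
                problems.append(f"hop references unknown step: {hop.get(side)!r}")

    return problems
```

Call `check_job` with the full text of the script. An empty list means that none of these assumed checks failed; it does not guarantee that the task will run.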
Run synchronization task
-
On the right side of the page, click Debug configuration, select the resource group to use for the run, and add the Script parameters.
-
bizdate: The data partition of the MaxCompute destination table, such as 20251120.
-
Click Run at the top of the page to start the synchronization task.
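The bizdate value is substituted into the partition expression of the writer, for example pt=${bizdate} in the sample script. The following sketch shows how such a value could be computed and substituted locally to preview the resulting partition. It assumes the common scheduling convention that bizdate is the day before the run date in yyyymmdd format; both helper names are hypothetical.

```python
from datetime import date, timedelta


def bizdate_for(run_date: date) -> str:
    """Return the day before run_date, formatted as yyyymmdd.

    Assumption: bizdate follows the convention "run date minus one day".
    """
    return (run_date - timedelta(days=1)).strftime("%Y%m%d")


def resolve_partition(template: str, bizdate: str) -> str:
    """Replace the ${bizdate} placeholder in a partition expression."""
    return template.replace("${bizdate}", bizdate)


if __name__ == "__main__":
    value = bizdate_for(date(2025, 11, 21))
    print(value)                                      # 20251120
    print(resolve_partition("pt=${bizdate}", value))  # pt=20251120
```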
Step 4: View synchronization results
After the task runs, view its execution status in the logs and check the synchronized data in the DataWorks console.
-
View the task running status and result at the bottom of the page. The following log information indicates that the sync task ran successfully.
2025-11-18 11:16:23 INFO Shell run successfully!
2025-11-18 11:16:23 INFO Current task status: FINISH
2025-11-18 11:16:23 INFO Cost time is: 77.208s
-
View the results in the destination table.
Go to the DataWorks console. In the left navigation pane, click . Then, click Go to data map to view the destination table and its data.
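If you save the run log to a file, the success markers shown in the sample log can be checked with a short script. This is a minimal sketch that assumes the log lines have the same wording as the sample above; `summarize_log` is a hypothetical helper, not a DataWorks tool.

```python
import re


def summarize_log(log_text: str) -> dict:
    """Extract the final task status and the elapsed time from a run log."""
    status = re.search(r"Current task status:\s*(\w+)", log_text)
    cost = re.search(r"Cost time is:\s*([\d.]+)s", log_text)
    return {
        "status": status.group(1) if status else None,
        # The sample log reports FINISH for a successful run.
        "succeeded": bool(status) and status.group(1) == "FINISH",
        "cost_seconds": float(cost.group(1)) if cost else None,
    }


if __name__ == "__main__":
    sample = (
        "2025-11-18 11:16:23 INFO Shell run successfully!\n"
        "2025-11-18 11:16:23 INFO Current task status: FINISH\n"
        "2025-11-18 11:16:23 INFO Cost time is: 77.208s\n"
    )
    print(summarize_log(sample))
```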
