Export full data to OSS
This method provides a cost-effective way to back up data or export it as files. You can then download the exported files to your local machine.
Prerequisites
Before you start, complete the following preparations:
Obtain information about the source Tablestore table, including the instance name, instance endpoint, and region ID.
Create an AccessKey for an Alibaba Cloud account or a RAM user that has permissions on Tablestore and OSS.
Activate DataWorks and create a workspace in the same region as your OSS bucket or Tablestore instance.
Create a Serverless resource group and bind it to the workspace. For billing details, see Serverless resource group billing.
If your DataWorks workspace and Tablestore instance are in different regions, you must create a VPC peering connection to enable cross-region network connectivity.
Procedure
Follow these steps to configure and run the data export task.
Step 1: Add a Tablestore data source
Configure a Tablestore data source in DataWorks to connect to your source data.
- Log on to the DataWorks console. Switch to the target region. In the left-side navigation pane, choose . From the drop-down list, select the workspace and click Go to Data Integration.
- In the left-side navigation pane, click Data source.
- On the Data Sources page, click Add Data Source.
- In the Add Data Source dialog box, search for and select Tablestore as the data source type.
- In the Add Tablestore Data Source dialog box, configure the following parameters.
| Parameter | Description |
| --- | --- |
| Data source name | The name must contain only letters, digits, and underscores (_), and cannot start with a digit or an underscore. |
| Data source description | A brief description of the data source. The description cannot exceed 80 characters in length. |
| Region | Select the region where the Tablestore instance resides. |
| Tablestore Instance Name | The name of the Tablestore instance. |
| Endpoint | The service endpoint of the Tablestore instance. We recommend that you use the VPC Address. |
| AccessKey ID and AccessKey Secret | The AccessKey ID and AccessKey Secret of your Alibaba Cloud account or RAM user. |
- Test the connectivity of the resource group.
  You must test the connectivity of the resource group to ensure that it can connect to the data source. Otherwise, the task will fail.
  - In the Connection Configuration section, click Test Network Connectivity in the Connection Status column for the resource group.
  - After the connectivity test passes, the Connection Status changes to Connected. Click Complete. The new data source appears in the data source list.
    If the connectivity test result is Failed, you can use the Network Connectivity Diagnostic Tool to troubleshoot the issue.
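The naming rule for the data source name can be checked before you type it into the console. The following is a minimal sketch, assuming that "letters" means ASCII letters and that no length limit applies (neither is stated above); the function name is hypothetical.

```python
import re

# Letters, digits, and underscores only; the first character must be a letter,
# which rules out a leading digit or underscore.
# Assumption: "letters" is limited to ASCII A-Z and a-z.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_data_source_name(name: str) -> bool:
    """Return True if the name satisfies the naming rule described above."""
    return _NAME_RE.fullmatch(name) is not None


print(is_valid_data_source_name("source_data"))   # True
print(is_valid_data_source_name("_source_data"))  # False
print(is_valid_data_source_name("1source"))       # False
```

The same check applies to the OSS data source name, which follows the same rule.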
Step 2: Add an OSS data source
Configure an OSS data source as the data export destination.
Click Add Data Source again. In the dialog box, search for and select OSS as the data source type, and then configure the parameters.
| Parameter | Description |
| --- | --- |
| Data source name | The name can contain only letters, digits, and underscores (_) and cannot start with a digit or an underscore. |
| Data source description | Enter a brief description of the data source, no longer than 80 characters. |
| Access Mode | RAM Role Authorization Mode: The DataWorks service account accesses the data source by assuming a RAM role. If this is your first time selecting this mode, follow the on-screen instructions to grant the required permissions. Access Key Mode: Access the data source by using the AccessKey ID and AccessKey Secret of an Alibaba Cloud account or RAM user. |
| Select Role | The RAM role to assume. Required only when Access Mode is set to RAM Role Authorization Mode. |
| AccessKey ID and AccessKey Secret | The AccessKey ID and AccessKey Secret of an Alibaba Cloud account or a RAM user. Required only when Access Mode is set to Access Key Mode. |
| Region | The region where the bucket is located. |
| Endpoint | The OSS endpoint. For the access domain names of each region, see Regions and Endpoints. |
| Bucket | The name of the bucket. |
After you configure the parameters and the connectivity test passes, click Complete.
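Which credentials the OSS data source needs depends on the Access Mode. The following sketch shows that dependency as a small pre-flight check. The dictionary keys, the mode values `ram_role` and `access_key`, and the assumption that every parameter other than the description is required are all hypothetical and are not part of DataWorks.

```python
def missing_oss_fields(params: dict) -> list:
    """Return the names of required fields that are absent or empty."""
    # Assumption: name, region, endpoint, and bucket are always required.
    required = ["name", "region", "endpoint", "bucket"]
    mode = params.get("access_mode")
    if mode == "ram_role":
        # RAM Role Authorization Mode: only a role must be selected.
        required.append("role")
    elif mode == "access_key":
        # Access Key Mode: both halves of the AccessKey pair are needed.
        required += ["access_key_id", "access_key_secret"]
    else:
        return ["access_mode"]
    return [field for field in required if not params.get(field)]


params = {
    "name": "target_data",
    "access_mode": "access_key",
    "region": "cn-hangzhou",                            # example value
    "endpoint": "https://oss-cn-hangzhou.aliyuncs.com",  # example value
    "bucket": "example-bucket",                          # placeholder
    "access_key_id": "<your AccessKey ID>",
}
print(missing_oss_fields(params))  # ['access_key_secret']
```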
Step 3: Configure a batch sync task
Create and configure a batch sync task to define the data transfer from Tablestore to OSS.
Create a task node
Go to the Data Development page.
Log on to the DataWorks console.
In the top navigation bar, select the resource group and region.
In the navigation pane on the left, click Data Development and O&M > Data Development.
Select the corresponding workspace and click Go To Data Studio.
On the Data Studio console, on the Data Development page, click the icon to the right of Workspace Directory and select New Node > Data integration > Batch Synchronization.
In the New Node dialog box, select a Path. Set the data source to Tablestore and the data destination to OSS. Enter a Name and click Confirm.
Configure the sync task
Under Workspace Directory, click the new batch sync task node to open it. Configure the sync task by using the codeless UI or the code editor.
Codeless UI
Configure the following items:
Data source: Select the source and destination data sources.
Runtime Resource: Select a resource group. The system then automatically tests the data source connectivity.
Data Source:
Table: Select the source data table from the drop-down list.
Primary Key Range (Start): The start of the primary key range to read, specified as a JSON array. The value inf_min represents negative infinity.
If the primary key consists of an int primary key column named id and a string primary key column named name, see the following configuration examples:
| Primary key range | Full data |
| --- | --- |
| [ { "type": "int", "value": "000" }, { "type": "string", "value": "aaa" } ] | [ { "type": "inf_min" }, { "type": "inf_min" } ] |
Primary Key Range (End): The end of the primary key range to read, specified as a JSON array. The value inf_max represents positive infinity.
If the primary key consists of an int primary key column named id and a string primary key column named name, see the following configuration examples:
| Primary key range | Full data |
| --- | --- |
| [ { "type": "int", "value": "999" }, { "type": "string", "value": "zzz" } ] | [ { "type": "inf_max" }, { "type": "inf_max" } ] |
Splitting Configuration: This parameter is usually not required. Set it to [].
If hotspots in your Tablestore data render the automatic splitting policy ineffective, use custom splitting rules. These rules define split points within the primary key range. You only need to configure the split key, not every primary key column.
Destination: Select the target Text Type and configure the related parameters.
Text Type: Options include csv, text, orc, and parquet.
Object Name (Path Included): The path and name for the destination file in the OSS bucket. Example: tablestore/resource_table.csv.
Column Delimiter: The default is a comma (,). If the delimiter is a non-printable character, enter its Unicode encoding, such as \u001b or \u007c.
Object Path: Required only for the parquet text type. Specifies the file path within the OSS bucket.
File Name: Required only for the parquet text type. Specifies the file name within the OSS bucket.
Destination Field Mapping: Maps fields from the source table to the destination file. Each line, specified in JSON format, represents one field.
Source Field: Includes the primary key fields and attribute columns from the source table.
If the primary key consists of an int primary key column named id and a string primary key column named name, and there is an int attribute column named age, see the following example:
{"name":"id","type":"int"}
{"name":"name","type":"string"}
{"name":"age","type":"int"}
Target Field: Includes the primary key fields and attribute columns from the source table.
If the primary key consists of an int primary key column named id and a string primary key column named name, and there is an int attribute column named age, see the following example:
{"name":"id","type":"int"}
{"name":"name","type":"string"}
{"name":"age","type":"int"}
After you complete the configuration, click Save at the top of the page.
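The Primary Key Range and Destination Field Mapping values described above follow a regular pattern, so they can be generated instead of typed by hand. The following is a minimal sketch for a full-data export; the helper names are hypothetical, and the output is meant to be pasted into the corresponding fields of the codeless UI.

```python
import json


def full_range(primary_key_count: int):
    """Build start/end arrays that cover every row.

    A full export needs one inf_min (start) and one inf_max (end)
    entry per primary key column.
    """
    begin = [{"type": "inf_min"} for _ in range(primary_key_count)]
    end = [{"type": "inf_max"} for _ in range(primary_key_count)]
    return begin, end


def field_mapping_lines(columns):
    """Render one compact JSON object per line, one line per field."""
    return "\n".join(
        json.dumps({"name": name, "type": col_type}, separators=(",", ":"))
        for name, col_type in columns
    )


begin, end = full_range(2)  # primary key: id (int), name (string)
print(json.dumps(begin))
print(json.dumps(end))
print(field_mapping_lines([("id", "int"), ("name", "string"), ("age", "int")]))
```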
Code editor
At the top of the page, click Code Editor to switch to the script editing view.
The following example script exports data to a CSV file. The source table's primary key consists of an int column named id and a string column named name. The table also includes an int attribute column named age. When you use the script, replace the datasource, table, and object values with your own.
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "ots",
      "parameter": {
        "datasource": "source_data",
        "column": [
          {
            "name": "id",
            "type": "int"
          },
          {
            "name": "name",
            "type": "string"
          },
          {
            "name": "age",
            "type": "int"
          }
        ],
        "range": {
          "begin": [
            {
              "type": "inf_min"
            },
            {
              "type": "inf_min"
            }
          ],
          "end": [
            {
              "type": "inf_max"
            },
            {
              "type": "inf_max"
            }
          ],
          "split": []
        },
        "table": "source_table",
        "newVersion": "true"
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "oss",
      "parameter": {
        "dateFormat": "yyyy-MM-dd HH:mm:ss",
        "datasource": "target_data",
        "writeSingleObject": false,
        "column": [
          {
            "name": "id",
            "type": "int"
          },
          {
            "name": "name",
            "type": "string"
          },
          {
            "name": "age",
            "type": "int"
          }
        ],
        "writeMode": "truncate",
        "encoding": "UTF-8",
        "fieldDelimiter": ",",
        "fileFormat": "csv",
        "object": "tablestore/source_table.csv"
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "concurrent": 2,
      "throttle": false
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
After you finish editing the script, click Save at the top of the page.
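Because the Reader and Writer steps repeat the same column list, it can be convenient to generate the script from a single definition. The following sketch reproduces the structure of the example script above; the function names are hypothetical, and the values passed in are the same placeholders used in the example, which you would replace with your own.

```python
import json


def columns_to_json(columns):
    """Turn (name, type) pairs into the column objects used by both steps."""
    return [{"name": name, "type": col_type} for name, col_type in columns]


def build_export_job(columns, primary_key_count, source, table, target, obj):
    """Assemble a full-export job shaped like the example script above."""
    reader = {
        "stepType": "ots",
        "parameter": {
            "datasource": source,
            "column": columns_to_json(columns),
            "range": {
                # One inf_min / inf_max entry per primary key column.
                "begin": [{"type": "inf_min"} for _ in range(primary_key_count)],
                "end": [{"type": "inf_max"} for _ in range(primary_key_count)],
                "split": [],
            },
            "table": table,
            "newVersion": "true",
        },
        "name": "Reader",
        "category": "reader",
    }
    writer = {
        "stepType": "oss",
        "parameter": {
            "dateFormat": "yyyy-MM-dd HH:mm:ss",
            "datasource": target,
            "writeSingleObject": False,
            "column": columns_to_json(columns),
            "writeMode": "truncate",
            "encoding": "UTF-8",
            "fieldDelimiter": ",",
            "fileFormat": "csv",
            "object": obj,
        },
        "name": "Writer",
        "category": "writer",
    }
    return {
        "type": "job",
        "version": "2.0",
        "steps": [reader, writer],
        "setting": {
            "errorLimit": {"record": "0"},
            "speed": {"concurrent": 2, "throttle": False},
        },
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
    }


job = build_export_job(
    columns=[("id", "int"), ("name", "string"), ("age", "int")],
    primary_key_count=2,
    source="source_data",
    table="source_table",
    target="target_data",
    obj="tablestore/source_table.csv",
)
print(json.dumps(job, indent=2))
```

The printed JSON can be pasted into the code editor. Keeping the column list in one place avoids a mismatch between the columns that are read and the columns that are written.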
Run the sync task
Click Run at the top of the page to start the synchronization task. When you run the task for the first time, you must confirm the Debug Configuration.
Step 4: View the sync results
After the task completes, you can check its execution status in the logs and view the resulting file in your OSS bucket.
- View the task status and result at the bottom of the page. The following log information indicates that the synchronization task ran successfully:
  2025-11-18 11:16:23 INFO Shell run successfully!
  2025-11-18 11:16:23 INFO Current task status: FINISH
  2025-11-18 11:16:23 INFO Cost time is: 77.208s
- View the file in the destination bucket.
  Go to the Bucket List, click the destination bucket, and then view or download the result file.
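If you save the run log to a file, the success check can be automated. The following is a minimal sketch that assumes the log contains the two lines shown in the sample above ("Current task status" and "Cost time is"); the function name is hypothetical, and status values other than FINISH are not documented here.

```python
import re


def summarize_run(log_text: str) -> dict:
    """Extract the final status and elapsed time from a task log."""
    status = re.search(r"Current task status:\s*(\w+)", log_text)
    cost = re.search(r"Cost time is:\s*([\d.]+)s", log_text)
    return {
        # Only FINISH is treated as success, as in the sample log.
        "succeeded": status is not None and status.group(1) == "FINISH",
        "seconds": float(cost.group(1)) if cost else None,
    }


log = """2025-11-18 11:16:23 INFO Shell run successfully!
2025-11-18 11:16:23 INFO Current task status: FINISH
2025-11-18 11:16:23 INFO Cost time is: 77.208s"""
print(summarize_run(log))  # {'succeeded': True, 'seconds': 77.208}
```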
