Export full data to OSS

Last updated: 2026-05-12 02:55:25

Use a DataWorks batch synchronization task to export the full data of a Tablestore table to Object Storage Service (OSS). This is a cost-effective way to back up data or export it as files, which you can then download to your local machine.

Prerequisites

Before you start, complete the following preparations:

Note

If your DataWorks workspace and Tablestore instance are in different regions, you must create a VPC peering connection to enable cross-region network connectivity.

Creating a VPC peering connection for cross-region network connectivity

The following example shows how to configure a connection when the source table's instance is in the China (Shanghai) region and the DataWorks workspace is in the China (Hangzhou) region.

  1. Bind a VPC to the Tablestore instance.

    1. Log on to the Tablestore console. In the top navigation bar, select the region where the target table is located.

    2. Click the instance alias to open the Instance Management page.

    3. On the Network Management tab, click Bind VPC, select a VPC and vSwitch, enter a VPC name, and then click OK.

    4. The page automatically refreshes after the VPC is bound. In the VPC list, you can view the VPC ID and VPC Address.

      Note

      When you add a Tablestore data source in the DataWorks console, you must use this VPC address.


  2. Obtain the VPC information of the DataWorks workspace resource group.

    1. Log on to the DataWorks console. In the top navigation bar, select the region where your workspace is located. In the left-side navigation pane, click Workspace to go to the Workspace list page.

    2. Click the workspace name to go to the Workspace Details page. In the left-side navigation pane, click Resource Group to view the list of resource groups that are bound to the workspace.

    3. To the right of the target resource group, click Network Settings. In the Data Scheduling & Data Integration section, view the VPC ID of the bound VPC.

  3. Create a VPC peering connection and configure routes.

    1. Log on to the VPC console. In the left-side navigation pane, click VPC Peering Connection, and then click Create VPC Peering Connection.

    2. On the Create VPC Peering Connection page, enter a name for the peering connection. Then, select the requester VPC instance, accepter account type, accepter region, and accepter VPC instance, and click OK.

    3. On the VPC Peering Connection page, find the VPC peering connection that you created. Click Configure route in both the Requester VPC and Accepter columns.

      The destination CIDR block must be the CIDR block of the peer VPC. For example, when you configure a route for the requester, enter the CIDR block of the accepter. When you configure a route for the accepter, enter the CIDR block of the requester.
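The route rule above can be sketched in Python. This is a minimal illustration, not part of the console workflow: the CIDR blocks are made-up values, `peering_routes` is a hypothetical helper, and the overlap check rests on the assumption that the two VPCs use non-overlapping CIDR blocks so that the peer routes are unambiguous.

```python
import ipaddress


def peering_routes(requester_cidr: str, accepter_cidr: str) -> dict:
    """Return the destination CIDR block to enter on each side of the peering connection."""
    requester = ipaddress.ip_network(requester_cidr)
    accepter = ipaddress.ip_network(accepter_cidr)
    # Assumption: overlapping blocks would make the peer route ambiguous.
    if requester.overlaps(accepter):
        raise ValueError(f"CIDR blocks overlap: {requester} and {accepter}")
    # Each side routes traffic destined for the *peer* VPC's CIDR block.
    return {
        "requester_route_destination": str(accepter),
        "accepter_route_destination": str(requester),
    }


# Hypothetical CIDR blocks for the two VPCs.
routes = peering_routes("192.168.0.0/16", "172.16.0.0/12")
print(routes)
```

Running a check like this before you open the console can catch a swapped or overlapping CIDR block early.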

Procedure

Follow these steps to configure and run the data export task.

Step 1: Add a Tablestore data source

Configure a Tablestore data source in DataWorks to connect to your source data.

  1. Log on to the DataWorks console. Switch to the target region. In the left-side navigation pane, choose Data Integration > Data Integration. From the drop-down list, select the workspace and click Go to Data Integration.

  2. In the left-side navigation pane, click Data Sources.

  3. On the Data Sources page, click Add Data Source.

  4. In the Add Data Source dialog box, search for and select Tablestore as the data source type.

  5. In the Add Tablestore Data Source dialog box, configure the following parameters.

    • Data source name: The name can contain only letters, digits, and underscores (_), and cannot start with a digit or an underscore.

    • Data source description: A brief description of the data source. The description cannot exceed 80 characters in length.

    • Region: Select the region where the Tablestore instance resides.

    • Tablestore Instance Name: The name of the Tablestore instance.

    • Endpoint: The service endpoint of the Tablestore instance. We recommend that you use the VPC Address.

    • AccessKey ID and AccessKey Secret: The AccessKey ID and AccessKey Secret of your Alibaba Cloud account or RAM user.

  6. Test the connectivity of the resource group.

    You must test the connectivity of the resource group to ensure it can connect to the data source. Otherwise, the task will fail.

    1. In the Connection Configuration section, click Test Network Connectivity in the Connection Status column for the resource group.

    2. After the connectivity test passes, the Connection Status changes to Connected. Click Complete. The new data source appears in the data source list.

      If the connectivity test result is Failed, you can use the Network Connectivity Diagnostic Tool to troubleshoot the issue.
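The naming and description rules for a data source can be checked before you fill in the dialog box. The following is a minimal sketch, assuming that "letters" means ASCII letters; `check_data_source` is a hypothetical helper, not a DataWorks API, and the console remains the authority on what it accepts.

```python
import re

# Assumption: "letters" means ASCII letters; the console may accept more.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
MAX_DESCRIPTION_LENGTH = 80


def check_data_source(name: str, description: str = "") -> list:
    """Return a list of problems found; an empty list means both values pass."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append(
            "name must use only letters, digits, and underscores, and start with a letter"
        )
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description exceeds 80 characters")
    return problems


print(check_data_source("source_data", "Tablestore source table"))
print(check_data_source("_source", "x" * 81))
```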

Step 2: Add an OSS data source

Configure an OSS data source as the data export destination.

  1. Click Add Data Source again. In the dialog box, search for and select OSS as the data source type, and then configure the parameters.

    • Data source name: The name can contain only letters, digits, and underscores (_), and cannot start with a digit or an underscore.

    • Data source description: A brief description of the data source. The description cannot exceed 80 characters in length.

    • Access Mode:

      • RAM Role Authorization Mode: The DataWorks service account accesses the data source by assuming a RAM role. If this is your first time selecting this mode, follow the on-screen instructions to grant the required permissions.

      • AccessKey Mode: Access the data source by using the AccessKey ID and AccessKey Secret of an Alibaba Cloud account or RAM user.

    • Select Role: The RAM role to assume. Required only when Access Mode is set to RAM Role Authorization Mode.

    • AccessKey ID and AccessKey Secret: The AccessKey ID and AccessKey Secret of an Alibaba Cloud account or RAM user. Required only when Access Mode is set to AccessKey Mode.

    • Region: The region where the bucket is located.

    • Endpoint: The OSS endpoint. For more information, see Regions and Endpoints.

    • Bucket: The name of the bucket.

  2. After you configure the parameters and the connectivity test passes, click Complete.

Step 3: Configure a batch sync task

Create and configure a batch sync task to define the data transfer from Tablestore to OSS.

Create a task node

  1. Go to the Data Development page.

    1. Log on to the DataWorks console.

    2. In the top navigation bar, select the resource group and region.

    3. In the navigation pane on the left, click Data Development and O&M > Data Development.

    4. Select the corresponding workspace and click Go To Data Studio.

  2. On the Data Studio console, on the Data Development page, click the icon to the right of Workspace Directory and select New Node > Data Integration > Batch Synchronization.

  3. In the New Node dialog box, select a Path. Set the data source to Tablestore and the data destination to OSS. Enter a Name and click Confirm.

Configure the sync task

Under Workspace Directory, click the new batch sync task node to open it. Configure the sync task by using the codeless UI or the code editor.

Codeless UI

Configure the following items:

  • Data source: Select the source and destination data sources.

  • Runtime Resource: Select a resource group. The system then automatically tests the data source connectivity.

  • Data Source:

    • Table: Select the source data table from the drop-down list.

    • Primary Key Range (Start): The start of the primary key range to read, specified as a JSON array. The value inf_min represents negative infinity.

      If the primary key consists of an int primary key column named id and a string primary key column named name, see the following configuration examples:

      Example for a specific primary key range:

      [
        {
          "type": "int",
          "value": "000"
        },
        {
          "type": "string",
          "value": "aaa"
        }
      ]

      Example for full data:

      [
        {
          "type": "inf_min"
        },
        {
          "type": "inf_min"
        }
      ]

    • Primary Key Range (End): The end of the primary key range to read, specified as a JSON array. The value inf_max represents positive infinity.

      If the primary key consists of an int primary key column named id and a string primary key column named name, see the following configuration examples:

      Example for a specific primary key range:

      [
        {
          "type": "int",
          "value": "999"
        },
        {
          "type": "string",
          "value": "zzz"
        }
      ]

      Example for full data:

      [
        {
          "type": "inf_max"
        },
        {
          "type": "inf_max"
        }
      ]

    • Splitting Configuration: This parameter is usually not required. Set it to [].

      If hotspots in your Tablestore data render the automatic splitting policy ineffective, use custom splitting rules. These rules define split points within the primary key range. You only need to configure the split key, not every primary key column.

  • Destination: Select the target Text Type and configure the related parameters.

    • Text Type: Options include csv, text, orc, and parquet.

    • Object Name (Path Included): The path and name for the destination file in the OSS bucket. Example: tablestore/resource_table.csv.

    • Column Delimiter: The default is ,. If the delimiter is a non-printable character, enter its Unicode encoding, such as \u001b or \u007c.

    • Object Path: Required only for the parquet text type. Specifies the file path within the OSS bucket.

    • File Name: Required only for the parquet text type. Specifies the file name within the OSS bucket.

  • Destination Field Mapping: Maps fields from the source table to the destination file. Each line, specified in JSON format, represents one field.

    • Source Field: Includes the primary key fields and attribute columns from the source table.

      If the primary key consists of an int primary key column named id and a string primary key column named name, and there is an int attribute column named age, see the following example:

      {"name":"id","type":"int"}
      {"name":"name","type":"string"}
      {"name":"age","type":"int"}
    • Target Field: The fields written to the destination file, which correspond to the source fields.

      If the primary key consists of an int primary key column named id and a string primary key column named name, and there is an int attribute column named age, see the following example:

      {"name":"id","type":"int"}
      {"name":"name","type":"string"}
      {"name":"age","type":"int"}

After you complete the configuration, click Save at the top of the page.
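The full-data primary key range and the one-field-per-line mapping described above can also be generated rather than typed by hand. The following is a minimal sketch under the same example schema (primary key columns id and name, attribute column age); `full_range` and `field_lines` are hypothetical helpers introduced only for illustration.

```python
import json


def full_range(primary_key_count: int) -> dict:
    """Build begin/end arrays that cover every row, with one entry per primary key column."""
    return {
        "begin": [{"type": "inf_min"} for _ in range(primary_key_count)],
        "end": [{"type": "inf_max"} for _ in range(primary_key_count)],
        "split": [],
    }


def field_lines(columns: list) -> str:
    """Render one compact JSON object per line, one line per field."""
    return "\n".join(
        json.dumps({"name": name, "type": col_type}, separators=(",", ":"))
        for name, col_type in columns
    )


key_range = full_range(2)
print(json.dumps(key_range["begin"], indent=2))
print(json.dumps(key_range["end"], indent=2))
print(field_lines([("id", "int"), ("name", "string"), ("age", "int")]))
```

Generating both arrays from the number of primary key columns keeps the start and end of the range the same length.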

Code editor

At the top of the page, click Code Editor to switch to the script editing view.

The following example script exports data to a CSV file. The source table's primary key consists of an int column named id and a string column named name. The table also includes an int attribute column named age. When you use the script, replace the datasource, table, and object values with your own.

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "ots",
            "parameter": {
                "datasource": "source_data",
                "column": [
                    {
                        "name": "id",
                        "type": "int"
                    },
                    {
                        "name": "name",
                        "type": "string"
                    },
                    {
                        "name": "age",
                        "type": "int"
                    }
                ],
                "range": {
                    "begin": [
                        {
                            "type": "inf_min"
                        },
                        {
                            "type": "inf_min"
                        }
                    ],
                    "end": [
                        {
                            "type": "inf_max"
                        },
                        {
                            "type": "inf_max"
                        }
                    ],
                    "split": []
                },
                "table": "source_table",
                "newVersion": "true"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "oss",
            "parameter": {
                "dateFormat": "yyyy-MM-dd HH:mm:ss",
                "datasource": "target_data",
                "writeSingleObject": false,
                "column": [
                    {
                        "name": "id",
                        "type": "int"
                    },
                    {
                        "name": "name",
                        "type": "string"
                    },
                    {
                        "name": "age",
                        "type": "int"
                    }
                ],
                "writeMode": "truncate",
                "encoding": "UTF-8",
                "fieldDelimiter": ",",
                "fileFormat": "csv",
                "object": "tablestore/source_table.csv"
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "concurrent": 2,
            "throttle": false
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

After you finish editing the script, click Save at the top of the page.
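Before you run a script like the one above, a few consistency checks can catch editing mistakes. The following is a minimal sketch: `check_job` is a hypothetical helper, and the checks it performs (matching column counts, matching range lengths, hops that name existing steps) are assumptions about what makes a script internally consistent, not a reproduction of DataWorks validation.

```python
def check_job(job: dict) -> list:
    """Return consistency problems in a batch sync script; an empty list means none were found."""
    problems = []
    steps = {step["name"]: step for step in job.get("steps", [])}
    readers = [s for s in steps.values() if s.get("category") == "reader"]
    writers = [s for s in steps.values() if s.get("category") == "writer"]
    if len(readers) != 1 or len(writers) != 1:
        problems.append("expected exactly one reader and one writer step")
    else:
        reader_cols = readers[0]["parameter"]["column"]
        writer_cols = writers[0]["parameter"]["column"]
        if len(reader_cols) != len(writer_cols):
            problems.append("reader and writer column counts differ")
        key_range = readers[0]["parameter"].get("range", {})
        if len(key_range.get("begin", [])) != len(key_range.get("end", [])):
            problems.append("range begin and end have different lengths")
    for hop in job.get("order", {}).get("hops", []):
        for end in ("from", "to"):
            if hop[end] not in steps:
                problems.append(f"hop refers to unknown step {hop[end]!r}")
    return problems


COLUMNS = [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
]

# A trimmed-down version of the example script, with the same step names.
job = {
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "ots",
            "name": "Reader",
            "category": "reader",
            "parameter": {
                "datasource": "source_data",
                "table": "source_table",
                "column": COLUMNS,
                "range": {
                    "begin": [{"type": "inf_min"}, {"type": "inf_min"}],
                    "end": [{"type": "inf_max"}, {"type": "inf_max"}],
                    "split": [],
                },
            },
        },
        {
            "stepType": "oss",
            "name": "Writer",
            "category": "writer",
            "parameter": {
                "datasource": "target_data",
                "fileFormat": "csv",
                "object": "tablestore/source_table.csv",
                "column": COLUMNS,
            },
        },
    ],
    "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
}

print(check_job(job))
```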

Run the sync task

Click Run at the top of the page to start the synchronization task. When you run the task for the first time, you must confirm the Debug Configuration.

Step 4: View the sync results

After the task completes, you can check its execution status in the logs and view the resulting file in your OSS bucket.

  1. View the task status and result at the bottom of the page. The following log information indicates that the synchronization task ran successfully:

    2025-11-18 11:16:23 INFO Shell run successfully!
    2025-11-18 11:16:23 INFO Current task status: FINISH
    2025-11-18 11:16:23 INFO Cost time is: 77.208s
  2. View the file in the destination bucket.

    Go to the Bucket List, click the destination bucket, and then view or download the result file.
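If you save the run log as text, the success indicators shown above can be extracted programmatically. The following is a minimal sketch that assumes the log lines follow the same format as the sample; `summarize` is a hypothetical helper, and status values other than FINISH are not covered by this topic.

```python
import re

# Sample log lines copied from the successful run shown above.
LOG = """\
2025-11-18 11:16:23 INFO Shell run successfully!
2025-11-18 11:16:23 INFO Current task status: FINISH
2025-11-18 11:16:23 INFO Cost time is: 77.208s
"""


def summarize(log_text: str) -> dict:
    """Extract the last reported task status and elapsed seconds from run-log text."""
    status = re.findall(r"Current task status: (\w+)", log_text)
    cost = re.findall(r"Cost time is: ([\d.]+)s", log_text)
    return {
        "status": status[-1] if status else None,
        "seconds": float(cost[-1]) if cost else None,
    }


summary = summarize(LOG)
print(summary)
```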
